Gopher hunting using hybrid search with Elasticsearch and Go

BY CARLY RICHMOND, LAURENT SAINT-FÉLIX

Like animals and programming languages, search has evolved through different practices, making it difficult to choose between them. In the final blog in this series, Carly Richmond and Laurent Saint-Félix combine keyword and vector searches to find gophers in Elasticsearch using the Go client.

Building software today is a commitment to lifelong learning. As you’ve seen from previous blogs in this series, Carly recently started using Go.

Search has evolved through different practices, and deciding between them for your own use case can be difficult. This post builds on the keyword and vector search examples covered in Part 1 and Part 2 of this series. In this final part, we'll share an example of how to combine vector search and keyword search using Elasticsearch and the Elasticsearch Go client.

Prerequisites

Just like the first part of this series, this example requires the following prerequisites:

  • Install Go version 1.13 or higher
  • Create your own Go repository using the recommended structure and package management described in the Go documentation
  • Create your own Elasticsearch cluster populated with a set of rodent-based pages, including our friendly Gopher, from Wikipedia

Connect to Elasticsearch

As a reminder, in our examples we will be using the Typed API provided by the Go client. Establishing a secure connection for any query requires configuring the client with one of the following:

  • Cloud ID and API key (if using Elastic Cloud)
  • Cluster URL, username, password, and certificate

Connecting to a cluster located on Elastic Cloud looks like this:

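import (
	"fmt"
	"os"

	"github.com/elastic/elastic-transport-go/v8/elastictransport"
	"github.com/elastic/go-elasticsearch/v8"
)

// GetElasticsearchClient builds a typed client from the Elastic Cloud
// credentials stored in the environment.
func GetElasticsearchClient() (*elasticsearch.TypedClient, error) {
	var cloudID = os.Getenv("ELASTIC_CLOUD_ID")
	var apiKey = os.Getenv("ELASTIC_API_KEY")

	var es, err = elasticsearch.NewTypedClient(elasticsearch.Config{
		CloudID: cloudID,
		APIKey:  apiKey,
		// Log requests and responses, including bodies, to stdout.
		Logger: &elastictransport.ColorLogger{
			Output:             os.Stdout,
			EnableRequestBody:  true,
			EnableResponseBody: true,
		},
	})

	if err != nil {
		return nil, fmt.Errorf("unable to connect: %w", err)
	}

	return es, nil
}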

The client connection can then be used for searching, as shown in subsequent sections.
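The connection function above reads both credentials from the environment without checking that they are set. As a minimal sketch, assuming you want to fail early with a clear message, the check could look like the following. The `cloudCredentials` type and `loadCloudCredentials` helper are hypothetical names for this illustration and are not part of the Go client.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cloudCredentials holds the two values the Elastic Cloud connection needs.
// The type and helper names are hypothetical, not part of the Go client.
type cloudCredentials struct {
	CloudID string
	APIKey  string
}

// loadCloudCredentials reads both variables through getenv (os.Getenv in
// normal use) and reports every missing variable in a single error.
func loadCloudCredentials(getenv func(string) string) (cloudCredentials, error) {
	creds := cloudCredentials{
		CloudID: getenv("ELASTIC_CLOUD_ID"),
		APIKey:  getenv("ELASTIC_API_KEY"),
	}

	var missing []string
	if creds.CloudID == "" {
		missing = append(missing, "ELASTIC_CLOUD_ID")
	}
	if creds.APIKey == "" {
		missing = append(missing, "ELASTIC_API_KEY")
	}
	if len(missing) > 0 {
		return cloudCredentials{}, fmt.Errorf("missing environment variables: %s", strings.Join(missing, ", "))
	}
	return creds, nil
}

func main() {
	creds, err := loadCloudCredentials(os.Getenv)
	if err != nil {
		fmt.Println("cannot configure client:", err)
		return
	}
	fmt.Println("credentials loaded, cloud ID length:", len(creds.CloudID))
}
```

Passing the lookup function as a parameter keeps the helper easy to exercise in tests without touching the real environment.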

If you are using a self-deployed Elasticsearch cluster, you can refer to the article “Elasticsearch: Implementing Elasticsearch search using Go language – 8.x”.

Manually configure boost parameters

When combining any set of search algorithms, the traditional approach is to manually configure constants that boost each query type. Specifically, a factor is assigned to each query, and the combined result set is compared to the expected set to determine the recall of the query. We then iterate through several sets of factors and select the one that gets closest to our desired state.

For example, you can combine a single text search query with a boost factor of 0.8 with a knn query with a lower boost factor of 0.2 by specifying the Boost field in both query types, as shown in the following example:

func HybridSearchWithBoost(client *elasticsearch.TypedClient, term string) ([]Rodent, error) {
	var knnBoost float32 = 0.2
	var queryBoost float32 = 0.8

	res, err := client.Search().
		Index("vector-search-rodents").
		Knn(types.KnnQuery{
			Field:         "text_embedding.predicted_value",
			Boost:         &knnBoost,
			K:             10,
			NumCandidates: 10,
			QueryVectorBuilder: &types.QueryVectorBuilder{
				TextEmbedding: &types.TextEmbedding{
					ModelId:   "sentence-transformers__msmarco-minilm-l-12-v3",
					ModelText: term,
				},
			}}).
		Query(&types.Query{
			Match: map[string]types.MatchQuery{
				"title": {
					Query: term,
					Boost: &queryBoost,
				},
			},
		}).
		Do(context.Background())

	if err != nil {
		return nil, err
	}

	return getRodents(res.Hits.Hits)
}

The factor specified in the Boost option of each query scales that query's contribution to the document score: each query's score is multiplied by its boost, and the weighted scores are summed. By giving the match query a larger factor than the knn query, the results of the keyword query are weighted more heavily.
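To make the effect of the two factors concrete, here is a small sketch of the arithmetic. It assumes the combined score is each query's score multiplied by its boost and then summed, which is how the Elasticsearch documentation describes combining a knn search with a query. The `combinedScore` function and the input scores are invented for illustration.

```go
package main

import "fmt"

// combinedScore sketches how boosts weight a hybrid result: each query's
// score is scaled by its boost and the scaled scores are summed.
// A document absent from one result set contributes 0 for that query.
func combinedScore(matchScore, knnScore, queryBoost, knnBoost float64) float64 {
	return queryBoost*matchScore + knnBoost*knnScore
}

func main() {
	// Illustrative scores only; real values depend on the index and model.
	both := combinedScore(5.0, 0.9, 0.8, 0.2)    // matched by keyword and knn
	knnOnly := combinedScore(0.0, 0.95, 0.8, 0.2) // matched by knn only

	fmt.Printf("keyword+knn: %.2f\n", both)
	fmt.Printf("knn only:    %.2f\n", knnOnly)
}
```

With these numbers the document found by both queries scores far higher than the one found by knn alone, which is the weighting the 0.8 and 0.2 factors are meant to produce.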

The challenge with manual boosting, especially if you're not a search expert, is that finding the factors that lead to the desired set of results takes tuning. In practice it comes down to trying different values and seeing which gets you closest to the desired result set.
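That trial loop can be sketched as follows: run the search once per candidate pair of factors, measure recall against the expected documents, and keep the best pair. The `boosts` type, the `recall` and `bestBoosts` helpers, and the stubbed search function are all hypothetical. A real version would call a variant of HybridSearchWithBoost that accepts the two factors as parameters.

```go
package main

import (
	"errors"
	"fmt"
)

// boosts is one candidate pair of factors to try.
type boosts struct {
	Query, Knn float32
}

// recall is the fraction of expected document IDs present in got.
func recall(got, expected []string) float64 {
	if len(expected) == 0 {
		return 0
	}
	seen := make(map[string]bool, len(got))
	for _, id := range got {
		seen[id] = true
	}
	hits := 0
	for _, id := range expected {
		if seen[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(expected))
}

// bestBoosts runs search once per candidate and keeps the first candidate
// with the highest recall against the expected IDs.
func bestBoosts(candidates []boosts, expected []string, search func(boosts) ([]string, error)) (boosts, float64, error) {
	if len(candidates) == 0 {
		return boosts{}, 0, errors.New("no candidate boosts supplied")
	}
	var best boosts
	bestRecall := -1.0
	for _, c := range candidates {
		ids, err := search(c)
		if err != nil {
			return boosts{}, 0, err
		}
		if r := recall(ids, expected); r > bestRecall {
			best, bestRecall = c, r
		}
	}
	return best, bestRecall, nil
}

func main() {
	expected := []string{"gopher", "groundhog"}
	candidates := []boosts{{0.8, 0.2}, {0.5, 0.5}, {0.2, 0.8}}

	// Stub standing in for a real search call; the IDs are invented.
	fakeSearch := func(b boosts) ([]string, error) {
		if b.Query >= 0.5 {
			return []string{"gopher", "groundhog", "beaver"}, nil
		}
		return []string{"beaver", "gopher"}, nil
	}

	best, r, err := bestBoosts(candidates, expected, fakeSearch)
	if err != nil {
		fmt.Println("search failed:", err)
		return
	}
	fmt.Printf("best boosts: query=%.1f knn=%.1f (recall %.2f)\n", best.Query, best.Knn, r)
}
```

Even this small loop shows the cost of the approach: every candidate needs a full search and a labelled set of expected results.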

Reciprocal Rank Fusion

Reciprocal Rank Fusion (RRF) was released in technical preview for hybrid search in Elasticsearch 8.9. Its purpose is to flatten the learning curve associated with tuning and to cut the time spent experimenting with factors to optimize the result set.

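$$\mathrm{RRFscore}(d \in D) = \sum_{r \in R} \frac{1}{k + r(d)}$$

Here r(d) is the rank of document d in ranking r, and:

  • D – the document set
  • R – a set of rankings as permutations of 1..|D|
  • k – a ranking constant, usually defaulting to 60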

Using RRF, document scores are recalculated by blending each document's rank across the result sets with the following algorithm, shown as pseudocode for a single document d:

score := 0.0
// q is a query in the set of queries (vector and keyword search)
for _, q := range queries {
    // result(q) is the result set of q
    if d in result(q) {
        // k is a ranking constant (default 60)
        // rank(result(q), d) is the document's rank within result(q)
        // and ranges from 1 to the window_size (default 100)
        score += 1.0 / (k + rank(result(q), d))
    }
}
    }
}

return score
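For readers who want to experiment with the algorithm outside Elasticsearch, here is a minimal runnable sketch in Go. It is not the Elasticsearch implementation. The `rrfScores` and `rankByScore` helpers and the rodent rankings are invented for this example, and each input list is assumed to hold unique document IDs ordered best first.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfScores fuses ranked result lists with Reciprocal Rank Fusion.
// Each inner slice holds document IDs ordered best first, so the
// document at index i has rank i+1. When windowSize is positive, only
// the first windowSize entries of each list are considered.
func rrfScores(resultSets [][]string, k float64, windowSize int) map[string]float64 {
	scores := make(map[string]float64)
	for _, results := range resultSets {
		if windowSize > 0 && len(results) > windowSize {
			results = results[:windowSize]
		}
		for i, id := range results {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	return scores
}

// rankByScore returns document IDs sorted by descending fused score,
// breaking ties by ID so the output is deterministic.
func rankByScore(scores map[string]float64) []string {
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	// Invented rankings for illustration.
	keyword := []string{"gopher", "beaver", "marmot"}
	vector := []string{"groundhog", "gopher", "beaver"}

	scores := rrfScores([][]string{keyword, vector}, 60, 100)
	for _, id := range rankByScore(scores) {
		fmt.Printf("%-10s %.5f\n", id, scores[id])
	}
}
```

In this example "gopher" ranks first because it appears near the top of both lists, even though it is not the top vector result. Only ranks matter, so the raw keyword and vector scores never need to be put on a common scale.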

The advantage of RRF is that we can rely on the sensible defaults in Elasticsearch. The ranking constant k defaults to 60. To balance the relevance of returned documents against query performance when searching large datasets, the size of the result set considered for each query is limited to the value of window_size, which defaults to 100 as described in the documentation.

Both k and window_size can also be set through the Rrf configuration passed to the Go client's Rank method, as shown in the following example:

func HybridSearchWithRRF(client *elasticsearch.TypedClient, term string) ([]Rodent, error) {
	// Minimum required window size for the default result size of 10
	var windowSize int64 = 10
	var rankConstant int64 = 42

	res, err := client.Search().
		Index("vector-search-rodents").
		Knn(types.KnnQuery{
			Field:         "text_embedding.predicted_value",
			K:             10,
			NumCandidates: 10,
			QueryVectorBuilder: &types.QueryVectorBuilder{
				TextEmbedding: &types.TextEmbedding{
					ModelId:   "sentence-transformers__msmarco-minilm-l-12-v3",
					ModelText: term,
				},
			}}).
		Query(&types.Query{
			Match: map[string]types.MatchQuery{
				"title": {Query: term},
			},
		}).
		Rank(&types.RankContainer{
			Rrf: &types.RrfRank{
				WindowSize:   &windowSize,
				RankConstant: &rankConstant,
			},
		}).
		Do(context.Background())

	if err != nil {
		return nil, err
	}

	return getRodents(res.Hits.Hits)
}
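When debugging, it can help to see the JSON body that a typed call like this corresponds to. The sketch below builds that body by hand with the standard library. It assumes the 8.9-era search request format, in which `rank.rrf` accepts `window_size` and `rank_constant`. The `rrfSearchBody` helper is a hypothetical name, and the exact body the client sends may differ in field order or omitted defaults.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rrfSearchBody assembles a search body that combines a knn search and
// a match query, then fuses the two result sets with RRF.
func rrfSearchBody(term string, windowSize, rankConstant int64) map[string]interface{} {
	return map[string]interface{}{
		"knn": map[string]interface{}{
			"field":          "text_embedding.predicted_value",
			"k":              10,
			"num_candidates": 10,
			"query_vector_builder": map[string]interface{}{
				"text_embedding": map[string]interface{}{
					"model_id":   "sentence-transformers__msmarco-minilm-l-12-v3",
					"model_text": term,
				},
			},
		},
		"query": map[string]interface{}{
			"match": map[string]interface{}{
				"title": map[string]interface{}{"query": term},
			},
		},
		"rank": map[string]interface{}{
			"rrf": map[string]interface{}{
				"window_size":   windowSize,
				"rank_constant": rankConstant,
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(rrfSearchBody("gopher", 10, 42), "", "  ")
	if err != nil {
		fmt.Println("unable to encode body:", err)
		return
	}
	fmt.Println(string(out))
}
```

Comparing this output with the request logged by the client's Logger is a quick way to confirm that the Rank settings were applied.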

Conclusion

Here, we discussed how to combine vector search and keyword search in Elasticsearch using the Elasticsearch Go client.

Check out the GitHub repository for all the code in this series. If you haven't read the earlier posts yet, check out Part 1 and Part 2.

Happy gopher hunting!

Original text: Using hybrid search for gopher hunting with Elasticsearch and Go – Elastic Search Labs
