Instructional Guide for Hybrid Search Configuration¶
This guide explains how to configure the conf/hybrid_search.yaml file, which sets the embedding information for each index used in Hybrid Search. It has three sections: Getting Started, Parameter Definitions, and Examples.
Getting Started¶
Follow these steps to configure the settings for your YAML file:
Step 1: Define Hybrid Queries¶
Add your embedding information under the hybrid_queries section. Replace the placeholders with your actual values.
hybrid_queries:
  # Example configuration
  - index: your_index_name  # Replace 'your_index_name' with the actual name of your index
    embedding_mappings:
      - embedding_field: your_embedding_field  # Replace 'your_embedding_field' with the field name for the embeddings
        model_name: your_model_name  # Replace 'your_model_name' with the name of the embedding model to use
        src_fields:
          - source_field_1  # Replace 'source_field_1' with the source field from your data
          - source_field_2  # Add more source fields as needed
      ...
  ...
The hybrid_queries section is a list of dictionaries that configure the embedding information for each index. Each dictionary has two keys:
- index: The name of the index that will include embeddings used by Hybrid Search.
- embedding_mappings: A list of dictionaries, each containing three keys:
  - embedding_field: The name of the field in your index where the generated embeddings will be stored.
  - model_name: The name of the embedding model used to generate the embeddings.
  - src_fields: A list of source fields from your data that will be used to generate the embeddings. These fields typically contain text data that the embedding models will process.
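The structure described above can be checked with a short script before the file is used. The following is a minimal Python sketch, not part of Hybrid Search itself: the function name check_structure is hypothetical, and the input is assumed to be the dictionary that a YAML parser (for example PyYAML's yaml.safe_load) returns for conf/hybrid_search.yaml. It only flags misspelled keys and wrong container types; it does not check that the index, fields, or model exist.

```python
from typing import Any, Dict, List

# Keys described in this guide.
INDEX_KEYS = {"index", "embedding_mappings"}
MAPPING_KEYS = {"embedding_field", "model_name", "src_fields"}


def check_structure(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical helper for illustration only.
    """
    problems: List[str] = []
    entries = config.get("hybrid_queries")
    if not isinstance(entries, list):
        return ["hybrid_queries must be a list"]
    for i, entry in enumerate(entries):
        where = f"hybrid_queries[{i}]"
        if not isinstance(entry, dict):
            problems.append(f"{where} must be a dictionary")
            continue
        # Flag keys that this guide does not describe (usually typos).
        for key in sorted(set(entry) - INDEX_KEYS, key=str):
            problems.append(f"{where} has unknown key '{key}'")
        mappings = entry.get("embedding_mappings", [])
        if not isinstance(mappings, list):
            problems.append(f"{where}.embedding_mappings must be a list")
            continue
        for j, mapping in enumerate(mappings):
            inner = f"{where}.embedding_mappings[{j}]"
            if not isinstance(mapping, dict):
                problems.append(f"{inner} must be a dictionary")
                continue
            for key in sorted(set(mapping) - MAPPING_KEYS, key=str):
                problems.append(f"{inner} has unknown key '{key}'")
            if not isinstance(mapping.get("src_fields", []), list):
                problems.append(f"{inner}.src_fields must be a list")
    return problems


# The placeholder configuration from Step 1, as a parsed dictionary.
example = {
    "hybrid_queries": [
        {
            "index": "your_index_name",
            "embedding_mappings": [
                {
                    "embedding_field": "your_embedding_field",
                    "model_name": "your_model_name",
                    "src_fields": ["source_field_1", "source_field_2"],
                }
            ],
        }
    ]
}
print(check_structure(example))  # prints []
```

A misspelled key such as embedding_mapping would be reported as an unknown key instead of being silently ignored.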
Step 2: Repeat for Multiple Indexes¶
If you have multiple indexes, repeat the configuration for each one.
hybrid_queries:
  - index: first_index_name
    embedding_mappings:
      - embedding_field: first_embedding_field
        model_name: first_model_name
        src_fields:
          - first_source_field_1
          - first_source_field_2
  - index: second_index_name
    embedding_mappings:
      - embedding_field: second_embedding_field
        model_name: second_model_name
        src_fields:
          - second_source_field_1
Step 3: Save Configuration¶
Save your YAML file after filling in the necessary information.
Parameter Definitions¶
This section provides a detailed explanation of each parameter in the YAML file.
Parameter | Data Type | Required | Description |
---|---|---|---|
index | String | Optional | The name of the index that will include embeddings used by Hybrid Search. |
embedding_mappings | Array of Dictionaries | Optional | A list of dictionaries, each with three keys: embedding_field (the name of the generated embedding), model_name (the embedding model used to generate the embedding), and src_fields (the list of source fields used to generate the embedding). |
embedding_mappings[].embedding_field | String | Optional | The name of the field in your index where the generated embedding will be stored. Duplicate names within the same index are not allowed. |
embedding_mappings[].model_name | String | Optional | The name of the embedding model used to generate the embedding. Valid values are bge_large_en_v1_5, splade_v3, cohere_embed_english_v3, and cohere_embed_multilingual_v3. |
embedding_mappings[].src_fields | Array of Strings | Optional | A list of source fields from your data that will be used to generate the embedding. These fields typically contain text data that the embedding models will process. |
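Two constraints in the table lend themselves to an automated check: model_name must be one of the listed values, and embedding_field names must not repeat within the same index. The following Python sketch illustrates one way to check both. The function name check_values is hypothetical, and the input is assumed to be the already-parsed contents of conf/hybrid_search.yaml; how Hybrid Search itself validates the file is not described in this guide.

```python
from typing import Any, Dict, List

# Model names listed as valid in the parameter table.
VALID_MODELS = {
    "bge_large_en_v1_5",
    "splade_v3",
    "cohere_embed_english_v3",
    "cohere_embed_multilingual_v3",
}


def check_values(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in model names and embedding field names.

    Hypothetical helper for illustration only.
    """
    problems: List[str] = []
    for entry in config.get("hybrid_queries", []):
        index = entry.get("index", "<no index>")
        seen = set()  # embedding_field names already used in this index
        for mapping in entry.get("embedding_mappings", []):
            field = mapping.get("embedding_field")
            model = mapping.get("model_name")
            if field is not None:
                if field in seen:
                    problems.append(f"{index}: duplicate embedding_field '{field}'")
                seen.add(field)
            if model is not None and model not in VALID_MODELS:
                problems.append(f"{index}: unknown model_name '{model}'")
    return problems


# A deliberately broken configuration: the field name repeats and
# 'my_model' is not one of the listed model names.
broken = {
    "hybrid_queries": [
        {
            "index": "movie_semantic_index",
            "embedding_mappings": [
                {
                    "embedding_field": "movie_title_embed",
                    "model_name": "cohere_embed_multilingual_v3",
                    "src_fields": ["movie_title"],
                },
                {
                    "embedding_field": "movie_title_embed",
                    "model_name": "my_model",
                    "src_fields": ["movie_description"],
                },
            ],
        }
    ]
}
for problem in check_values(broken):
    print(problem)
```

The duplicate check is scoped per index, so the same embedding_field name in two different indexes is not reported.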
Examples¶
Here are some examples to help you understand how to configure the YAML file.
Example 1: Configuring for a Book Index¶
hybrid_queries:
  - index: book_semantic_index
    embedding_mappings:
      - embedding_field: book_title_description_embed
        model_name: cohere_embed_english_v3
        src_fields:
          - book_title
          - book_description
The above configuration generates the book_title_description_embed embedding, which represents the semantics of both book_title and book_description, using the cohere_embed_english_v3 model in the book_semantic_index index.
Example 2: Configuring for a Movie Index¶
hybrid_queries:
  - index: movie_semantic_index
    embedding_mappings:
      - embedding_field: movie_title_embed
        model_name: cohere_embed_multilingual_v3
        src_fields:
          - movie_title
      - embedding_field: movie_description_embed
        model_name: cohere_embed_english_v3
        src_fields:
          - movie_description
The above configuration generates two embeddings in the movie_semantic_index index:
- movie_title_embed, which represents the semantics of movie_title, using the cohere_embed_multilingual_v3 model.
- movie_description_embed, which represents the semantics of movie_description, using the cohere_embed_english_v3 model.
Example 3: Configuring for both a Book Index and a Movie Index¶
hybrid_queries:
  - index: book_semantic_index
    embedding_mappings:
      - embedding_field: book_title_embed
        model_name: bge_large_en_v1_5
        src_fields:
          - book_title
  - index: movie_semantic_index
    embedding_mappings:
      - embedding_field: movie_title_embed
        model_name: splade_v3
        src_fields:
          - movie_title
The above configuration generates embeddings in two different indexes:
- book_title_embed, which represents the semantics of book_title, using the bge_large_en_v1_5 model in the book_semantic_index index.
- movie_title_embed, which represents the semantics of movie_title, using the splade_v3 model in the movie_semantic_index index.
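When a file covers several indexes, it can help to list every embedding it defines in one flat view. The Python sketch below does this for the Example 3 configuration. The function name iter_embeddings is hypothetical, and the dictionary is assumed to be what a YAML parser would return for the file; the sketch only reads the configuration and does not generate any embeddings.

```python
from typing import Any, Dict, Iterator, List, Tuple

# (index, embedding_field, model_name, src_fields)
Row = Tuple[str, str, str, List[str]]


def iter_embeddings(config: Dict[str, Any]) -> Iterator[Row]:
    """Yield one row per embedding mapping, across all indexes.

    Hypothetical helper for illustration only; it expects every key
    described in this guide to be present.
    """
    for entry in config.get("hybrid_queries", []):
        for mapping in entry.get("embedding_mappings", []):
            yield (
                entry["index"],
                mapping["embedding_field"],
                mapping["model_name"],
                list(mapping["src_fields"]),
            )


# Example 3 from this guide, as a parsed dictionary.
example_3 = {
    "hybrid_queries": [
        {
            "index": "book_semantic_index",
            "embedding_mappings": [
                {
                    "embedding_field": "book_title_embed",
                    "model_name": "bge_large_en_v1_5",
                    "src_fields": ["book_title"],
                }
            ],
        },
        {
            "index": "movie_semantic_index",
            "embedding_mappings": [
                {
                    "embedding_field": "movie_title_embed",
                    "model_name": "splade_v3",
                    "src_fields": ["movie_title"],
                }
            ],
        },
    ]
}

for index, field, model, sources in iter_embeddings(example_3):
    print(f"{index}: {field} <- {model}({', '.join(sources)})")
# prints:
# book_semantic_index: book_title_embed <- bge_large_en_v1_5(book_title)
# movie_semantic_index: movie_title_embed <- splade_v3(movie_title)
```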