Instructional Guide for Hybrid Search Configuration

This guide explains how to configure the conf/hybrid_search.yaml file, which sets the embedding information for each index used in Hybrid Search. It has three sections: Getting Started, Parameter Definitions, and Examples.

Getting Started

Follow these steps to configure the settings for your YAML file:

Step 1: Define Hybrid Queries

Add your embedding information under the hybrid_queries section. Replace the placeholders with your actual values.

hybrid_queries:
  # Example configuration
  - index: your_index_name  # Replace 'your_index_name' with the actual name of your index
    embedding_mappings:
      - embedding_field: your_embedding_field  # Replace 'your_embedding_field' with the field name for the embeddings
        model_name: your_model_name  # Replace 'your_model_name' with the name of the embedding model to use
        src_fields:
          - source_field_1  # Replace 'source_field_1' with the source field from your data
          - source_field_2  # Add more source fields as needed
          # ... (more source fields)
      # ... (more embedding mappings)

hybrid_queries is a list of dictionaries, one per index, each configuring the embedding information for that index. Each dictionary has two keys:

index: The name of the index that will include embeddings used by Hybrid Search.
embedding_mappings: A list of dictionaries, each containing three keys:
- embedding_field: The name of the field in your index where the generated embeddings will be stored.
- model_name: The name of the embedding model used to generate the embeddings.
- src_fields: A list of source fields from your data that will be used to generate the embeddings. These fields typically contain text data that the embedding models will process.
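
To make the nesting concrete, here is a minimal Python sketch that walks this structure and yields one record per embedding mapping. It assumes the YAML has already been parsed into plain Python lists and dictionaries (for example with a YAML parser such as PyYAML's `yaml.safe_load`, if that is available in your environment). The function name `iter_embedding_mappings` and the sample values are hypothetical and are not part of Hybrid Search itself.

```python
from typing import Any, Iterator

# Hypothetical in-memory form of conf/hybrid_search.yaml after parsing.
config: dict[str, Any] = {
    "hybrid_queries": [
        {
            "index": "your_index_name",
            "embedding_mappings": [
                {
                    "embedding_field": "your_embedding_field",
                    "model_name": "your_model_name",
                    "src_fields": ["source_field_1", "source_field_2"],
                }
            ],
        }
    ]
}


def iter_embedding_mappings(cfg: dict[str, Any]) -> Iterator[tuple]:
    """Yield (index, embedding_field, model_name, src_fields) for every mapping."""
    # The outer list has one dictionary per index.
    for entry in cfg.get("hybrid_queries") or []:
        # Each index carries a list of embedding mappings.
        for mapping in entry.get("embedding_mappings") or []:
            yield (
                entry.get("index"),
                mapping.get("embedding_field"),
                mapping.get("model_name"),
                list(mapping.get("src_fields") or []),
            )


for record in iter_embedding_mappings(config):
    print(record)
```

Missing keys are returned as `None` (or an empty list for `src_fields`) rather than raising, since this sketch only illustrates the shape of the data.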

Step 2: Repeat for Multiple Indexes

If you have multiple indexes, repeat the configuration for each one.

hybrid_queries:
  - index: first_index_name
    embedding_mappings:
      - embedding_field: first_embedding_field
        model_name: first_model_name
        src_fields:
          - first_source_field_1
          - first_source_field_2
  - index: second_index_name
    embedding_mappings:
      - embedding_field: second_embedding_field
        model_name: second_model_name
        src_fields:
          - second_source_field_1

Step 3: Save Configuration

Save your YAML file after filling in the necessary information.

Parameter Definitions

This section provides a detailed explanation of each parameter in the YAML file.

Parameter	Data Type	Required	Description
`index`	String	Optional	The name of the index that will include embeddings used by Hybrid Search.
`embedding_mappings`	Array of Dictionaries	Optional	A list of dictionaries, each with three keys: `embedding_field` (the name of the generated embedding), `model_name` (the embedding model used to generate the embedding), and `src_fields` (the list of source fields used to generate the embedding).
`embedding_mappings[].embedding_field`	String	Optional	The name of the field in your index where the generated embedding will be stored. Duplicate names within the same index are not allowed.
`embedding_mappings[].model_name`	String	Optional	The name of the embedding model used to generate the embedding. Valid values are `bge_large_en_v1_5`, `splade_v3`, `cohere_embed_english_v3` and `cohere_embed_multilingual_v3`.
`embedding_mappings[].src_fields`	Array of Strings	Optional	A list of source fields from your data that will be used to generate the embedding. These fields typically contain text data that the embedding models will process.
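
The constraints in the table (the set of valid `model_name` values, no duplicate `embedding_field` names within one index, and `src_fields` being an array of strings) can be checked before deployment. The following is a minimal sketch that assumes the configuration has already been parsed into Python dictionaries. `find_config_problems` and `bad_config` are hypothetical names for illustration. Whether Hybrid Search performs the same checks when it loads the file is not stated in this guide.

```python
from typing import Any

# Model names listed as valid in the parameter table above.
VALID_MODEL_NAMES = {
    "bge_large_en_v1_5",
    "splade_v3",
    "cohere_embed_english_v3",
    "cohere_embed_multilingual_v3",
}


def find_config_problems(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    for entry in cfg.get("hybrid_queries") or []:
        index = entry.get("index")
        seen_fields: set[str] = set()
        for mapping in entry.get("embedding_mappings") or []:
            field = mapping.get("embedding_field")
            model = mapping.get("model_name")
            src_fields = mapping.get("src_fields")

            # Duplicate embedding_field names within one index are not allowed.
            if field is not None:
                if field in seen_fields:
                    problems.append(f"{index}: duplicate embedding_field '{field}'")
                seen_fields.add(field)

            # model_name must be one of the documented values.
            if model is not None and model not in VALID_MODEL_NAMES:
                problems.append(f"{index}.{field}: unknown model_name '{model}'")

            # src_fields is documented as an array of strings.
            if src_fields is not None and not (
                isinstance(src_fields, list)
                and all(isinstance(s, str) for s in src_fields)
            ):
                problems.append(f"{index}.{field}: src_fields must be a list of strings")
    return problems


# Hypothetical configuration with three deliberate mistakes.
bad_config = {
    "hybrid_queries": [
        {
            "index": "movie_semantic_index",
            "embedding_mappings": [
                {
                    "embedding_field": "movie_title_embed",
                    "model_name": "splade_v3",
                    "src_fields": ["movie_title"],
                },
                {
                    "embedding_field": "movie_title_embed",  # duplicate name
                    "model_name": "not_a_model",  # not in the valid list
                    "src_fields": "movie_description",  # string, not a list
                },
            ],
        }
    ]
}

for problem in find_config_problems(bad_config):
    print(problem)
```

Keys that are absent are skipped rather than reported, because the table marks every parameter as Optional.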

Examples

Here are some examples to help you understand how to configure the YAML file.

Example 1: Configuring for a Book Index

hybrid_queries:
  - index: book_semantic_index
    embedding_mappings:
      - embedding_field: book_title_description_embed
        model_name: cohere_embed_english_v3
        src_fields:
          - book_title
          - book_description

The above configuration generates the book_title_description_embed embedding in the book_semantic_index index. The embedding represents the semantics of both book_title and book_description and is produced by the cohere_embed_english_v3 model.

Example 2: Configuring for a Movie Index

hybrid_queries:
  - index: movie_semantic_index
    embedding_mappings:
      - embedding_field: movie_title_embed
        model_name: cohere_embed_multilingual_v3
        src_fields:
          - movie_title
      - embedding_field: movie_description_embed
        model_name: cohere_embed_english_v3
        src_fields:
          - movie_description

The above configuration generates two embeddings in the movie_semantic_index index:

movie_title_embed represents the semantics of movie_title, using the cohere_embed_multilingual_v3 model.
movie_description_embed represents the semantics of movie_description, using the cohere_embed_english_v3 model.
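
When one index mixes models, as in this example, it can help to see which embedding fields each model is responsible for. The sketch below groups the mappings by `model_name`. It assumes the YAML has already been parsed into Python dictionaries, and `fields_by_model` is a hypothetical helper name, not part of Hybrid Search.

```python
from collections import defaultdict
from typing import Any


def fields_by_model(cfg: dict[str, Any]) -> dict[str, list[str]]:
    """Map each model_name to the 'index.embedding_field' entries that use it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in cfg.get("hybrid_queries") or []:
        for mapping in entry.get("embedding_mappings") or []:
            grouped[mapping.get("model_name")].append(
                f"{entry.get('index')}.{mapping.get('embedding_field')}"
            )
    return dict(grouped)


# In-memory form of the movie index example above.
movie_config = {
    "hybrid_queries": [
        {
            "index": "movie_semantic_index",
            "embedding_mappings": [
                {
                    "embedding_field": "movie_title_embed",
                    "model_name": "cohere_embed_multilingual_v3",
                    "src_fields": ["movie_title"],
                },
                {
                    "embedding_field": "movie_description_embed",
                    "model_name": "cohere_embed_english_v3",
                    "src_fields": ["movie_description"],
                },
            ],
        }
    ]
}

for model, fields in fields_by_model(movie_config).items():
    print(model, "->", fields)
```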

Example 3: Configuring for both a Book Index and a Movie Index

hybrid_queries:
  - index: book_semantic_index
    embedding_mappings:
      - embedding_field: book_title_embed
        model_name: bge_large_en_v1_5
        src_fields:
          - book_title
  - index: movie_semantic_index
    embedding_mappings:
      - embedding_field: movie_title_embed
        model_name: splade_v3
        src_fields:
          - movie_title

The above configuration generates one embedding in each of two indexes:

book_title_embed represents the semantics of book_title, using the bge_large_en_v1_5 model in the book_semantic_index index.
movie_title_embed represents the semantics of movie_title, using the splade_v3 model in the movie_semantic_index index.