Instructional Guide for Hybrid Search Configuration

This guide explains how to configure conf/hybrid_search.yaml, the file that defines the embedding settings for each index used in Hybrid Search. It has three sections: Getting Started, Parameter Definitions, and Examples.

Getting Started

Follow these steps to configure the settings for your YAML file:

Step 1: Define Hybrid Queries

Add your embedding information under the hybrid_queries section. Replace the placeholders with your actual values.

hybrid_queries:
  # Example configuration
  - index: your_index_name  # Replace 'your_index_name' with the actual name of your index
    embedding_mappings:
      - embedding_field: your_embedding_field  # Replace 'your_embedding_field' with the field name for the embeddings
        model_name: your_model_name  # Replace 'your_model_name' with the name of the embedding model to use
        src_fields:
          - source_field_1  # Replace 'source_field_1' with the source field from your data
          - source_field_2  # Add more source fields as needed
          # ... add further source fields here
      # ... add further embedding mappings here

hybrid_queries is a list of dictionaries, one per index, each configuring the embedding information for that index. Each dictionary has two keys:

  • index: The name of the index that will include embeddings used by Hybrid Search.
  • embedding_mappings: A list of dictionaries, each containing three keys:
    • embedding_field: The name of the field in your index where the generated embeddings will be stored.
    • model_name: The name of the embedding model used to generate the embeddings.
    • src_fields: A list of source fields from your data that will be used to generate the embeddings. These fields typically contain text data that the embedding models will process.
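The structure described above can be checked mechanically before the file is used. The following is a minimal sketch, not part of the Hybrid Search tooling: it assumes the YAML file has already been parsed into a Python dictionary by a YAML parser of your choice, and the function name find_shape_errors is hypothetical. It treats all keys as required, which may be stricter than the tool itself.

```python
from typing import Any, List

# The three keys each embedding mapping is described as containing.
MAPPING_KEYS = ("embedding_field", "model_name", "src_fields")


def find_shape_errors(config: Any) -> List[str]:
    """Return human-readable problems with the hybrid_queries structure.

    Hypothetical helper: assumes `config` is the already-parsed YAML content.
    """
    queries = config.get("hybrid_queries") if isinstance(config, dict) else None
    if not isinstance(queries, list):
        return ["hybrid_queries must be a list"]

    errors: List[str] = []
    for i, entry in enumerate(queries):
        where = f"hybrid_queries[{i}]"
        if not isinstance(entry, dict):
            errors.append(f"{where} must be a dictionary")
            continue
        if not isinstance(entry.get("index"), str):
            errors.append(f"{where}.index must be a string")
        mappings = entry.get("embedding_mappings")
        if not isinstance(mappings, list):
            errors.append(f"{where}.embedding_mappings must be a list")
            continue
        for j, mapping in enumerate(mappings):
            here = f"{where}.embedding_mappings[{j}]"
            if not isinstance(mapping, dict):
                errors.append(f"{here} must be a dictionary")
                continue
            for key in MAPPING_KEYS:
                if key not in mapping:
                    errors.append(f"{here} is missing '{key}'")
            # A missing src_fields is already reported above, so default to [].
            src = mapping.get("src_fields", [])
            if not (isinstance(src, list) and all(isinstance(s, str) for s in src)):
                errors.append(f"{here}.src_fields must be a list of strings")
    return errors
```

An empty list returned by find_shape_errors means the parsed content matches the two-level layout: a list of index entries, each with a list of embedding mappings.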

Step 2: Repeat for Multiple Indexes

If you have multiple indexes, repeat the configuration for each one.

hybrid_queries:
  - index: first_index_name
    embedding_mappings:
      - embedding_field: first_embedding_field
        model_name: first_model_name
        src_fields:
          - first_source_field_1
          - first_source_field_2
  - index: second_index_name
    embedding_mappings:
      - embedding_field: second_embedding_field
        model_name: second_model_name
        src_fields:
          - second_source_field_1

Step 3: Save Configuration

Save your YAML file after filling in the necessary information.

Parameter Definitions

This section provides a detailed explanation of each parameter in the YAML file.

| Parameter | Data Type | Required | Description |
| --- | --- | --- | --- |
| index | String | Optional | The name of the index that will include embeddings used by Hybrid Search. |
| embedding_mappings | Array of Dictionaries | Optional | A list of dictionaries, each with three keys: embedding_field (the name of the generated embedding), model_name (the embedding model used to generate the embedding), and src_fields (the list of source fields used to generate the embedding). |
| embedding_mappings[].embedding_field | String | Optional | The name of the field in your index where the generated embedding will be stored. Duplicate names within the same index are not allowed. |
| embedding_mappings[].model_name | String | Optional | The name of the embedding model used to generate the embedding. Valid values are bge_large_en_v1_5, splade_v3, cohere_embed_english_v3, and cohere_embed_multilingual_v3. |
| embedding_mappings[].src_fields | Array of Strings | Optional | A list of source fields from your data that will be used to generate the embedding. These fields typically contain text data that the embedding models will process. |
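Two of the rules in this section lend themselves to an automated check: model_name must be one of the four listed values, and embedding_field names must not repeat within the same index. The sketch below illustrates one way to test both. It is an illustration only, assuming the YAML has already been parsed into a Python dictionary; find_constraint_errors is a hypothetical name and not part of the product.

```python
from collections import Counter
from typing import Dict, List

# Valid model names as listed in the parameter definitions.
VALID_MODEL_NAMES = {
    "bge_large_en_v1_5",
    "splade_v3",
    "cohere_embed_english_v3",
    "cohere_embed_multilingual_v3",
}


def find_constraint_errors(config: Dict) -> List[str]:
    """Check model names and per-index uniqueness of embedding_field.

    Hypothetical helper: assumes `config` is the already-parsed YAML content.
    """
    errors: List[str] = []
    for entry in config.get("hybrid_queries", []):
        index = entry.get("index", "<unnamed index>")
        mappings = entry.get("embedding_mappings", [])

        # Rule 1: model_name, when present, must be one of the listed values.
        for mapping in mappings:
            model = mapping.get("model_name")
            if model is not None and model not in VALID_MODEL_NAMES:
                errors.append(f"{index}: unknown model_name {model!r}")

        # Rule 2: no embedding_field may appear twice within one index.
        counts = Counter(m.get("embedding_field") for m in mappings)
        for field, n in counts.items():
            if field is not None and n > 1:
                errors.append(f"{index}: embedding_field {field!r} is defined {n} times")
    return errors
```

The duplicate check is scoped to a single index entry, so the same embedding_field name used in two different indexes is not reported.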

Examples

Here are some examples to help you understand how to configure the YAML file.

Example 1: Configuring for a Book Index

hybrid_queries:
  - index: book_semantic_index
    embedding_mappings:
      - embedding_field: book_title_description_embed
        model_name: cohere_embed_english_v3
        src_fields:
          - book_title
          - book_description

This configuration generates the book_title_description_embed embedding in the book_semantic_index index. The embedding represents the combined semantics of book_title and book_description and is produced by the cohere_embed_english_v3 model.

Example 2: Configuring for a Movie Index

hybrid_queries:
  - index: movie_semantic_index
    embedding_mappings:
      - embedding_field: movie_title_embed
        model_name: cohere_embed_multilingual_v3
        src_fields:
          - movie_title
      - embedding_field: movie_description_embed
        model_name: cohere_embed_english_v3
        src_fields:
          - movie_description

This configuration generates two embeddings in the movie_semantic_index index:

  • movie_title_embed, which represents the semantics of movie_title, using the cohere_embed_multilingual_v3 model.
  • movie_description_embed, which represents the semantics of movie_description, using the cohere_embed_english_v3 model.

Example 3: Configuring for both a Book Index and a Movie Index

hybrid_queries:
  - index: book_semantic_index
    embedding_mappings:
      - embedding_field: book_title_embed
        model_name: bge_large_en_v1_5
        src_fields:
          - book_title
  - index: movie_semantic_index
    embedding_mappings:
      - embedding_field: movie_title_embed
        model_name: splade_v3
        src_fields:
          - movie_title

This configuration generates one embedding in each of two indexes:

  • book_title_embed, which represents the semantics of book_title, using the bge_large_en_v1_5 model in the book_semantic_index index.
  • movie_title_embed, which represents the semantics of movie_title, using the splade_v3 model in the movie_semantic_index index.
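To review a configuration like the ones above at a glance, it can help to flatten it into one row per embedding. The sketch below is illustrative only: the EmbeddingPlan type and list_embeddings function are hypothetical names, and example_3 is a hand-written Python dictionary mirroring what a YAML parser would be expected to return for Example 3.

```python
from typing import Dict, List, NamedTuple, Tuple


class EmbeddingPlan(NamedTuple):
    """One embedding to be generated (hypothetical type for illustration)."""
    index: str
    embedding_field: str
    model_name: str
    src_fields: Tuple[str, ...]


def list_embeddings(config: Dict) -> List[EmbeddingPlan]:
    """Flatten hybrid_queries into one EmbeddingPlan per embedding mapping."""
    plans: List[EmbeddingPlan] = []
    for entry in config.get("hybrid_queries", []):
        for mapping in entry.get("embedding_mappings", []):
            plans.append(
                EmbeddingPlan(
                    index=entry["index"],
                    embedding_field=mapping["embedding_field"],
                    model_name=mapping["model_name"],
                    src_fields=tuple(mapping["src_fields"]),
                )
            )
    return plans


# Example 3, written out as the dictionary a YAML parser would produce.
example_3 = {
    "hybrid_queries": [
        {
            "index": "book_semantic_index",
            "embedding_mappings": [
                {
                    "embedding_field": "book_title_embed",
                    "model_name": "bge_large_en_v1_5",
                    "src_fields": ["book_title"],
                }
            ],
        },
        {
            "index": "movie_semantic_index",
            "embedding_mappings": [
                {
                    "embedding_field": "movie_title_embed",
                    "model_name": "splade_v3",
                    "src_fields": ["movie_title"],
                }
            ],
        },
    ]
}

for plan in list_embeddings(example_3):
    sources = ", ".join(plan.src_fields)
    print(f"{plan.index}: {plan.embedding_field} <- {sources} ({plan.model_name})")
```

Each printed line corresponds to one bullet in the explanation of Example 3: the index, the embedding field, the source fields it is built from, and the model that produces it.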