Unable to perform hybrid search

I am having trouble doing hybrid search. To give you a bit of background, I am using semantic search with an inference endpoint I created using OpenAI embeddings:

const addIndex = async () => {
  // try {
  //   await client.indices.delete({ index });
  // } catch (err) {
  //   //
  // }
  await client.indices.create({
    index,
    mappings: {
      properties: {
        phrase: {
          type: "semantic_text",
          inference_id: "openai_embeddings",
        },
        copilot: {
          type: "keyword",
        },
        org: {
          type: "keyword",
        },
        topic: {
          type: "keyword",
        },
        mode: {
          type: "keyword",
        },
      },
    },
  });
  console.log("created successfully");
};
const response = await client.transport.request({
  method: "PUT",
  path: "/_inference/text_embedding/openai_embeddings",
  body: {
    service: "openai",
    service_settings: {
      api_key: "xxxxxxxx-xxxxxxxx",
      model_id: "text-embedding-3-small",
    },
  },
});

This is my search query:

const response = await client.search({
  index,
  min_score: 0.8,
  query: {
    bool: {
      must: {
        semantic: {
          field: "phrase",
          query: "can i talk to a real person",
        },
      },
      filter: [
        { term: { copilot: "67108e2e8c0611df3ad71109" } }, // Filter by copilotId
        { term: { mode: "DRAFT" } }, // Filter by mode
      ],
    },
  },
});

The query above works intermittently: sometimes it returns results and sometimes it doesn't.
When I set the page size to 1000, it started returning results. Why does the page size change the behavior? Is it applying the filter after the search (post-filtering)? Does that mean it won't work once I have 2000 phrases? Please help me here.

Hi @harish32, Welcome to the Elastic community.

The page size should not affect the response.

  1. Are you setting size to 1000?
  2. Are you getting any error when there is no response?
  3. I am also assuming async is handled correctly in your script.

No, it's returning an empty array.
Here is the query I ran:

  const response = await client.search({
    index,
    query: {
      bool: {
        must: [
          {
            semantic: {
              field: "phrase",
              query: "talk to a agent",
            },
          },
        ],
        filter: [
          { term: { copilot: "67108e2e8c0611df3ad71109" } }, // Filter by copilotId
          { term: { mode: "DRAFT" } }, // Filter by mode
          { term: { topic: "675a88feb73a5d37e21fc352" } },
        ],
      },
    },
  });
  console.log(response.hits);

I have attached a screenshot where I ran the same function 3 times. You can see it didn't return the complete results, only one, and
the 3rd time it didn't return anything.

I would:

  1. Check the Elasticsearch logs (especially for any errors around inference or search).
  2. Log the actual query that runs, to compare it to the successful ones.
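For the second point, one minimal way to record each request is to wrap the search call, as sketched here. The helper name searchWithLogging and the log format are hypothetical; `client` is assumed to be an Elasticsearch JavaScript client instance (any object with an async `search(params)` method works). This only captures the request as the client sends it: turning the semantic query text into an embedding happens inside Elasticsearch, so the server logs remain the place to look for inference errors.

```javascript
// Hypothetical helper: logs the exact request body before sending it and the
// number of hits that came back, so failing and successful runs can be compared.
async function searchWithLogging(client, params, log = console.log) {
  log("search request:", JSON.stringify(params));
  const response = await client.search(params);
  log("hits returned:", response.hits.hits.length);
  return response;
}

// Usage (client and index are assumed to exist in your script). Handling the
// returned promise makes errors surface instead of being silently dropped:
// searchWithLogging(client, { index, query }).catch((err) => console.error(err));
```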

If I had to guess, I'd say that turning the semantic query into its dense vector representation fails, so you're not running the same Elasticsearch query in the background. Maybe something like a rate limit is kicking in (but that's just a random guess). I'd be surprised if you ran the exact same Elasticsearch query and got different results (assuming the cluster is healthy).

Another helpful troubleshooting step could be to run the knn search directly on the semantic_text field as a nested query. See example.

I tried knn as well, but I'm still not getting results:

const response = await client.search({
  index,
  query: {
    bool: {
      must: [
        {
          nested: {
            path: "phrase.inference.chunks",
            query: {
              knn: {
                field: "phrase.inference.chunks.embeddings",
                query_vector_builder: {
                  text_embedding: {
                    model_id: "openai_embeddings",
                    model_text: "talk to an agent",
                  },
                },
              },
            },
          },
        },
      ],
      filter: [
        { term: { copilot: "67108e2e8c0611df3ad71109" } }, // Filter by copilotId
        { term: { mode: "DRAFT" } }, // Filter by mode
        { term: { topic: "675a88feb73a5d37e21fc352" } },
      ],
    },
  },
});

If I remove the filter it works fine, but I need both to work.

Thanks for the reply, I am new to Elasticsearch. I tried checking the logs but I can't find anything.

What I believe is happening is that it applies the filter first and then searches only the 10 results it filtered. That is why, when I increase the page size, it returns 100, which contain the semantically matching phrase, so it returns the matching 10.
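One way to test this hypothesis is a toy in-memory model that contrasts the two possible orders: picking the top k most similar documents and then filtering (post-filtering), versus filtering first and then picking the top k (pre-filtering). This is not how Elasticsearch is implemented internally; the scores and documents are made up. In this toy, only the post-filtering order loses matches when k is small and recovers them as k grows, which is one way the "works once the page size is 1000" symptom could arise.

```javascript
// Toy model only: made-up scores stand in for vector similarity, and plain
// array operations stand in for the search engine.

// Post-filtering: pick the k most similar docs, then drop those failing the filter.
function topKThenFilter(docs, k, predicate) {
  return [...docs]
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .filter(predicate);
}

// Pre-filtering: drop docs failing the filter, then pick the k most similar.
function filterThenTopK(docs, k, predicate) {
  return docs
    .filter(predicate) // filter() returns a new array, so sorting it is safe
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Hypothetical data: the closest phrases belong to another copilot.
const docs = [
  { id: "a", copilot: "other", score: 0.95 },
  { id: "b", copilot: "other", score: 0.93 },
  { id: "c", copilot: "other", score: 0.91 },
  { id: "d", copilot: "mine", score: 0.9 },
  { id: "e", copilot: "mine", score: 0.7 },
];
const mine = (d) => d.copilot === "mine";
const ids = (list) => list.map((d) => d.id);

console.log(ids(topKThenFilter(docs, 3, mine))); // []
console.log(ids(topKThenFilter(docs, 5, mine))); // [ 'd', 'e' ]
console.log(ids(filterThenTopK(docs, 3, mine))); // [ 'd', 'e' ]
```

With k = 3, post-filtering keeps only candidates from the other copilot and then discards all of them; raising k to 5 lets the matching phrases back in, while pre-filtering finds them at either k.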

I have followed this

Do you have any reference for a hybrid search like mine? That would be very helpful.

Can you try adding the filters to the knn filter section, to pre-filter the results returned by knn instead of post-filtering them?
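A sketch of that suggestion, assuming the knn query's `filter` option: the term filters from the earlier nested knn query move out of the surrounding bool `filter` and into the knn clause itself. The helper name buildPrefilteredKnnQuery is hypothetical; the field names, the inference endpoint ID, and the filter values are the ones used in this thread. The client call is left commented out because it needs a live cluster.

```javascript
// Builds the nested knn query with the filters inside the knn clause
// (pre-filtering) rather than in a surrounding bool filter (post-filtering).
function buildPrefilteredKnnQuery(text, filters) {
  return {
    nested: {
      path: "phrase.inference.chunks",
      query: {
        knn: {
          field: "phrase.inference.chunks.embeddings",
          query_vector_builder: {
            text_embedding: {
              model_id: "openai_embeddings", // inference endpoint from the thread
              model_text: text,
            },
          },
          // Applied while nearest neighbours are collected, not to the
          // knn results afterwards.
          filter: filters,
        },
      },
    },
  };
}

const filters = [
  { term: { copilot: "67108e2e8c0611df3ad71109" } }, // Filter by copilotId
  { term: { mode: "DRAFT" } }, // Filter by mode
  { term: { topic: "675a88feb73a5d37e21fc352" } },
];

const query = buildPrefilteredKnnQuery("talk to an agent", filters);
// With an Elasticsearch client and index name in scope:
// const response = await client.search({ index, query });
```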

Thank you so much @Kathleen_DeRusso :pray:. It is working now.
Thanks @xeraa for jumping in here
