Percolate query matches unrelated queries when synonyms are used

I have created a percolator index with a custom analyzer, and that analyzer applies a synonym filter.

Let's say the synonyms are:
A => A, X
B => B, X
C => C, X

My percolator index stores a separate query for each term:
q1 => "(A)"
q2 => "(B)"
q3 => "(C)"

But when I percolate a document against this index, it returns all of those queries.

For example,

GET sample_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "content": "A"
      }
    }
  }
}

Result:

"hits": {
    "total": 5,
    "max_score": 0.46029136,
    "hits": [
      {
        "_index": "sample_index",
        "_type": "doc",
        "_id": "X-A",
        "_score": 0.46029136,
        "_source": {
          "query": {
            "query_string": {
              "query": "A"
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      },
      {
        "_index": "sample_index",
        "_type": "doc",
        "_id": "X-B",
        "_score": 0.36165747,
        "_source": {
          "query": {
            "query_string": {
              "query": "B"
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      },
      {
        "_index": "sample_index",
        "_type": "doc",
        "_id": "X-C",
        "_score": 0.36165747,
        "_source": {
          "query": {
            "query_string": {
              "query": "C"
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      }
    ]
}

Hi @niketpatel2525, welcome to the Forum!

I took the liberty of reformatting your code examples a bit to make them more readable. In the future, you can enclose JSON snippets (and other code blocks) in three backticks (Markdown style, ```) so that they keep their indentation.

Regarding your question, I suspect you defined the analyzer for your "content" field using only the analyzer parameter on the field. In that case the synonym filter is applied both to documents at index time (so they all contain X) and to queries, including the stored percolator queries, which then also search for X alongside their own token. For example, the document "A" is indexed as A and X, while the query "B" is expanded to B or X, so it matches on the shared X. To change this you have basically two options:

  • only use search-time synonym expansion by using a dedicated search_analyzer that contains the synonym filter, while the regular analyzer doesn't
  • do the reverse and only expand synonyms at index time
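
As a rough sketch of the first option, the body of a `PUT sample_index` request for a new index could look something like the following. The names `content_synonyms`, `plain_analyzer` and `synonym_search_analyzer` are made up for illustration, and the tokenizer and lowercase filter are assumptions, since I haven't seen your actual analyzer settings. The `doc` mapping type mirrors the `_type` in your result; on versions without mapping types, the `properties` block would sit directly under `mappings`.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "content_synonyms": {
          "type": "synonym",
          "synonyms": ["A => A, X", "B => B, X", "C => C, X"]
        }
      },
      "analyzer": {
        "plain_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "synonym_search_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "content_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "query": { "type": "percolator" },
        "content": {
          "type": "text",
          "analyzer": "plain_analyzer",
          "search_analyzer": "synonym_search_analyzer"
        }
      }
    }
  }
}
```

With a setup along these lines, the percolated document "A" would be analyzed without synonyms, while the stored queries would still be expanded, so the query for B (B or X) should no longer match it. Since percolator queries are parsed when they are indexed, the stored queries would most likely need to be indexed again after changing the analyzers.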

I would recommend search-time synonyms since they are much more flexible, but the tradeoffs are discussed in our user guide. Hope this makes sense and gets you a step closer towards your goals.
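
If you want to check which tokens the "content" field produces at index time, the `_analyze` API can help. A minimal example, assuming the index and field names from your question, is to send the following body to `GET sample_index/_analyze`:

```json
{
  "field": "content",
  "text": "A"
}
```

If the response lists a token for X alongside the token for A, the synonyms are being expanded at index time for that field.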
