Getting a result that I don't want. How to check why

Peter_Steenbergen · March 28, 2018, 10:00am

Hi,

I am using Elasticsearch 6.1.3 and have the following situation.
Let me start by sharing the query I am using.

POST develop/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "omega 7",
            "fields": ["merk.naam^2", "categorie.naam^14", "naam^70", "did_you_mean"],
            "operator": "and"
          }
        }
      ]
    }
  },
  "track_scores": true,
  "sort": [
    {
      "type.keyword": {
        "order": "asc"
      }
    },
    {
      "populariteitscijfer": {
        "order": "desc"
      }
    }
  ]
}

With this query I get the results I want based on the scores and all (omega 3 is not appearing).
Seen in: http://drops.3ws.nl/qstszh

But when I change "omega 7" to "omega 3" the omega 7 category is also showing.
Seen in: http://drops.3ws.nl/NWHLl5

I cannot figure out why this is happening, since the reverse query (omega 7) does not return the omega 3 category.
When I run the analyzer like this:

GET develop/_analyze
{
  "analyzer": "didYouMean",
  "text": ["omega 3"]
}

This is the result for it:

{
  "tokens": [
    {
      "token": "omega",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "visolie",
      "start_offset": 0,
      "end_offset": 7,
      "type": "SYNONYM",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "3",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<NUM>",
      "position": 1
    }
  ]
}

I am scratching my head over this. Hope someone can help out.

Greetings,
Peter

Edit:

These are the contents of the did_you_mean field for the hits the query returns:

{
  "_index": "development-2018032812",
  "_type": "doc",
  "_id": "categorie-485",
  "_score": 4.3321357,
  "fields": {
    "did_you_mean": [
      "Omega-7"
    ]
  },
  "sort": [
    "categorie",
    -9223372036854776000
  ]
},
{
  "_index": "development-2018032812",
  "_type": "doc",
  "_id": "categorie-1",
  "_score": 632.3207,
  "fields": {
    "did_you_mean": [
      "Omega-3"
    ]
  },
  "sort": [
    "categorie",
    -9223372036854776000
  ]
}

abdon · March 29, 2018, 8:00am

You can add "explain" : true to a query, to see how the score for each hit was calculated. That should help you figure out why a certain document is a hit.
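
As a rough sketch of where that flag goes, the Python snippet below builds the search body from the first post (sort and track_scores left out for brevity) with "explain" set at the top level. It only prints the JSON; sending it to develop/_search is not shown, and the variable names are made up for illustration.

```python
import json

# Search body from the first post, reduced to the query part.
search_body = {
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "omega 3",
                        "fields": [
                            "merk.naam^2",
                            "categorie.naam^14",
                            "naam^70",
                            "did_you_mean",
                        ],
                        "operator": "and",
                    }
                }
            ]
        }
    },
    # Top-level flag: each hit then carries an "_explanation" tree.
    "explain": True,
}

# Serialize the body so it can be pasted into a POST develop/_search request.
request_json = json.dumps(search_body, indent=2)
print(request_json)
```

In the response, the "_explanation" of the unexpected hit should show which field and which term produced the score.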

If that doesn't help, could you post the full documents with the IDs categorie-485 and categorie-1, as well as your index settings and mappings (the output of GET develop), and I'll gladly take a look.

Peter_Steenbergen · March 29, 2018, 8:54am

Hello Abdon,

Thank you for helping me out.
It was too many characters, so I created a gist for it:

https://gist.github.com/petericebear/0dd705dd3e9d0e6124b9150972dd75d3

gistfile1.txt

This is the output of GET develop

    {
      "develop-2018032910": {
        "aliases": {
          "develop": {}
        },
        "mappings": {
          "doc": {
            "properties": {

This file has been truncated.

Thanks in advance, greetings

abdon · March 29, 2018, 11:30am

Thanks for posting the additional information.

You're getting back document categorie-485 as a hit for the query omega 3 because of the synonyms you have set up, specifically these two synonym definitions: "visolie, omega 3" and "visolie, omega-3". These synonyms are applied at index time as well as at query time.

At index time, the did_you_mean field of document categorie-485 receives, for example, the value of naam (through copy_to). The naam field contains the value Omega-7, which gets tokenized into:

GET develop/_analyze
{
  "analyzer": "didYouMean",
  "text": ["Omega-7"]
}

{
  "tokens": [
    {
      "token": "omega",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "visolie",
      "start_offset": 0,
      "end_offset": 7,
      "type": "SYNONYM",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "7",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<NUM>",
      "position": 1
    }
  ]
}

All of these terms will end up in the inverted index, including visolie (as a synonym of omega).

At query time, the omega 3 query on the did_you_mean field will also query for the same visolie synonym, as you can see from the output of _analyze that you posted. That's why this document is a match: the visolie term produced for the query matches the visolie term that was indexed for the document as a synonym.
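
To make that concrete, here is a toy inverted index in Python, not Elasticsearch itself. The token lists are copied from the _analyze outputs in this thread; it is an assumption that the stored value "Omega-3" analyzes to the same tokens as the query text "omega 3".

```python
from collections import defaultdict

# Index-time tokens per document, taken from the _analyze outputs above.
# Assumption: "Omega-3" yields the same tokens as the text "omega 3".
indexed_tokens = {
    "categorie-485": ["omega", "visolie", "7"],  # did_you_mean: "Omega-7"
    "categorie-1": ["omega", "visolie", "3"],    # did_you_mean: "Omega-3"
}

# Build the inverted index: term -> set of documents containing it.
postings = defaultdict(set)
for doc_id, tokens in indexed_tokens.items():
    for token in tokens:
        postings[token].add(doc_id)

# Tokens the didYouMean analyzer produced for the query "omega 3".
query_tokens = ["omega", "visolie", "3"]
for token in query_tokens:
    print(token, "->", sorted(postings[token]))
```

The posting list for visolie contains both documents, so any query that emits visolie can reach categorie-485 even though that document never contains the term 3.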

Now, you should ask yourself if you really want to apply synonyms both at index time and at query time. Generally, that is not the case. It's double the work and it can lead to unexpected search results as you have experienced.

I'd go with query-time synonyms only. This is something you can achieve by setting up a search_analyzer that uses synonyms, and an (index-time) analyzer that does not use synonyms. For the did_you_mean field the mapping would become:

        "did_you_mean": {
          "type": "text",
          "analyzer": "standard", 
          "search_analyzer": "didYouMean"
        }

With that mapping, document categorie-485 is no longer a hit for the query omega 3. But synonyms still work. A query for visolie will still return document categorie-1.
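
The effect of that change can be sketched with a small Python model. Everything here is a simplification: plain_tokens is only a rough stand-in for the standard analyzer, with_synonym mimics what the posted _analyze output shows for didYouMean (visolie added next to omega) rather than the real synonym rules, and the query is evaluated in the (omega AND 3) OR visolie form described elsewhere in this thread.

```python
import re


def plain_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def with_synonym(text):
    """Stand-in for didYouMean as seen in the posted output: adds 'visolie' when 'omega' occurs."""
    tokens = plain_tokens(text)
    return tokens + (["visolie"] if "omega" in tokens else [])


def matches_omega_3(terms):
    # Query "omega 3" after search-time synonyms: (omega AND 3) OR visolie
    return ("omega" in terms and "3" in terms) or "visolie" in terms


docs = {"categorie-485": "Omega-7", "categorie-1": "Omega-3"}

results = {}
for label, analyze in [
    ("synonyms at index time", with_synonym),
    ("standard at index time", plain_tokens),
]:
    results[label] = [
        doc_id for doc_id, text in docs.items() if matches_omega_3(set(analyze(text)))
    ]
    print(label, "->", results[label])
```

With synonyms applied at index time both documents match; with the plain index-time analysis only categorie-1 remains, because categorie-485 no longer carries the visolie term.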

Synonyms are tricky to set up properly. If you want to read more I can really recommend the excellent book "Relevant Search" by @softwaredoug that covers synonyms in great depth.

Peter_Steenbergen · March 29, 2018, 11:45am

Thank you for the helpful reply. I thought the AND operator would also apply to this query, so that a document containing 7 instead of 3 would not match. Is there a reason it is not applied for this kind of query?

Just bought the book and will read up on it. The solution you gave helped me out.

abdon · March 29, 2018, 11:48am

Yes, there is an AND for omega 3, but it would not make sense to search for all synonyms with an AND. omega AND 3 AND visolie would probably not return any hits. Instead Elasticsearch will search for one of the synonyms. You can think of the query as: (omega AND 3) OR visolie.
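
A small Python sketch of the difference between the two readings, using sets of terms in place of real documents. The three example documents and the function names are hypothetical, and the terms assume the documents were indexed without synonyms.

```python
def all_terms_required(terms):
    # Naive reading of operator "and": omega AND 3 AND visolie
    return {"omega", "3", "visolie"} <= terms


def synonym_aware(terms):
    # Shape described in the reply above: (omega AND 3) OR visolie
    return {"omega", "3"} <= terms or "visolie" in terms


# Hypothetical documents, shown as the terms they would hold without index-time synonyms.
doc_terms = {
    "Omega-3": {"omega", "3"},
    "Omega-7": {"omega", "7"},
    "Visolie": {"visolie"},
}

for name, terms in doc_terms.items():
    print(name, all_terms_required(terms), synonym_aware(terms))
```

The naive form matches none of the three, while the synonym-aware form matches Omega-3 and Visolie but not Omega-7.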

Peter_Steenbergen · March 29, 2018, 12:01pm

Ah, and in this case visolie was also indexed for Omega 7.
Thank you for the clarification.

system · April 26, 2018, 12:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
