Highlight stopwords in hits

fedoroko · August 9, 2019, 2:31pm

Hey, what is the best way to skip stopwords in the query but still highlight them in the hits?

Current settings:

'settings': {
    'analysis': {
        'analyzer':{
            'my_analyzer':{
                'tokenizer':'standard',
                # token filters are applied in this order
                'filter':[
                    'lowercase',
                    'my_snowball_en',
                    'english_stop',
                ]
            }
        },
        'filter':{
            'my_snowball_en':{
                'type':'snowball',
                'language':'english',
            },
            'english_stop':{
                'type':'stop',
                'stopwords':'_english_'
            }
        }
    }
},
'mappings':{
    'properties':{
        'title':{
            'type':'text',
            'fields':{
                'en':{
                    'type':'text',
                    'analyzer':'my_analyzer'
                }
            }
        }
    }
}

Search:

'query':{
    'bool':{
        'must':[
            {
                'multi_match':{
                    'type':'bool_prefix',
                    'query':'some_query',
                    'fuzziness':'1',
                    'prefix_length':2,
                    'fields':[
                        'title',
                        'others'
                    ],
                    'minimum_should_match':'80%',
                }
            },
            {
                'match':{
                    'status':'3',
                }
            },
        ]
    }
},
'highlight':{
    # 'pre,post,num_of_frag,etc...'
    'fields':{
        'title':{},
        'others':{}
    }
}

With the query 'the cheese for the red fox', it highlights only cheese, red and fox.
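To see why only those three words get marked, here is a rough Python approximation of what my_analyzer does to the query text, assuming the query runs against a field analyzed with it (such as title.en). It is a simplification: a regex stands in for the standard tokenizer and the snowball stemmer is skipped. The point it illustrates is that the stop filter removes stopwords before any term reaches the query, so the highlighter never has a stopword term to mark.

```python
import re

# Lucene's default English stopword list, which the stop filter's
# '_english_' setting refers to.
ENGLISH_STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def toy_analyze(text: str) -> list[str]:
    """Rough stand-in for standard tokenizer + lowercase + stop filter.

    Simplifications (assumptions): tokens are runs of word characters,
    and stemming is skipped, so terms stay unstemmed.
    """
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in ENGLISH_STOPWORDS]


if __name__ == "__main__":
    # Only non-stopword terms survive analysis.
    print(toy_analyze("the cheese for the red fox "))
```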

I tried to build a highlight_query, but it returns nothing:

'highlight':{
    'highlight_query':{
        'bool':{
            'should':{
                'match_phrase':{
                    'title':'similar to search query'
                }
            }
        }
    }
}

What can I do in this case?

spinscale · August 12, 2019, 8:42am

Hey,

Have you tried using a different field for the highlight query? Here is an example (though I might be missing something):

DELETE test

PUT test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "stop": {
            "type": "text",
            "analyzer": "stop"
          }
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "text" : "this is an awesome test and some more"
}

GET test/_analyze
{
  "text" : "this is an awesome test and some more"
}

GET test/_analyze
{
  "text" : "this is an awesome test and some more",
  "field": "text.stop"
}

GET test/_search
{
  "query": {
    "match": {
      "text.stop": {
        "query": "this is an awesome test",
        "operator": "and"
      }
    }
  },
  "highlight": {
    "fields": {
      "text": {
        "highlight_query": {
          "match": {
            "text": "an awesome test"
          }
        }
      }
    }
  }
}

--Alex
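The same pattern, written as a Python request body for the field names in the original question, might look like the sketch below. `build_search_body` is a hypothetical helper, and it assumes `title` keeps stopwords (standard analyzer) while `title.en` drops them via my_analyzer, as in the mapping from the first post.

```python
import json


def build_search_body(user_query: str) -> dict:
    """Hypothetical helper: match on the stopword-free subfield,
    highlight the stopword-preserving parent field."""
    return {
        "query": {
            "match": {
                # title.en is analyzed with my_analyzer, so stopwords
                # do not take part in matching.
                "title.en": {"query": user_query, "operator": "and"}
            }
        },
        "highlight": {
            "fields": {
                "title": {
                    # Separate highlight query against 'title' (standard
                    # analyzer, stopwords kept), so "the", "for", ... can
                    # be marked in the hits.
                    "highlight_query": {
                        "match": {"title": {"query": user_query}}
                    }
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("the cheese for the red fox"), indent=2))
```

One thing to check with this approach: a plain match highlight query will also mark stopwords such as "the" wherever they occur in the title, not only next to the matched words.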

fedoroko · August 13, 2019, 9:00am

Yes, I tried it, but it doesn't work for me. And in any case, I guess highlighting with the original query string is a bad option, because the string can contain typos: thanks to fuzziness we will still get hits, but there will be no highlights. Am I right?

Is there any other solution?

Thank you for the answer anyway.
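The typo concern can be illustrated with edit distance. The sketch below uses plain Levenshtein distance on raw words (an assumption: Elasticsearch compares analyzed terms and by default also counts a transposition as one edit). A typo one edit away from an indexed word is accepted by a fuzzy query allowing one edit, but an exact, non-fuzzy highlight query looks for the typo itself and finds nothing.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    typo, indexed = "chesse", "cheese"
    d = levenshtein(typo, indexed)
    # A fuzzy query allowing 1 edit accepts the typo; an exact match does not.
    print(d, d <= 1, d == 0)
```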

fedoroko · August 14, 2019, 10:48am

I figured out how to use highlight_query and it works, but because of fuzziness 'AUTO', words with two or fewer letters are not highlighted. I can't remove fuzziness because the query may contain typos. What options do I have?
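For reference, the documented default for fuzziness 'AUTO' (equivalent to AUTO:3,6) allows 0 edits for terms of length 0 to 2, 1 edit for length 3 to 5, and 2 edits for longer terms. The small sketch below reproduces that rule so you can check how much fuzziness each term of a query gets; it only models the length thresholds, not the rest of the fuzzy query machinery.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed by fuzziness AUTO:low,high (defaults match AUTO)."""
    n = len(term)
    if n < low:
        return 0  # must match exactly
    if n < high:
        return 1
    return 2


if __name__ == "__main__":
    print({w: auto_max_edits(w) for w in "effect of a fox".split()})
    # {'effect': 2, 'of': 0, 'a': 0, 'fox': 1}
```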

Current query:

'query':{
    'bool':{
        'must':[
            {
                'multi_match':{
                    'type':'best_fields',
                    'query':'some_query',
                    'fuzziness':'AUTO',
                    'fields':[
                        'title.en',
                        'others'
                    ],
                    'minimum_should_match':'80%',
                }
            },
            {
                'match':{
                    'status':'3',
                }
            },
        ]
    }
},
'highlight':{
    # 'pre,post,num_of_frag,etc...'
    'fields':{
        'title': {
            'highlight_query':{
                'match':{
                    'title':{
                        'query': 'similar to original query',
                        'fuzziness': 'AUTO',
                        # standard analyzer, i.e. without stopword removal
                        'analyzer': 'standard'
                    }
                }
            }
        },
        'others': {}
    }
}

With the query "effect of a fox", it highlights "effect of a fox".

I also tried adding a shingle filter (min 2, max 2) to my_analyzer from the original post. In this case I can highlight using just the 'title.en' field (without highlight_query), but the result is unpredictable.

For example, with the query "in connection with the development" it highlights "The article deals with the problems arising in connection with the development of curricula for basic educational programs based...".

For this text, _analyze returns the following tokens: [_ connect][connect][connect _][_ develop][develop][develop _], so I would expect the highlight to be "_(in) connection _(with) _(the) development _(of)", but "in" is cut out of the returned highlight. Can someone explain why it works like this?
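A toy model of the token stream may help reason about this. The stop filter removes "in", "with", "the" and "of" but leaves gaps in the token positions, and the shingle filter fills those gaps with its filler token "_". The sketch below hand-codes the positions for "in connection with the development of" (stems taken from the _analyze output above) and assumes, from that same output, that shingles made only of filler are dropped. It reproduces the listed tokens. One plausible reading is that the filler is synthetic and corresponds to no characters of the source text, so a shingle such as "_ connect" can only point at "connection", never at "in".

```python
FILLER = "_"  # the shingle filter's default filler token


def toy_shingles(positioned: list[tuple[int, str]], end_position: int) -> list[str]:
    """Unigrams plus bigram shingles over position-tagged tokens.

    positioned: (position, term) pairs left after stop filtering; removed
    stopwords leave holes. end_position: one past the last original
    position, so trailing holes are filled too (assumption).
    """
    slots = [FILLER] * end_position
    for pos, term in positioned:
        slots[pos] = term
    out: list[str] = []
    for i, term in enumerate(slots):
        if term != FILLER:
            out.append(term)  # unigram
        if i + 1 < len(slots):
            pair = (term, slots[i + 1])
            if pair != (FILLER, FILLER):  # skip filler-only shingles
                out.append(" ".join(pair))
    return out


if __name__ == "__main__":
    # in(0) connection(1) with(2) the(3) development(4) of(5)
    print(toy_shingles([(1, "connect"), (4, "develop")], 6))
    # ['_ connect', 'connect', 'connect _', '_ develop', 'develop', 'develop _']
```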

system · September 11, 2019, 10:48am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Highlighter - don't highlight stop words	Elasticsearch	2	1855	July 5, 2017
Highlight analyzed field with stop words	Elasticsearch	1	472	July 6, 2017
Does english analyzer prevent fields from highlighting?	Elasticsearch	2	483	July 6, 2017
Stop words and query_string	Elasticsearch	4	702	July 6, 2017
Highlighting in a a search query	Elastic Search	6	344	July 8, 2024