Highlight stopwords in hits

Hey, what is the best way to ignore stopwords in the query but still highlight them in the hits?

Current settings:

{
    'settings': {
        'analysis': {
            'analyzer': {
                'my_analyzer': {
                    'type': 'custom',
                    'tokenizer': 'standard',
                    # filters run in order: remove stopwords before stemming them
                    'filter': [
                        'lowercase',
                        'english_stop',
                        'my_snowball_en',
                    ],
                },
            },
            'filter': {
                'my_snowball_en': {
                    'type': 'snowball',
                    'language': 'english',
                },
                'english_stop': {
                    'type': 'stop',
                    # predefined stopword lists are named with underscores
                    'stopwords': '_english_',
                },
            },
        },
    },
    'mappings': {
        'properties': {
            'title': {
                'type': 'text',
                'fields': {
                    'en': {
                        'type': 'text',
                        'analyzer': 'my_analyzer',
                    },
                },
            },
        },
    },
}

Search request:

{
    'query': {
        'bool': {
            # several clauses under one occurrence type must be given as a list
            'must': [
                {
                    'multi_match': {
                        'type': 'bool_prefix',
                        'query': 'some_query',
                        'fuzziness': '1',
                        'prefix_length': 2,
                        'fields': ['title', 'others'],
                        'minimum_should_match': '80%',
                    },
                },
                {
                    'match': {
                        'status': '3',
                    },
                },
            ],
        },
    },
    'highlight': {
        # pre_tags, post_tags, number_of_fragments, etc.
        'fields': {
            'title': {},
            'others': {},
        },
    },
}

With the query 'the cheese for the red fox', only "cheese", "red" and "fox" are highlighted.
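To see why the stopwords never get highlighted, it helps to model what an analyzer like my_analyzer does to the text. This is a minimal Python sketch, not Elasticsearch itself: the regex is a crude stand-in for the standard tokenizer, STOPWORDS is a small hypothetical subset of the English list, toy_stem is a hypothetical stand-in for the Snowball stemmer, and it assumes the stop filter runs before the stemmer. The stopwords are simply not in the indexed terms (only a position gap remains), so a highlighter working from these terms has nothing to mark for "the" or "for".

```python
import re

# Hypothetical subset of the English stopword list; the real `_english_` list is longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for the Snowball stemmer: strip a few suffixes."""
    for suffix in ("ion", "ment", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[tuple[int, str]]:
    """Return (position, term) pairs: tokenize -> lowercase -> stop -> stem."""
    tokens = re.findall(r"\w+", text)  # crude stand-in for the standard tokenizer
    terms = []
    for position, token in enumerate(tokens):
        token = token.lower()  # lowercase filter
        if token in STOPWORDS:  # stop filter: drop the token, its position stays empty
            continue
        terms.append((position, toy_stem(token)))  # stemmer
    return terms


print(analyze("the cheese for the red fox"))
# [(1, 'cheese'), (4, 'red'), (5, 'fox')]
```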

I tried to build a highlight_query, but it returns no highlights:

'highlight':{
    'highlight_query':{
        'bool':{
            'should':{
                'match_phrase':{
                    'title':'similar to search query'
                }
            }
        }
    }
}

What can I do in this case?

Hey,

Have you tried using a different field for the highlight query? Here is an example (though I might be missing something):

DELETE test

PUT test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "stop": {
            "type": "text",
            "analyzer": "stop"
          }
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "text" : "this is an awesome test and some more"
}

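# no field given: the default (standard) analyzer keeps the stopwords
GET test/_analyze
{
  "text" : "this is an awesome test and some more"
}

# text.stop uses the built-in stop analyzer, which removes them
GET test/_analyze
{
  "text" : "this is an awesome test and some more",
  "field": "text.stop"
}

# query the stopword-free subfield, highlight the original field
GET test/_search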
{
  "query": {
    "match": {
      "text.stop": {
        "query": "this is an awesome test",
        "operator": "and"
      }
    }
  },
  "highlight": {
    "fields": {
      "text": {
        "highlight_query": {
          "match": {
            "text": "an awesome test"
          }
        }
      }
    }
  }
}

--Alex

Yes, I tried it, and it doesn't work for me. Anyway, I think highlighting with the original query string is a bad option, because the query can contain typos: we would get hits thanks to fuzziness, but zero highlights. Am I right?
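The concern can be illustrated with edit distance: a term from a mistyped query only matches the document term if fuzzy matching is allowed. This is a minimal Python sketch using plain Levenshtein distance (Elasticsearch by default also counts a transposition as one edit, which this sketch does not); matching_terms is a hypothetical helper, not an Elasticsearch API. With zero allowed edits the typo "chese" finds nothing, with one edit it finds "cheese", which is why a highlight_query needs the same fuzziness as the main query.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def matching_terms(query_terms, doc_terms, max_edits):
    """Document terms matched by some query term within max_edits edits."""
    return [d for d in doc_terms
            if any(levenshtein(q, d) <= max_edits for q in query_terms)]


doc = ["cheese", "red", "fox"]
typo_query = ["chese", "fox"]
print(matching_terms(typo_query, doc, max_edits=0))  # ['fox']
print(matching_terms(typo_query, doc, max_edits=1))  # ['cheese', 'fox']
```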

Is there any other solution?

Thank you for the answer anyway.

I figured out how to use highlight_query, and it works, but because of fuzziness 'AUTO', words with two or fewer letters are not highlighted. I can't remove fuzziness because the query may contain typos. What options do I have?
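For reference, the documented behavior of fuzziness AUTO (default AUTO:3,6) depends on term length: terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two. So short words are not excluded from matching; they just cannot absorb a typo. The Python function below is a hypothetical helper that encodes that rule, a sketch rather than Elasticsearch code.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed for `term` under fuzziness AUTO:low,high (defaults 3 and 6)."""
    if len(term) < low:
        return 0  # 0..2 characters: must match exactly
    if len(term) < high:
        return 1  # 3..5 characters: one edit allowed
    return 2      # 6+ characters: two edits allowed


for word in ["of", "a", "fox", "chese", "effect"]:
    print(word, auto_max_edits(word))
```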

Current query:

{
    'query': {
        'bool': {
            'must': [
                {
                    'multi_match': {
                        'type': 'best_fields',
                        'query': 'some_query',
                        'fuzziness': 'AUTO',
                        'fields': ['title.en', 'others'],
                        'minimum_should_match': '80%',
                    },
                },
                {
                    'match': {
                        'status': '3',
                    },
                },
            ],
        },
    },
    'highlight': {
        # pre_tags, post_tags, number_of_fragments, etc.
        'fields': {
            'title': {
                'highlight_query': {
                    'match': {
                        'title': {
                            'query': 'similar to original query',
                            'fuzziness': 'AUTO',
                            # standard analyzer has no stop filter, so stopwords are kept
                            'analyzer': 'standard',
                        },
                    },
                },
            },
            'others': {},
        },
    },
}

With the query "effect of a fox", it highlights "effect of a fox".
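Since the same user string has to go into both the main query and the highlight_query, it may help to build the request body in one place. A minimal Python sketch: build_search_body is a hypothetical helper, the field names (title, title.en, others, status) come from the thread, and the resulting dict is meant to be sent as the _search request body with whatever client is in use.

```python
import json


def build_search_body(user_query: str) -> dict:
    """Search the stopword-free subfield, highlight the full-text field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {
                        "type": "best_fields",
                        "query": user_query,
                        "fuzziness": "AUTO",
                        "fields": ["title.en", "others"],
                        "minimum_should_match": "80%",
                    }},
                    {"match": {"status": "3"}},
                ]
            }
        },
        "highlight": {
            "fields": {
                "title": {
                    # Same user string, analyzed with the standard analyzer,
                    # so stopwords stay in the highlight query.
                    "highlight_query": {
                        "match": {
                            "title": {
                                "query": user_query,
                                "fuzziness": "AUTO",
                                "analyzer": "standard",
                            }
                        }
                    }
                },
                "others": {},
            }
        },
    }


print(json.dumps(build_search_body("effect of a fox"), indent=2))
```

Keeping both queries fed from one argument avoids the two drifting apart when the query text or fuzziness setting changes.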

I also tried adding a shingle filter (min 2, max 2) to my_analyzer from the original post. In this case I can highlight the 'title.en' field directly (without highlight_query), but the result is unpredictable.
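A shingle setup like the one described might look roughly like the fragment below, written in the same Python-dict style as the thread. The filter name my_shingle is hypothetical; min_shingle_size, max_shingle_size, output_unigrams and filler_token are parameters of the shingle token filter, shown with what I understand to be their defaults (except the sizes). It assumes the filter order lowercase → stop → stemmer and the english_stop and my_snowball_en definitions from the original post, which would need to be merged in.

```python
shingle_settings = {
    "analysis": {
        "filter": {
            "my_shingle": {  # hypothetical filter name
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 2,
                "output_unigrams": True,  # also keep single terms
                "filler_token": "_",      # placeholder for positions left by removed stopwords
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                # shingles are built last, after stopword removal and stemming
                "filter": ["lowercase", "english_stop", "my_snowball_en", "my_shingle"],
            }
        },
    }
}
```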

For example, with the query "in connection with the development", the highlighted fragment is "The article deals with the problems arising in connection with the development of curricula for basic educational programs based...".

For this text, _analyze shows the following tokens: [_ connect][connect][connect _][_ develop][develop][develop _]. So I expected the highlight to cover "_(in) connection _(with) _(the) development _(of)", but "in" is cut out of the highlight. Can someone explain why it works like this?
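One way to reason about this is a simplified model of bigram shingles over a token stream with holes. The Python sketch below is not Lucene's ShingleFilter: STOPWORDS and STEMS are hypothetical toy data, and the key assumption, that a filler token contributes no characters of its own to a shingle's offsets, is my reading of the behavior rather than something confirmed in this thread. Under that assumption the model reproduces the token list from _analyze above, and every shingle's character span covers only "connection" or "development", so "in" never falls inside a highlighted span.

```python
import re

STOPWORDS = {"in", "with", "the", "of"}                      # hypothetical subset
STEMS = {"connection": "connect", "development": "develop"}  # hypothetical toy stemmer


def bigram_shingles(text: str, filler: str = "_"):
    """Return (token, start, end) triples under a simplified shingle model.

    Removed stopwords leave empty positions, shown as `filler`. The filler is
    assumed to carry no characters, so offsets cover only the real terms.
    """
    slots = []  # one slot per original word: (term or None, start, end)
    for m in re.finditer(r"\w+", text):
        word = m.group().lower()
        term = None if word in STOPWORDS else STEMS.get(word, word)
        slots.append((term, m.start(), m.end()))

    out = []
    for i, (term, start, end) in enumerate(slots):
        if term is not None:
            out.append((term, start, end))  # unigram
        if i + 1 < len(slots):
            nxt = slots[i + 1]
            real = [s for s in (slots[i], nxt) if s[0] is not None]
            if real:  # skip shingles made only of fillers
                label = f"{term or filler} {nxt[0] or filler}"
                out.append((label, real[0][1], real[-1][2]))
    return out


text = "in connection with the development of"
for label, start, end in bigram_shingles(text):
    print(f"[{label}] -> {text[start:end]!r}")
```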
