Count of phrase matches per document

Hi,

I'm using this query to search a field for occurrences of phrases.

"query": {
    "match_phrase": {
        "content": "my test phrase"
    }
}

I need to calculate how many times each phrase matched in each document (if this is even possible).

I've considered aggregations, but I don't think they meet the requirements, since they give me the number of matches over the whole index rather than per document.

Thanks.

I don't think there's a straightforward way to get those phrase counts from Elasticsearch.

If you only need this for the occasional query, you could add "explain": true to your search request (at the same level as "query"). The explanation returned for each hit will contain a phraseFreq value, which is the phrase frequency, i.e. the number of times the phrase occurs in the content field of that document.
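For reference, here is a minimal Python sketch of that approach. It works on plain dicts so it doesn't depend on a particular client library: it builds the search body with "explain": true, then walks each hit's `_explanation` tree (nodes with `value`, `description` and `details`) and collects the values of nodes whose description starts with `phraseFreq=`. The function names are hypothetical, and the sample response is hand-written and heavily abridged; the exact wording and nesting of explanations vary between Elasticsearch versions, so check it against your own output.

```python
def build_explain_request(field, phrase):
    """Search body for a match_phrase query with explanations enabled."""
    return {
        "explain": True,  # same level as "query"
        "query": {"match_phrase": {field: phrase}},
    }


def phrase_freqs(explanation):
    """Collect every phraseFreq value in one hit's _explanation tree.

    Assumption: the phrase-frequency node's description starts with
    "phraseFreq=" (verify this against your Elasticsearch version's output).
    """
    found = []
    if explanation.get("description", "").startswith("phraseFreq="):
        found.append(explanation["value"])
    for child in explanation.get("details", []):
        found.extend(phrase_freqs(child))
    return found


def phrase_counts_per_doc(response):
    """Map each hit's _id to the phraseFreq values found in its explanation."""
    return {
        hit["_id"]: phrase_freqs(hit["_explanation"])
        for hit in response["hits"]["hits"]
    }


# Hand-written, heavily abridged response; real explanations have more nodes.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_explanation": {
                    "value": 1.5,
                    "description": 'weight(content:"my test phrase" in 0), result of:',
                    "details": [
                        {"value": 2.0, "description": "phraseFreq=2.0", "details": []},
                    ],
                },
            },
            {
                "_id": "2",
                "_explanation": {
                    "value": 0.9,
                    "description": 'weight(content:"my test phrase" in 3), result of:',
                    "details": [
                        {"value": 1.0, "description": "phraseFreq=1.0", "details": []},
                    ],
                },
            },
        ]
    }
}

print(phrase_counts_per_doc(sample_response))  # {'1': [2.0], '2': [1.0]}
```

Send the body from `build_explain_request` with whichever client you already use. If your query combines several match_phrase clauses, a hit's list may contain one value per matching clause, and you would need the surrounding descriptions to tell them apart; this sketch does not attempt that.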

Keep in mind, though, that adding "explain" to your search request will slow down your queries significantly, so it's definitely not something you want to do for every search request. For the occasional request, however, it's perfectly fine.

If you need the phrase frequency for many requests, you may want to look into custom scoring: you could calculate a score that is based purely on the phrase frequency. This is, however, quite advanced. See the scripting engine documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-scripting-engine.html
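The custom scoring route means writing a plugin in Java against Lucene internals, which is beyond a forum post, but the counting at its core is simple. Below is a Python sketch of only that counting logic for an exact phrase (slop 0), assuming you already have each term's token positions in the document. `positions_from_tokens` and `exact_phrase_freq` are hypothetical names, the whitespace tokenization is a stand-in for a real analyzer, and none of this is the plugin API.

```python
from collections import defaultdict


def positions_from_tokens(tokens):
    """Map each token to the positions where it occurs (hypothetical helper)."""
    positions = defaultdict(list)
    for pos, token in enumerate(tokens):
        positions[token].append(pos)
    return dict(positions)


def exact_phrase_freq(positions_by_term, phrase_terms):
    """Count start positions where phrase_terms occur consecutively (slop 0).

    Overlapping occurrences are each counted.
    """
    if not phrase_terms:
        return 0
    # Sets make each "is term i at position start + i?" check O(1).
    position_sets = [set(positions_by_term.get(term, ())) for term in phrase_terms]
    count = 0
    for start in position_sets[0]:
        if all(start + offset in positions
               for offset, positions in enumerate(position_sets)):
            count += 1
    return count


# Naive whitespace tokenization; a real analyzer would also lowercase, etc.
tokens = "my test phrase is a test of my test phrase".split()
print(exact_phrase_freq(positions_from_tokens(tokens), ["my", "test", "phrase"]))  # 2
```

Counting every start position means overlapping matches, such as "a a" in "a a a", count twice. Lucene's own phrase matching may treat such edge cases and sloppy phrases differently, so treat this as an illustration of the idea rather than a reimplementation.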

