What are the most popular contextual terms (after/before) of an expression?

softwaredoug · September 21, 2015, 2:05am

Well, one simple way, depending on the size of your data, is to index bigrams using a custom analyzer.

So for the input to analysis, you'd have

the great house at

and instead of breaking it up into words, you modify analysis to break it up into bigrams (two-word tokens) using the shingle token filter, like

[the great] [great house] [house at]
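As a rough sketch, assuming a recent (7.x or later) Elasticsearch with typeless mappings, the analyzer could be set up as below. The filter name, analyzer name and the text.bigrams sub-field are hypothetical; the built-in shingle filter and its min_shingle_size, max_shingle_size and output_unigrams settings are real. The shingles helper only approximates the indexed tokens locally (lowercase plus a whitespace split, which is cruder than the standard tokenizer).

```python
from typing import List

# Hypothetical index body: filter/analyzer/sub-field names are illustrative.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "bigram_shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,  # emit only two-word tokens
                }
            },
            "analyzer": {
                "bigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigram_shingle"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "text": {
                "type": "text",
                "fields": {
                    "bigrams": {
                        "type": "text",
                        "analyzer": "bigram_analyzer",
                        # terms aggregations on a text field need fielddata
                        "fielddata": True,
                    }
                },
            }
        }
    },
}


def shingles(text: str, size: int = 2) -> List[str]:
    """Local approximation of bigram_analyzer: lowercase, split on
    whitespace, then join each run of `size` adjacent words with a space."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(shingles("the great house at"))  # ['the great', 'great house', 'house at']
```

Setting output_unigrams to false keeps single words out of the field, so every token in it is a bigram like the ones shown above.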

A prefix query on "house " (house followed by a space) here matches every document containing house SPACE some word. Then simply run a terms aggregation on the same field, and you'll see the bigrams as a facet, ordered by how many of the matching documents contain each one (doc_count). You may need to filter this further so you don't see every bigram in those documents, for example with the aggregation's include pattern.

"buckets" : [
    {
        "key" : "house rules",
        "doc_count" : 52
    },
    {
        "key" : "house sucks",
        "doc_count" : 42
    },
    ...
]
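A minimal sketch of the forward search body, assuming the hypothetical text.bigrams sub-field and a word that is a plain lowercase term with no regex metacharacters. The prefix query, terms aggregation and its include regex are standard Elasticsearch features; the function names are made up. simulate_buckets mimics locally what doc_count means: the number of documents containing each bigram, not the number of occurrences.

```python
from collections import Counter
from typing import Dict, List, Tuple

BIGRAM_FIELD = "text.bigrams"  # hypothetical sub-field


def forward_context_query(word: str, top_n: int = 10) -> Dict:
    """Search body: match docs with a bigram starting with "<word> " and
    bucket only those bigrams (assumes `word` has no regex metacharacters)."""
    prefix = word + " "
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"prefix": {BIGRAM_FIELD: prefix}},
        "aggs": {
            "after_word": {
                "terms": {
                    "field": BIGRAM_FIELD,
                    "include": prefix + ".*",  # drop unrelated bigrams
                    "size": top_n,
                }
            }
        },
    }


def simulate_buckets(docs: List[List[str]], word: str) -> List[Tuple[str, int]]:
    """Per bigram starting with "<word> ", count how many documents contain
    it (like doc_count), most frequent first."""
    prefix = word + " "
    counts: Counter = Counter()
    for bigrams in docs:
        # a set, so a bigram repeated within one document counts once
        counts.update({b for b in bigrams if b.startswith(prefix)})
    return counts.most_common()
```

The include pattern is what keeps "the house" or "great house" out of the buckets even though they appear in the matching documents.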

The OTHER direction is a bit trickier, though. You may need to duplicate your data into another field to get a different view. You could run wildcard queries with a leading wildcard (* house), but they don't perform well. Instead, you want the tokens reversed before you run the prefix query. So in a completely separate field, add a reverse filter that reverses the text AFTER shingling.

So:

[good house]

becomes, for examining the other direction:

[esuoh doog]
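A sketch of the analysis settings for that second field, assuming it is a hypothetical text.reversed_bigrams sub-field mapped like text.bigrams but with this analyzer. The reverse token filter is built in; placing it after the shingle filter is what reverses each whole bigram. reversed_shingles is only a local approximation of the output.

```python
from typing import List

# Hypothetical analysis block for a separate reversed-bigram sub-field.
REVERSED_ANALYSIS = {
    "filter": {
        "bigram_shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 2,
            "output_unigrams": False,
        }
    },
    "analyzer": {
        "reversed_bigram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            # order matters: shingle first, then reverse the whole bigram
            "filter": ["lowercase", "bigram_shingle", "reverse"],
        }
    },
}


def reversed_shingles(text: str) -> List[str]:
    """Local approximation: build bigrams, then reverse each one's characters."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return [b[::-1] for b in bigrams]


print(reversed_shingles("good house"))  # ['esuoh doog']
```

Reversing before shingling would instead give "doog esuoh", which no longer starts with the reversed target word, so the prefix trick would not work.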

Then repeat the process in the other direction with a prefix query on "esuoh " (the reversed word followed by a space) plus a terms aggregation; you'll have to reverse the resulting bucket keys back yourself.
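A minimal sketch of that backward pass, again assuming the hypothetical text.reversed_bigrams sub-field and a plain lowercase word. The query mirrors the forward one with a reversed prefix, and unreverse_buckets turns response buckets back into readable bigrams.

```python
from typing import Dict, List, Tuple

REVERSED_FIELD = "text.reversed_bigrams"  # hypothetical sub-field


def backward_context_query(word: str, top_n: int = 10) -> Dict:
    """Search body for words that come BEFORE `word`."""
    prefix = word[::-1] + " "  # "house" -> "esuoh "
    return {
        "size": 0,
        "query": {"prefix": {REVERSED_FIELD: prefix}},
        "aggs": {
            "before_word": {
                "terms": {
                    "field": REVERSED_FIELD,
                    "include": prefix + ".*",
                    "size": top_n,
                }
            }
        },
    }


def unreverse_buckets(buckets: List[Dict]) -> List[Tuple[str, int]]:
    """Map buckets like {"key": "esuoh doog", "doc_count": 7}
    back to ("good house", 7)."""
    return [(b["key"][::-1], b["doc_count"]) for b in buckets]
```

For example, with an illustrative bucket list [{"key": "esuoh doog", "doc_count": 7}], unreverse_buckets returns [("good house", 7)]; the order of the buckets is preserved.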

Fun problem. Hope that helps!
