What are the most popular contextual terms (after/before) of an expression?

Well, one simple way, depending on the size of your data, is to create an index of bigrams using a custom analyzer.

So for the input to analysis, you'd have

the great house at

and instead of breaking it up into single words, you'd modify the analysis to break it up into bigrams (two-word tokens) using the shingle token filter, like

[the great] [great house] [house at]
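As a rough sketch of what that analyzer might look like, here are index settings written as a Python dict (so you can send them with whatever client you use), plus a tiny pure-Python stand-in that mimics the tokens it should emit. The names bigram_shingle, bigram_analyzer, body and bigrams are made up for illustration, and the fielddata flag assumes a version of Elasticsearch where aggregating on an analyzed text field requires it. Older versions also expect a type name inside the mappings.

```python
# Hypothetical index body: every name here is illustrative, not from the post.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                # Shingle filter restricted to exactly two-word tokens.
                "bigram_shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "bigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigram_shingle"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "fields": {
                    # Sub-field holding the bigram view of the same text.
                    "bigrams": {
                        "type": "text",
                        "analyzer": "bigram_analyzer",
                        # Assumption: needed to aggregate on a text field.
                        "fielddata": True,
                    }
                },
            }
        }
    },
}


def bigrams(text):
    """Approximate the analyzer: lowercase, split on whitespace, pair neighbours."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]
```

The bigrams helper is only a local approximation for checking your expectations. The real tokens come from the analyzer, which you can inspect with the _analyze API.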

A prefix query on "house " (the word followed by a space) here matches all the occurrences of house SPACE some word. Then simply run a terms aggregation on the same field, and you'll see the bigrams as buckets, ordered by how many of the matching documents contain each one. The aggregation counts every bigram in those documents, not just the ones starting with house, so you may need to filter it further.
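A minimal sketch of that request, again as a Python dict builder. The field name body.bigrams and the aggregation name following_words are assumptions, and the include pattern is just one way to do the extra filtering mentioned above. It assumes the word contains no regex metacharacters.

```python
def forward_request(word, field="body.bigrams", size=10):
    """Build a search body for bigrams that start with `word`."""
    # Prefix queries are not analyzed, so lowercase by hand to match the
    # lowercase filter. The trailing space keeps "house" from matching
    # bigrams such as "household goods".
    prefix = word.lower() + " "
    return {
        "size": 0,  # we only want the aggregation, not the hits
        "query": {"prefix": {field: {"value": prefix}}},
        "aggs": {
            "following_words": {
                "terms": {
                    "field": field,
                    # Keep only buckets that start with the word itself.
                    "include": prefix + ".*",
                    "size": size,
                }
            }
        },
    }
```

Calling forward_request("house") gives you a body you can post to the _search endpoint. The response would then contain buckets along these lines: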

"buckets" : [ 
    {
        "key" : "house rules",
        "doc_count" : 52
    },
    {
        "key" : "house sucks",
        "doc_count" : 42
    },
    ...
]

The OTHER direction though is a bit trickier. You may need to duplicate your data to another field to get a different view. You can do wildcard "* house" queries, but leading wildcards don't perform that well. Instead, you need to reverse the tokens BEFORE you do the prefix query, so the word you care about ends up at the front again. So in a completely separate field, you want to add a reverse filter to reverse the text AFTER shingling.

So:

[good house]

becomes, for examining the other direction:

[esuoh doog]

Then repeat the process for the other direction with a prefix query on "esuoh " (the reversed word followed by a space), getting terms aggregations whose keys you'll have to reverse back yourself :slight_smile:
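Here is a hedged sketch of the reversed variant. It assumes a second sub-field, here called body.bigrams_reversed, whose analyzer uses the filter chain in REVERSED_FILTERS (the same hypothetical bigram_shingle filter followed by the built-in reverse token filter). The helper names are made up for illustration.

```python
# Assumed filter chain for the reversed field: shingle first, then reverse.
REVERSED_FILTERS = ["lowercase", "bigram_shingle", "reverse"]


def reversed_bigrams(text):
    """Approximate the reversed analyzer: build bigrams, then reverse each one."""
    words = text.lower().split()
    return [" ".join(pair)[::-1] for pair in zip(words, words[1:])]


def backward_request(word, field="body.bigrams_reversed", size=10):
    """Build a search body for bigrams that END with `word`."""
    # The reversed word now sits at the front of the token, so a plain
    # prefix query works. Assumes `word` has no regex metacharacters.
    prefix = word.lower()[::-1] + " "
    return {
        "size": 0,
        "query": {"prefix": {field: {"value": prefix}}},
        "aggs": {
            "preceding_words": {
                "terms": {"field": field, "include": prefix + ".*", "size": size}
            }
        },
    }


def unreverse_buckets(buckets):
    """Flip the bucket keys back into readable order, keeping the counts."""
    return [{"key": b["key"][::-1], "doc_count": b["doc_count"]} for b in buckets]
```

Pass the buckets list from the response through unreverse_buckets and a key like "esuoh doog" comes back out as "good house".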

Fun problem, hope that helps!