Well, one simple way, depending on the size of your data, is to build an index of bigrams using a custom analyzer.
So for the input to analysis, you'd have
the great house at
and instead of breaking it up into individual words, configure the analyzer to break it up into bigrams (two-word tokens) using the shingle filter, like
[the great] [great house] [house at]
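A minimal index definition for that kind of analyzer might look like the following (the index, field, and analyzer names are just illustrative; `fielddata` is enabled on the shingled subfield so it can be aggregated on later):

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "bigram_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "bigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "bigram_shingles"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "shingles": {
            "type": "text",
            "analyzer": "bigrams",
            "fielddata": true
          }
        }
      }
    }
  }
}
```

Setting `output_unigrams` to `false` keeps single words out of the field, so only the two-word shingles get indexed.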
A prefix query on
house\ *
here yields all the occurrences of "house" followed by a space and some word. Then simply run a terms aggregation on the same field, and you'll see all the bigrams as a facet, ordered by how frequently they occur in the search results. You may need to filter this further so you don't see every bigram in those documents.
"buckets" : [
"key" : "house rules",
"doc_count" : 52
"key" : "house sucks",
"doc_count" : 42
The OTHER direction, though, is a bit trickier. You may need to duplicate your data into another field to get a different view. You could run wildcard
* house
queries, but they don't perform well. Instead, you need to reverse the tokens BEFORE you do the prefix query. So in a completely separate field, add a reverse filter to reverse the text AFTER shingling.
So for examining the other direction,
[the great] [great house] [house at]
becomes
[taerg eht] [esuoh taerg] [ta esuoh]
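One way to set up that separate field is with the built-in `reverse` token filter placed after the shingle filter in the chain, so each whole bigram token gets its characters reversed (a sketch, with illustrative names):

```json
PUT /my-index-reversed
{
  "settings": {
    "analysis": {
      "filter": {
        "bigram_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "bigrams_reversed": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "bigram_shingles", "reverse"]
        }
      }
    }
  }
}
```

The filter order matters: shingle first, then reverse, so the reversal applies to the whole two-word token rather than to each individual word.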
Then repeat the process for this direction with an
esuoh\ *
prefix query, getting terms aggregations whose keys you'll have to reverse yourself.
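That last reversal step can happen client-side; a minimal sketch in Python, assuming buckets in the shape of the aggregation response shown earlier:

```python
# Restore readable bigrams from a reversed-shingle terms aggregation:
# each bucket key like "esuoh taerg" is the character-reversed form
# of the original bigram ("great house").
def unreverse_buckets(buckets):
    return [
        {"key": bucket["key"][::-1], "doc_count": bucket["doc_count"]}
        for bucket in buckets
    ]
```

For example, `unreverse_buckets([{"key": "esuoh taerg", "doc_count": 7}])` turns the key back into `"great house"`.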
Fun problem. Hope that helps!