Hi,
I'm looking to write a query that will return the "most frequently co-occurring terms given an input of one or more terms", though I'm quite unclear as to how I would structure such a query (or if it is even possible in Elasticsearch).
As an example, given the following documents:
{
"id": "1",
"tags": ["planet", "earth", "drawing", "illustration"]
},
{
"id": "2",
"tags": ["planet", "saturn", "drawing"]
},
{
"id": "3",
"tags": ["planet", "earth", "drawing", "illustration"]
},
{
"id": "4",
"tags": ["drawing"]
}
And, given the following input:
["planet"]
I would like to return the terms that occur most often on documents that have the tag "planet", as well as some statistics, such that the output would look like:
[
{"drawing": {"cooccurrence": 3, "total_doc_count": 4}},
{"earth": {"cooccurrence": 2, "total_doc_count": 2}},
...
]
So far, I believe this is possible with the Term Vectors API. However, I would also like to do the same for an intersection of terms. Given the input:
["planet", "earth"]
I would like to return the same term frequencies, but only for documents that contain both "planet" and "earth" (i.e. in this example, the statistics would be computed over documents 1 and 3 only):
[
{"drawing": {"cooccurrence": 2, "total_doc_count": 4}},
{"illustration": {"cooccurrence": 2, "total_doc_count": 2}},
...
]
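To make the expected semantics concrete, here is a minimal plain-Python sketch of the computation I have in mind. It is not an Elasticsearch query, only a reference for the numbers above. The function name cooccurring_terms is hypothetical, and two details are my own assumptions since the examples above do not settle them: the input terms themselves are left out of the result, and ties are broken alphabetically.

```python
from collections import Counter

# The four example documents from above.
DOCS = [
    {"id": "1", "tags": ["planet", "earth", "drawing", "illustration"]},
    {"id": "2", "tags": ["planet", "saturn", "drawing"]},
    {"id": "3", "tags": ["planet", "earth", "drawing", "illustration"]},
    {"id": "4", "tags": ["drawing"]},
]


def cooccurring_terms(docs, input_terms):
    """Rank tags by how often they appear on documents containing ALL input terms.

    Hypothetical helper for illustration only.
    """
    wanted = set(input_terms)
    total_doc_count = Counter()  # documents containing each tag, over the whole set
    cooccurrence = Counter()     # documents containing each tag AND every input term

    for doc in docs:
        tags = set(doc["tags"])  # count each tag at most once per document
        total_doc_count.update(tags)
        if wanted <= tags:  # document contains every input term
            # Assumption: the input terms are excluded from the result.
            cooccurrence.update(tags - wanted)

    # Assumption: highest co-occurrence first, ties broken alphabetically.
    ranked = sorted(cooccurrence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        {term: {"cooccurrence": count, "total_doc_count": total_doc_count[term]}}
        for term, count in ranked
    ]


if __name__ == "__main__":
    print(cooccurring_terms(DOCS, ["planet"]))
    print(cooccurring_terms(DOCS, ["planet", "earth"]))
```

Calling it with ["planet"] and then with ["planet", "earth"] gives the two outputs shown above, with the elided entries filled in according to the same rule.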
Is this possible to achieve within Elasticsearch?
Thanks!
Charles