Hi shampoo,
You can see how any number of arbitrary queries overlap using the adjacency_matrix aggregation.
A visualization of the results might look like this:
The circles and lines are sized by the number of documents with at least one occurrence (not the number of repeated occurrences within documents).
The query that provides the information behind this:
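A minimal sketch of what such a request might look like, written as Python that only builds the search body and reads the buckets back. The field name `text`, the aggregation name `interactions`, and the helper names `build_adjacency_request` and `split_buckets` are assumptions for illustration, not the original query from this thread. It relies on the `adjacency_matrix` aggregation returning one bucket per filter plus one per overlapping pair, with pair keys joined by `&` by default.

```python
import json


def build_adjacency_request(term_filters, field="text"):
    """Build a search body with one named match filter per term.

    `field` is an assumed field name; replace it with your own.
    """
    filters = {
        name: {"match": {field: term}} for name, term in term_filters.items()
    }
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "interactions": {  # hypothetical aggregation name
                "adjacency_matrix": {"filters": filters}
            }
        },
    }


def split_buckets(buckets, separator="&"):
    """Separate single-filter counts (circles) from pairwise overlaps (lines)."""
    singles, pairs = {}, {}
    for bucket in buckets:
        key, count = bucket["key"], bucket["doc_count"]
        if separator in key:
            first, second = key.split(separator, 1)
            pairs[(first, second)] = count
        else:
            singles[key] = count
    return singles, pairs


# Print a body that could be sent to the _search endpoint of an index.
body = build_adjacency_request({"shampoo": "shampoo", "conditioner": "conditioner"})
print(json.dumps(body, indent=2))
```

The `doc_count` values are document counts, so a document that mentions a word ten times still counts once. Filter names should not contain the separator character, or the pair keys become ambiguous.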
Thanks so much for the reply. If I understand correctly, this would return the number of documents in which the query successfully finds a match. I would need to know the actual word count within those documents.
That's more expensive and generally not something we offer: it could be skewed heavily by one spammy document that does keyword stuffing.
That said, the information is stored in the index, and if you want to deep-dive on that, you can use the explain API to get the TF (term frequency) for a word in a document, amongst other scoring factors.
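As a rough sketch of pulling the term frequency out of an explain response: the response is a tree of nodes, each with `value`, `description` and `details`. The helper name `find_term_freqs` is hypothetical, and matching on descriptions that start with `freq` or `termFreq` is an assumption, since the exact wording depends on the version and the similarity in use. Check a real response from your cluster before relying on it.

```python
def find_term_freqs(explanation):
    """Collect the values of explain-tree nodes that describe a term frequency.

    Assumes frequency nodes have a description starting with "freq" or
    "termFreq"; this wording is version-dependent and should be verified.
    """
    found = []
    description = explanation.get("description", "")
    if description.startswith(("freq", "termFreq")):
        found.append(explanation["value"])
    # Recurse into child nodes; leaf nodes may have no "details" at all.
    for child in explanation.get("details", []):
        found.extend(find_term_freqs(child))
    return found
```

Given the parsed JSON of an explain call for one document, `find_term_freqs(response["explanation"])` would return one number per matching term clause. This is one request per document, so summing counts over many documents this way gets expensive quickly.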