Word Count

shampoo · September 23, 2019, 8:38pm

Hi,

I am trying to do the following:

1. Perform a standard search on a series of documents with a given search term. (easy enough)
2. Count the number of times another series of words appears within that result set.

The query is being performed on the "body_text" field.

Example:

Search for the word "cooking" and, within the result set, count the number of times the word "egg" appears.

Any ideas?

One obvious solution is to perform two separate queries. But I am wondering if it's possible to do this in one call.

Thanks.

Mark_Harwood · September 24, 2019, 10:27am

Hi shampoo,
You can see how any number of arbitrary queries overlap using the adjacency_matrix aggregation.
A visualization of the results might look like this:

[Image: Kibana-32]

The circles and lines are sized by the number of documents with at least one occurrence (not the number of repeated occurrences within documents).

The query that provides the information behind this:

GET reviews/_search
{
  "size": 0,
  "timeout": "30s",
  "query": {
	"bool": {
	  "must": [
		{
		  "match": {
			"comments": "cooking"
		  }
		},
		{
		  "bool": {
			"should": [
			  {
				"match": {
				  "comments": "egg"
				}
			  },
			  {
				"match": {
				  "comments": "chips"
				}
			  },
			  {
				"match": {
				  "comments": "ham"
				}
			  }
			]
		  }
		}
	  ]
	}
  },
  "aggs": {
	"my_food_matrix": {
	  "adjacency_matrix": {
		"filters": {
		  "cooking": {
			"match": {
			  "comments": "cooking"
			}
		  },
		  "egg": {
			"match": {
			  "comments": "egg"
			}
		  },
		  "chips": {
			"match": {
			  "comments": "chips"
			}
		  },
		  "ham": {
			"match": {
			  "comments": "ham"
			}
		  }
		}
	  }
	}
  }
}
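
For reading the result of a query like the one above, here is a minimal Python sketch. It assumes the response has already been parsed into a dict and that the default "&" separator is used for intersection bucket keys. The helper name overlap_counts and the sample numbers are made up for illustration; they are not real output from the reviews index.

```python
# Hypothetical helper: extract pairwise overlap counts from an
# adjacency_matrix aggregation response (already parsed from JSON).

def overlap_counts(response, agg_name, anchor, separator="&"):
    """Return {other_filter: doc_count} for each bucket that pairs
    `anchor` with exactly one other named filter."""
    counts = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        names = bucket["key"].split(separator)
        # Single-filter buckets have one name; intersections have two.
        if len(names) == 2 and anchor in names:
            other = names[0] if names[1] == anchor else names[1]
            counts[other] = bucket["doc_count"]
    return counts


# Hand-written sample in the shape of an adjacency_matrix response.
# The doc counts are illustrative only.
sample_response = {
    "aggregations": {
        "my_food_matrix": {
            "buckets": [
                {"key": "chips", "doc_count": 4},
                {"key": "chips&cooking", "doc_count": 4},
                {"key": "cooking", "doc_count": 10},
                {"key": "cooking&egg", "doc_count": 7},
                {"key": "egg", "doc_count": 7},
            ]
        }
    }
}

print(overlap_counts(sample_response, "my_food_matrix", "cooking"))
```

These are still document counts, not occurrence counts. The split on the separator would misbehave if a filter name itself contained "&", so the sketch assumes plain names like the ones in the query.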

shampoo · September 25, 2019, 3:22pm

Hi,

Thanks so much for the reply. If I understand correctly, this would return the number of documents in which the query finds a match. I would need to know the actual word count within those documents.

Thanks again

J

Mark_Harwood · September 25, 2019, 3:35pm

That's more expensive and generally something we don't offer - it could be skewed heavily by one spammy document that does keyword-stuffing.
That said, the information is stored in the index, and if you want to dig into it you can use the explain API to get the TF (term frequency) for a word in a document, among other scoring factors.
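
As a rough sketch of what that could look like on the client side: an explanation comes back as a nested tree of value/description/details nodes, either from the explain API for a single document or per hit when a search request sets "explain": true. The Python below walks such a tree and collects the term-frequency nodes. The description prefixes it looks for ("freq," and "termFreq=") are an assumption, since the wording varies between Elasticsearch versions, and the sample tree is a simplified, hand-written one rather than real output.

```python
# Hypothetical helper: collect term-frequency values from an
# explanation tree (a parsed "explanation" / "_explanation" object).

def term_frequencies(explanation):
    """Recursively gather the values of nodes that report a term frequency."""
    found = []
    description = explanation.get("description", "")
    # Assumed wording; check the descriptions your version actually returns.
    if description.startswith("freq,") or description.startswith("termFreq="):
        found.append(explanation["value"])
    for child in explanation.get("details", []):
        found.extend(term_frequencies(child))
    return found


# Simplified, hand-written tree for a single term query on "egg".
sample_explanation = {
    "value": 1.2,
    "description": "weight(body_text:egg in 0), result of:",
    "details": [
        {
            "value": 1.2,
            "description": "score(freq=3.0), computed as boost * idf * tf from:",
            "details": [
                {"value": 2.2, "description": "boost", "details": []},
                {"value": 0.9, "description": "idf", "details": []},
                {
                    "value": 0.6,
                    "description": "tf, computed from:",
                    "details": [
                        {
                            "value": 3.0,
                            "description": "freq, occurrences of term within document",
                            "details": [],
                        }
                    ],
                },
            ],
        }
    ],
}

print(term_frequencies(sample_explanation))
```

Summing these values across every hit would give a total occurrence count, but it means one explanation per document, which is why this is the expensive route.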

system · October 23, 2019, 3:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
