Is there any way to filter the searched term from the results?
I'm using the standard tokeniser. The field "word" usually contains values like "green table", "big table" and "yellow tables", so my aggregation puts the words "table" or "tables" at the top because they are the most frequent.
In this example, I don't want the word "table" to appear in the list of buckets.
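One possible way to do this is the `exclude` parameter of the terms aggregation, which accepts a list of exact values (or a regular expression) to leave out of the buckets. The sketch below only builds the request body as a Python dict. The helper name `build_terms_agg`, the aggregation name `top_words` and the choice to pass the plural form by hand are assumptions made for illustration, not something taken from the original query.

```python
import json


def build_terms_agg(field, searched_term, excluded_terms, size=10):
    """Build a search body whose terms aggregation skips the given terms.

    `field`, `searched_term` and `excluded_terms` are placeholders for
    illustration; adapt them to the real mapping and query.
    """
    return {
        "size": 0,  # only the buckets are needed, not the hits
        "query": {
            "query_string": {"default_field": field, "query": searched_term}
        },
        "aggs": {
            "top_words": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": size,
                    # exact token values that must not become buckets
                    "exclude": list(excluded_terms),
                }
            }
        },
    }


body = build_terms_agg("word", "table", ["table", "tables"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Because the field is analysed, the excluded values have to match the tokens as they are stored (lowercased by the standard analyser).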
By the way, regarding the query_string, what's the most efficient way (performance-wise) to search for words that contain a given word?
How to get full word tokens with Ngrams?
With an n-gram tokeniser here, the returned tokens would be "tab", "le " and so on. I can't aggregate on those, as the buckets wouldn't make sense.
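To make the problem concrete, here is a small pure-Python sketch that compares whole-word tokens with character n-grams. It is a simplified sliding window, not Elasticsearch's own tokeniser, and the function name `char_ngrams` is made up for this example.

```python
def char_ngrams(text, n=3):
    """Return every run of n consecutive characters in text (sliding window)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


value = "green table"

# Whole-word tokens: these make meaningful aggregation buckets.
word_tokens = value.split()

# Character n-grams: fragments that cut across word boundaries.
gram_tokens = char_ngrams(value, n=3)

print(word_tokens)
print(gram_tokens)
```

The n-gram list contains fragments such as "tab" and "n t", which is why bucketing on them gives results that do not correspond to words.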
Yes, I went back and forth on that a few days ago. But in the example I gave, a keyword type would create buckets for "green table" instead of for "green" and "table". Or is there a way to achieve the same with a keyword field?
I wasn't sure whether I had tried that, so I did it again. It doesn't produce the desired effect.
I know that using a standard analyser on a text field isn't really an optimal approach, but I didn't see any other way.
Thanks for the effort though.
There are lots of special characters in the words field
The words field can contain multiple words and every word has to be a separate "entity" (token)
That's pretty much why I need to use a standard analyser on a text field with fielddata turned on.
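For reference, a mapping along the lines described above might look like the following sketch, written as a Python dict. The field name `words` follows the wording of this post; everything else about the real index is unknown, so treat this as an assumption rather than the actual mapping.

```python
import json

# Sketch of an index body: a text field analysed with the standard
# analyser, with fielddata enabled so it can be used in aggregations.
index_body = {
    "mappings": {
        "properties": {
            "words": {  # field name assumed from the discussion
                "type": "text",
                "analyzer": "standard",
                "fielddata": True,  # loads tokens onto the heap; can be memory-hungry
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```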
The goal is to count the number of occurrences of each word across all of the words fields on the cluster (that's the query above). I gave other options a lot of consideration and did a lot of research, and this seems like the only option for my use case.
On the other hand, do you have any idea how I could change the query above to count every occurrence of each word within the words field and sort the words by that count?
The current query produces a doc_count, which is the number of documents that contain the word, but some documents contain a word multiple times, so it isn't very precise.
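The difference between the two counts can be illustrated outside Elasticsearch with a few lines of Python. This is only a client-side sketch on made-up sample values, using plain whitespace splitting in place of the standard analyser; it is not a replacement for the aggregation.

```python
from collections import Counter


def doc_and_term_counts(docs):
    """Return (documents containing each word, total occurrences of each word)."""
    doc_count = Counter()
    term_count = Counter()
    for text in docs:
        tokens = text.lower().split()
        term_count.update(tokens)      # every occurrence counts
        doc_count.update(set(tokens))  # each word counts once per document
    return doc_count, term_count


# Hypothetical sample values for the words field.
docs = ["green table table", "big table", "yellow chair"]
doc_count, term_count = doc_and_term_counts(docs)

print(doc_count["table"], term_count["table"])

# Words sorted by total occurrences, highest first.
print(term_count.most_common(3))
```

Here "table" appears in two documents but three times in total, which is the gap between doc_count and the number I am after.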
Could you provide a full recreation script, as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.
A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.