Can this be recreated using the Java Elasticsearch 7.1 APIs? I've tried to no avail. It looks like PreBuiltTokenizers.STANDARD.getTokenizerFactory() has been deprecated in v7.
In our Java code, we tokenize the search terms and then sort the tokens into two lists: high-frequency tokens (like stopwords) and low-frequency tokens. Then we build a compound query and send it to Elasticsearch. In the query, the two groups have different settings applied to them (e.g., different boosts). This is legacy code.
For example, if the user searches for "the quick and fast brown fox", we'll break that up into ["the and"] and ["quick fast brown fox"], then create a query that boosts low-frequency tokens differently from high-frequency tokens and send it to Elasticsearch.
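As a rough sketch of that split (not our actual code): the class name, the hard-coded set of high-frequency words, and the plain regex split are all assumptions made for illustration. The regex is only a stand-in for the standard tokenizer, and a real list would more likely come from term statistics or a stopword file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class TokenFrequencySplitter {
    // Hypothetical high-frequency list, hard-coded only for this example.
    static final Set<String> HIGH_FREQUENCY = Set.of("the", "and", "a", "of", "to");

    // Lowercases the input, splits on runs of non-word characters (ASCII only),
    // and partitions the tokens: key true = high frequency, key false = low frequency.
    static Map<Boolean, List<String>> split(String searchTerms) {
        return Arrays.stream(searchTerms.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(token -> !token.isEmpty()) // leading separators yield an empty token
                .collect(Collectors.partitioningBy(HIGH_FREQUENCY::contains));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> parts = split("the quick and fast brown fox");
        System.out.println(parts.get(true));  // prints [the, and]
        System.out.println(parts.get(false)); // prints [quick, fast, brown, fox]
    }
}
```

Each list would then feed its own clause of the compound query, with its own boost.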
This is just one example. The tokenizing logic in my original post lives in a common util method that is used in multiple places to tokenize strings.
You may want to read about the common terms query and, even more importantly, why it is deprecated in newer versions. That might help you rethink your whole architecture in that regard.
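For illustration only, here is a minimal sketch of building a plain match query request body, which as far as I recall is what the deprecation notice points to as the replacement. The class name, the field name title, and the helper methods are hypothetical. The escaping only covers backslashes and double quotes, so real code should use a JSON library or the client's query builders instead.

```java
class MatchQueryBody {
    // Escapes backslashes and double quotes only; control characters are NOT handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds {"query":{"match":{"<field>":{"query":"<text>"}}}} as a JSON string.
    static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{\"" + escape(field)
                + "\":{\"query\":\"" + escape(text) + "\"}}}}";
    }

    public static void main(String[] args) {
        // "title" is a made-up field name for this example.
        System.out.println(matchQuery("title", "the quick and fast brown fox"));
    }
}
```

With this approach the raw search string goes to Elasticsearch untouched, and the field's analyzer does the tokenizing on the server side.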