Standard TokenizerFactory deprecated in Elasticsearch Java API v7?

I'm currently upgrading Elasticsearch from v1.7 to v7.1. While refactoring our Java code, I came across this Elasticsearch v1.7 code:

    private static final TokenizerFactory STANDARD_TOKENIZER_FACTORY = 
        PreBuiltTokenizers.STANDARD.getTokenizerFactory(Version.V_1_7_3);

    org.apache.lucene.analysis.Tokenizer tokenizer =
        STANDARD_TOKENIZER_FACTORY.create(new StringReader(terms.toLowerCase()));

Can this be recreated using the Java Elasticsearch 7.1 APIs? I've tried to no avail. It looks like PreBuiltTokenizers.STANDARD.getTokenizerFactory() has been deprecated in v7.

Hey,

Can you explain what you are trying to do and why you need access to those classes? Are you writing a plugin, such as an AnalysisPlugin?

--Alex

In our Java code, we tokenize the search terms and then sort the tokens into two lists: high-frequency tokens (like stopwords) and low-frequency tokens. Then we build a compound query and send it to Elasticsearch. In the query, the two groups have different settings applied to them (e.g. different boosts). This is legacy code.

For example, if the user searches for "the quick and fast brown fox", we'll break that up into ["the", "and"] and ["quick", "fast", "brown", "fox"], then create a query (one that boosts low-frequency tokens differently from high-frequency tokens) and send it to Elasticsearch.

This is just one example. The tokenizing logic from my original post lives in a common util method that is used in multiple places to tokenize strings.
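To make the split concrete, here is a rough sketch of that kind of util method. It is an illustration under stated assumptions, not the legacy code itself: the class name `TermPartitioner`, the `HIGH_FREQUENCY` set, and both method names are hypothetical, and the JDK's `java.text.BreakIterator` stands in for the Lucene standard tokenizer so the example has no Elasticsearch or Lucene dependency. The two can disagree on edge cases such as punctuation inside words.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class TermPartitioner {

    // Hypothetical high-frequency list; a real one would come from your own stopword data.
    static final Set<String> HIGH_FREQUENCY =
        new HashSet<>(Arrays.asList("the", "and", "a", "of"));

    /** Splits text into lowercase word tokens using the JDK's word BreakIterator. */
    static List<String> tokenize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(lower);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String candidate = lower.substring(start, end);
            // Keep only segments containing a letter or digit (drops spaces and punctuation).
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }

    /** Returns two lists: index 0 holds high-frequency tokens, index 1 holds everything else. */
    static List<List<String>> partition(String text) {
        List<String> high = new ArrayList<>();
        List<String> low = new ArrayList<>();
        for (String token : tokenize(text)) {
            (HIGH_FREQUENCY.contains(token) ? high : low).add(token);
        }
        return Arrays.asList(high, low);
    }

    public static void main(String[] args) {
        List<List<String>> parts = partition("the quick and fast brown fox");
        System.out.println("high: " + parts.get(0));
        System.out.println("low:  " + parts.get(1));
    }
}
```

Because the tokenizing step is isolated in `tokenize`, it could later be swapped for a Lucene-backed implementation without touching the callers that build the query.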

You may want to read about the common terms query and, even more importantly, why it is deprecated in newer versions. That might help you rethink your whole architecture in that regard.
