Building a custom tokenizer: "Could not find suitable constructor"

I'm building a custom tokenizer in response to this thread: Performance of doc_values field vs analysed field

None of this API appears to be documented (?), so I'm going off code samples from other plugins/tokenizers. When I restart Elasticsearch after deploying my tokenizer, I get this error constantly in the logs:

[2017-09-20 08:45:37,412][WARN ][indices.cluster          ] [Samuel Silke] [[storm-crawler-2017-09-11][3]] marking and sending shard failed due to [failed to create index]
[storm-crawler-2017-09-11] IndexCreationException[failed to create index]; nested: CreationException[Guice creation errors:

1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
  at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
  at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
  at _unknown_

1 error];
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
  at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
  at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
  at _unknown_

1 error
	at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:360)
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:172)
	at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:157)
	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
	... 9 more

My tokenizer is built for v2.3.4, and the TokenizerFactory looks like this:

public class UrlTokenizerFactory extends AbstractTokenizerFactory {

    @Inject
    public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, @Assisted String name, @Assisted Settings settings){
        super(index, indexSettings.getSettings(), name, settings);
    }

    @Override
    public Tokenizer create() {
        return new UrlTokenizer();
    }
}

I genuinely don't know what I'm doing wrong. Have I deployed it incorrectly? It appears to be using my classes according to the logs...

I've only deployed it to one of my ES nodes (4-node cluster). The /_cat/plugins?v endpoint gives this:

name         component          version type url 
Samuel Silke urltokenizer       2.3.4.0 j        

Can anyone help? What am I doing wrong?

What are the entire contents of your tokenizer .java file (in particular, the imports)? Note that tokenizers were changed in 5.0 to be "deguiced", so they should no longer have these kinds of pains during development.

Here are the entire contents of my TokenizerFactory class (sans package):

import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.analysis.AbstractTokenizerFactory;
import org.elasticsearch.index.settings.IndexSettingsService;

public class UrlTokenizerFactory extends AbstractTokenizerFactory {

    @Inject
    public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, @Assisted String name, @Assisted Settings settings){
        super(index, indexSettings.getSettings(), name, settings);
    }

    @Override
    public Tokenizer create() {
        return new UrlTokenizer();
    }
}

Do you need my tokenizer as well?

Remember, I'm developing for v2.3.4, not 5.0.

Thanks for any help!

How are you starting elasticsearch?

sudo service elasticsearch restart

Sorry, I don't have any other ideas. Guice is a nightmare, which is why we have been slowly removing it. You should really upgrade to 5.x :neutral_face:

Oh. :frowning:

Any chance you could give me some guidance on how I might recreate the Guice setup to see why it's failing?

A shot in the dark, but are you sure that the IndexSettingsService can be used in the constructor? Try using the index settings directly (which should be injected):

public UrlTokenizerFactory(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings)

The Guice stuff is all initialized in Node.java; at least, in there you can see which Module classes are loaded, which set up the bindings in Guice. It seems to find your class OK, but then does not find the constructor. Is the snippet you gave the entirety of your tokenizer factory class? I had first thought maybe you were using the Inject annotation from javax or something like that, which is why I asked to see the imports. But the classes in your ctor parameters seem to match up with those in other tokenizer factories. :confused:

Thanks for your help.

I tried your "shot in the dark". It compiled, and I got a slightly different error this time:

Caused by: java.lang.IllegalStateException: [index.version.created] is not present in the index settings for index with uuid: [null]
	at org.elasticsearch.Version.indexCreated(Version.java:584)
	at org.elasticsearch.index.analysis.Analysis.parseAnalysisVersion(Analysis.java:99)
	at org.elasticsearch.index.analysis.AbstractTokenizerFactory.<init>(AbstractTokenizerFactory.java:40)

I'll keep digging

Although that @IndexSettings annotation doesn't resolve for me.

Just found one here: https://github.com/codelibs/elasticsearch-analysis-kuromoji-neologd/blob/2.3.x/src/main/java/org/codelibs/elasticsearch/kuromoji/neologd/index/analysis/KuromojiTokenizerFactory.java that has Environment in the constructor. Trying that!

Well, it seems to have worked...now to see if it, you know, actually works...

Yup. It's calling my tokenizer. But now it's revealed that my tokenizer is in fact crap!

Caused by: java.lang.IndexOutOfBoundsException
	at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.append(CharTermAttributeImpl.java:131)
	at com.cameraforensics.elasticsearch.plugins.UrlTokenizer.incrementToken(UrlTokenizer.java:30)

Probably because - as there are no docs - I'm doing it wrong.

    @Override
    public boolean incrementToken() throws IOException {
        if (position >= tokens.size()) {
            return false;
        } else {
            termAtt.setEmpty().append(tokens.get(position), position, position);
            position++;
            return true;
        }
    }

tokens is a list of all permutations of index segmentation (as per this: Performance of doc_values field vs analysed field)

I'm not really sure what the two int values should be on CharTermAttribute#append, so I'm guessing - incorrectly.
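For reference, the two int arguments to append(CharSequence s, int start, int end) follow the standard java.lang.Appendable convention: they are indices into s itself (start inclusive, end exclusive), not positions in the token list, which is why passing position for both blows up once it exceeds the token's length. StringBuilder uses the same Appendable convention, so here's a self-contained sketch of the semantics (the URL is just a made-up example):

```java
public class AppendDemo {
    public static void main(String[] args) {
        // append(CharSequence s, int start, int end) copies s[start, end):
        // the ints index into s, not into any external token list.
        StringBuilder sb = new StringBuilder();
        String token = "http://example.com";
        sb.append(token, 0, 4);               // copies "http"
        sb.append(token, 7, token.length());  // copies "example.com"
        System.out.println(sb);               // prints: httpexample.com

        // To emit a whole token, skip the index arguments entirely:
        sb.setLength(0);                      // like CharTermAttribute#setEmpty
        sb.append(token);
        System.out.println(sb);               // prints: http://example.com
    }
}
```

So in incrementToken() the safe call is presumably just termAtt.setEmpty().append(tokens.get(position)), with no index arguments at all.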

Anyway, thanks for all of your help. I'll keep hacking!

Nailed it. Thanks again!

PS: If you want to continue with me on my Journey of Pain: Custom tokenizer doesn't work on reindex/index api, only _analyze endpoint :wink:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.