Elasticsearch 6.4.x does not register keyword tokenizer


(paranoiabla) #1

Hi guys,

In Elasticsearch 5.6.x I see a lot of tokenizers registered, but in Elasticsearch 6.4.2, after some debugging, I see only the standard tokenizer. So my settings.json seems to be invalid:

{
  "analysis": {
    "analyzer": {
      "raw": {
        "tokenizer": "keyword",
        "filter": [
          "lowercase"
        ]
      }
    }
  }
}

So I get this error:

Caused by: java.lang.IllegalArgumentException: Custom Analyzer [raw] failed to find tokenizer under name [keyword]
at org.elasticsearch.index.analysis.CustomAnalyzerProvider.build(CustomAnalyzerProvider.java:58)
at org.elasticsearch.index.analysis.AnalysisRegistry.processAnalyzerFactory(AnalysisRegistry.java:547)
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:475)

Any idea how to fix this?


(Ryan Ernst) #2

The keyword tokenizer still exists, it just moved. A couple things to verify:

  • Your analysis section is underneath index settings?
  • You should have "type": "custom" in the same object as tokenizer/filter
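
As a rough sketch of what both points look like together, here is the `raw` analyzer from the first post nested under `index` with an explicit `"type": "custom"`. This is an illustration only: whether the settings file read by a client library expects the `index` wrapper is an assumption that has to be checked against that library.

```json
{
  "index": {
    "analysis": {
      "analyzer": {
        "raw": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}
```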

(paranoiabla) #3

Hi @rjernst, and thank you for your reply. Can you point me to the location the keyword tokenizer has moved to? I set a breakpoint at CustomAnalyzerProvider:56 and inspected the tokenizers collection, and I only see one value ('standard'). Meanwhile, the same breakpoint with Elasticsearch 5.6.x shows 15 values:
0 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14530} "standard" ->
1 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14531} "lowercase" ->
2 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14532} "pattern" ->
3 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14533} "thai" ->
4 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14534} "uax_url_email" ->
5 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14535} "PathHierarchy" ->
6 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14536} "path_hierarchy" ->
7 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14537} "classic" ->
8 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14538} "nGram" ->
9 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14539} "edgeNGram" ->
10 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14540} "letter" ->
11 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14541} "ngram" ->
12 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14542} "keyword" ->
13 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14543} "whitespace" ->
14 = {Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@14544} "edge_ngram" ->
So now I am wondering why I have most of the tokenizers missing.
To answer your questions:

  1. My JSON above is exactly as shown: the analysis section is not under an index section. I'm using spring-data-elasticsearch, and the JSON is loaded through an annotation on the POJO:

    @Setting(settingPath = "/elastic_setting.json")
    @Mapping(mappingPath = "/elastic_mapping.json")
    public class TestElasticsearchRequestProductDto extends ElasticsearchRequestProductDto {

        private static final long serialVersionUID = 42L;

    }
    
  2. I tried adding "type": "custom" to my analyzer definition, but this had no effect whatsoever:
    {
      "analysis": {
        "analyzer": {
          "raw": {
            "tokenizer": "keyword",
            "type": "custom",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    }


(Ryan Ernst) #4

Sorry, I meant that the place where they are registered in the code has moved. The debug point you have should still contain tokenizers other than standard. I'm not sure how spring-data-elasticsearch works, but is it by chance running Elasticsearch embedded (meaning, does it construct a Node instance itself)? I suspect it does not have the Elasticsearch modules set up correctly, and that the analysis-common module (which is where all but the standard tokenizer moved to) is not being loaded. Running against a normal instance of Elasticsearch 6.4.2, I am able to create a custom analyzer with the keyword tokenizer with no errors.


(paranoiabla) #5

OMG yes, you were totally right. I was spinning up an embedded server with only the netty4 module. Once I added the CommonAnalysisPlugin, it all works fine :)
Thank you.