Word_delimiter and catenate_all don't work?

batmaci · March 3, 2016, 12:18pm

My index settings and mapping are as follows:

{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1457004137507",
      "analysis": {
        "filter": {
          "my_word_delimiter": {
            "catenate_all": "true",
            "split_on_numerics": "true",
            "split_on_case_change": "true",
            "type": "word_delimiter",
            "preserve_original": "true"
          }
        },
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "standard",
              "lowercase",
              "my_word_delimiter"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "uuid": "zAT_MukSSTyIVKXQz-7YKw",
      "version": {
        "created": "2020099"
      }
    }
  },
  "mappings": {
    "Product": {
      "properties": {
        "ProductCode": {
          "index": "not_analyzed",
          "type": "string"
        },
        "id": {
          "index": "no",
          "store": true,
          "type": "integer"
        },
        "Name": {
          "store": true,
          "type": "string"
        },
        "ShortDescription": {
          "type": "string"
        }
      }
    }
  }
}

I have a product named "Brother TN-2000 Toner Black". When I run the following query with "tn 2000" or "tn-2000", I get it in the search results, but when I use "tn2000", nothing is returned. I thought word_delimiter with catenate_all would put the expected tokens into the inverted index. What am I doing wrong? Can you please help me?

{"query":{"bool":{"should":[{"multi_match":{"type":"best_fields","query":"tn 2000","fields":["Name^7","ShortDescription^6"]}}]}}}
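What catenate_all is expected to add for "TN-2000" can be sketched in Python. This is a simplified model, not Lucene's word_delimiter implementation: it assumes the token is already lowercased, splits it into runs of letters and runs of digits (covering the "-" delimiter and split_on_numerics), keeps the original when preserve_original is on, and appends the joined parts when catenate_all is on. Token order and positions are not modeled, and the function names are illustrative only.

```python
import re
from typing import List


def word_delimiter(token: str, catenate_all: bool = True,
                   preserve_original: bool = True) -> List[str]:
    """Simplified model of the word_delimiter filter for one lowercased token."""
    # Runs of letters or runs of digits become subwords; this splits on "-"
    # and on letter/digit transitions (split_on_numerics).
    parts = re.findall(r"[a-z]+|[0-9]+", token)
    if parts == [token]:
        # Nothing to split: the token passes through unchanged, once.
        return [token]
    terms = []
    if preserve_original:
        terms.append(token)              # e.g. "tn-2000"
    terms.extend(parts)                  # e.g. "tn", "2000"
    if catenate_all and len(parts) > 1:
        terms.append("".join(parts))     # e.g. "tn2000"
    return terms


def my_analyzer(text: str) -> List[str]:
    """Model of my_analyzer: whitespace tokenizer, lowercase, word_delimiter."""
    terms = []
    for tok in text.split():
        terms.extend(word_delimiter(tok.lower()))
    return terms


print(my_analyzer("Brother TN-2000 Toner Black"))
# ['brother', 'tn-2000', 'tn', '2000', 'tn2000', 'toner', 'black']
```

If the filter behaves like this model, "tn2000" should be in the index, provided the field is actually analyzed with my_analyzer.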

When I check the analyzer with the following curl command:
curl -XGET "localhost:9200/myIndex/_analyze?analyzer=my_analyzer&pretty=true" -d 'Brother TN-2000 Toner Black'

It returns:

{
  "tokens" : [ {
    "token" : "'brother",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "brother",
    "start_offset" : 1,
    "end_offset" : 8,
    "type" : "word",
    "position" : 0
  } ]
}
curl: (6) Could not resolve host: TN-2000
curl: (6) Could not resolve host: Toner
curl: (6) Could not resolve host: Black'

jimczi · March 3, 2016, 2:56pm

I tried your example and it works for me:

curl -XGET "localhost:9200/test/_analyze?analyzer=my_analyzer&pretty=true" -d 'Brother TN-2000 Toner Black'
{
  "tokens" : [ {
    "token" : "brother",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "tn-2000",
    "start_offset" : 8,
    "end_offset" : 15,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "tn",
    "start_offset" : 8,
    "end_offset" : 10,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "tn2000",
    "start_offset" : 8,
    "end_offset" : 15,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "2000",
    "start_offset" : 11,
    "end_offset" : 15,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "toner",
    "start_offset" : 16,
    "end_offset" : 21,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "black",
    "start_offset" : 22,
    "end_offset" : 27,
    "type" : "word",
    "position" : 4
  } ]
}

Are you sure you copied and pasted the exact curl command you executed?
The problem seems to be with the curl command itself (could not resolve host...). Can you try using double quotes instead?

The field 'Name' is not using your custom analyzer. Is that intended?
Could you also indent your example correctly?
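The curl errors above are consistent with a shell that does not treat single quotes as quoting characters: the -d body becomes just `'Brother` (matching the `'brother` token), and the remaining words are passed to curl as extra URLs, hence "Could not resolve host: TN-2000". The sketch below compares POSIX splitting (Python's `shlex.split`) with a rough model of a shell that honours only double quotes. Treating that model as representative of Windows cmd.exe is an assumption, not something stated in the thread.

```python
import shlex

# The command from the question, joined onto one line.
CMD = ('curl -XGET "localhost:9200/myIndex/_analyze?analyzer=my_analyzer&pretty=true" '
       "-d 'Brother TN-2000 Toner Black'")


def split_posix(cmd):
    # POSIX shells (bash, zsh) treat both ' and " as quoting characters.
    return shlex.split(cmd)


def split_double_quotes_only(cmd):
    # Rough model of a shell that only honours double quotes (assumed to be
    # similar to Windows cmd.exe); single quotes stay part of the word.
    lexer = shlex.shlex(cmd, posix=True)
    lexer.quotes = '"'
    lexer.whitespace_split = True
    return list(lexer)


for split in (split_posix, split_double_quotes_only):
    args = split(CMD)
    body_index = args.index("-d") + 1
    print(split.__name__, "body:", repr(args[body_index]),
          "extra args:", args[body_index + 1:])
```

Under the double-quotes-only model, the body is `'Brother` and the extra arguments are `TN-2000`, `Toner` and `Black'`, which lines up with the three "Could not resolve host" errors.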

batmaci · March 3, 2016, 3:19pm

Yes, double quotes made a difference and I got the same results as you with curl, but it still didn't solve my main problem. So this curl output shows what is in my inverted index, and all searches are executed against the inverted index, right? If tn2000 is in the inverted index, my multi_match query above should find it, so why doesn't it work? Do you have any idea?

When I use the head plugin and choose Actions > Test Analyzer, typing Brother TN-2000 Toner Black returns only brother, tn, 2000, toner and black, but not tn2000, as shown in the image. It looks like it uses the standard analyzer.

jimczi · March 3, 2016, 3:23pm

Yes, that's because your field is not mapped with your analyzer. You should define your field like this:

"Name": {
  "store": true,
  "type": "string",
  "index_analyzer": "my_analyzer"
},

You'll need to reindex your data to see the changes...
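Why "tn2000" finds nothing can be seen with a toy model of the lookup: a match-style query analyzes the query text and, with the default "or" operator, needs at least one resulting term to exist among the field's indexed terms. The sketch reuses the term lists reported in this thread (head plugin output for the standard analyzer, `_analyze` output for my_analyzer) and uses a regex as a rough stand-in for the standard analyzer on the query side. Scoring, multiple fields and positions are ignored, and the names are illustrative.

```python
import re
from typing import List, Set

# Terms for "Brother TN-2000 Toner Black" as reported in this thread.
STANDARD_TERMS = {"brother", "tn", "2000", "toner", "black"}
MY_ANALYZER_TERMS = {"brother", "tn-2000", "tn", "tn2000", "2000", "toner", "black"}


def standard_like(text: str) -> List[str]:
    # Rough stand-in for the standard analyzer: lowercase, then split on
    # anything that is not a letter or digit. The real standard tokenizer
    # follows Unicode word-break rules, which split these inputs the same way.
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query: str, indexed_terms: Set[str]) -> bool:
    # With the "or" operator, one shared term is enough for a hit.
    return any(term in indexed_terms for term in standard_like(query))


for q in ("tn 2000", "tn-2000", "tn2000"):
    print(f"{q!r}: standard={matches(q, STANDARD_TERMS)} "
          f"my_analyzer={matches(q, MY_ANALYZER_TERMS)}")
```

In this model only "tn2000" against the standard-analyzed field fails, which is the behaviour described in the question.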

batmaci · March 3, 2016, 4:54pm

This is what I tried:

{
  "Product": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "id": {
        "store": true,
        "index": "no",
        "type": "integer"
      },
      "Name": {
        "index_analyzer": "my_analyzer",
        "type": "string"
      },
      "ShortDescription": {
        "index_analyzer": "my_analyzer",
        "type": "string"
      }
    }
  }
}

but it returns HTTP 400 with the following message:

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Mapping definition for [Name] has unsupported parameters: [index_analyzer : my_analyzer]"}],"type":"mapper_parsing_exception","reason":"Mapping definition for [Name] has unsupported parameters: [index_analyzer : my_analyzer]"},"status":400}

batmaci · March 3, 2016, 5:04pm

Using analyzer instead of index_analyzer, as below, worked fine. What is the difference?

  "Name": {
    "analyzer": "my_analyzer",
    "type": "string"
  },

jimczi · March 3, 2016, 5:05pm

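Yes, sorry: index_analyzer is no longer supported (it was removed in Elasticsearch 2.0), which is why you got that error. The analyzer setting is used at index time and, unless search_analyzer is set, at search time as well. To use a different analyzer for queries, define both a search_analyzer and an analyzer: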

"Name": {
  "store": true,
  "type": "string",
  "analyzer": "my_analyzer",
  "search_analyzer": "standard"
}
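Putting the thread's pieces together, a create-index body for Elasticsearch 2.x might look like the following, generated from Python so its structure can be checked. This is a sketch that assumes you create a fresh index and reindex into it; the field types ("string", "not_analyzed") are 2.x syntax, applying my_analyzer to ShortDescription follows the earlier attempt in this thread, and the helper name `create_index_body` is hypothetical.

```python
import json


def create_index_body() -> dict:
    """Hypothetical helper: settings and mapping for an Elasticsearch 2.x index."""
    analysis = {
        "filter": {
            "my_word_delimiter": {
                "type": "word_delimiter",
                "catenate_all": True,
                "split_on_numerics": True,
                "split_on_case_change": True,
                "preserve_original": True,
            }
        },
        "analyzer": {
            "my_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["standard", "lowercase", "my_word_delimiter"],
            }
        },
    }
    # Shared settings for the analyzed text fields.
    text_field = {
        "type": "string",
        "analyzer": "my_analyzer",       # applied when documents are indexed
        "search_analyzer": "standard",   # applied to the query text
    }
    return {
        "settings": {"analysis": analysis},
        "mappings": {
            "Product": {
                "properties": {
                    "id": {"type": "integer", "index": "no", "store": True},
                    "ProductCode": {"type": "string", "index": "not_analyzed"},
                    "Name": dict(text_field, store=True),
                    "ShortDescription": dict(text_field),
                }
            }
        },
    }


print(json.dumps(create_index_body(), indent=2))
```

The printed JSON would be sent as the body of a PUT request for the new index, after which the documents need to be reindexed.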

batmaci · March 4, 2016, 11:04am

It worked like a charm. Thanks for your help.

Related topics

Topic	Category	Replies	Views	Activity
Issue with using word delimiter	Elasticsearch	1	587	July 6, 2017
Why word_delimiter doesn's work on my index?	Elasticsearch	5	345	March 17, 2021
Word Delimiter Filter	Elasticsearch	1	285	July 6, 2017
Word_delimiter behaviour using match query with operator and	Elasticsearch	1	203	September 26, 2022
Need help with mapping	Elasticsearch	1	333	July 6, 2017