Analyzer issue - terms not getting tokenized on whitespace

(Preeti Jain) #1


We are using Elasticsearch version 1.01.1. I ran the following test:

PUT /test
"analysis": {
"analyzer": {
"type": "pattern",

GET /testallnext/_analyze?analyzer=whitespace&pretty=1&text='Preeti,Jain

I expected the text to be split into two tokens, Preeti,Jain and test.
However, I got the following result:

{
  "tokens": [
    {
      "token": "'preeti,jain test'",
      "start_offset": 0,
      "end_offset": 18,
      "type": "word",
      "position": 1
    }
  ]
}

What is going wrong?
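As a rough point of comparison, here is a Python sketch that approximates what a whitespace-only tokenizer does. It is a simulation, not Elasticsearch code: the function name `whitespace_tokenize` is hypothetical, and Python's `\S` only approximates the tokenizer's own definition of whitespace. Elasticsearch's built-in `whitespace` analyzer applies no lowercase filter, so for the text `'Preeti,Jain test'` one would expect two tokens in their original case. Because the single quotes were sent as part of the text, they end up inside the tokens:

```python
import re


def whitespace_tokenize(text):
    """Approximate a whitespace tokenizer: split only on runs of whitespace.

    Punctuation such as ',' or "'" stays inside the tokens, and case is
    preserved because no lowercase filter is applied.
    Returns (token, start_offset, end_offset) tuples.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


if __name__ == "__main__":
    for token, start, end in whitespace_tokenize("'Preeti,Jain test'"):
        print(token, start, end)
```

Under this approximation the expected tokens are `'Preeti,Jain` (offsets 0 to 12) and `test'` (offsets 13 to 18). A single lowercased token that still contains the space, as in the result above, is not what the whitespace analyzer alone would produce.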

Our use case requires special characters to be indexed and searchable. For example,
if I index 2 documents, one containing the text "A+B" in one of the fields and
another containing "A B" in the same field, and then query ES for the term
"A+B", I should get only one result.
What kind of tokenizer should be used for this purpose? I can't use the
not_analyzed option on these fields, as they need to be analyzed for phonetics and ...
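The use case above can be illustrated with a small in-memory simulation. This is plain Python, not Elasticsearch; the document ids, the helper names, and the OR-style `search` helper are hypothetical. It indexes the two example documents with two different tokenizers and runs the query `A+B` through the same tokenizer, roughly the way a match query analyzes its input. The second tokenizer is only a rough stand-in for the pattern analyzer's defaults (split on `\W+`, lowercase):

```python
import re
from collections import defaultdict


def whitespace_tokens(text):
    # Split on whitespace only, so "A+B" stays a single term.
    return re.findall(r"\S+", text)


def word_char_tokens(text):
    # Rough stand-in for the pattern analyzer's defaults (split on \W+, lowercase):
    # '+' acts as a separator, so "A+B" becomes "a" and "b".
    return [t.lower() for t in re.split(r"\W+", text) if t]


def build_index(docs, tokenize):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query, tokenize):
    # Analyze the query with the same tokenizer and return documents that
    # contain any of the query terms (OR semantics, like a match query's default).
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits


if __name__ == "__main__":
    docs = {1: "A+B", 2: "A B"}
    for name, tokenize in [("whitespace", whitespace_tokens), ("pattern-like", word_char_tokens)]:
        index = build_index(docs, tokenize)
        print(name, sorted(search(index, "A+B", tokenize)))
```

With whitespace-only tokenization, only document 1 matches `A+B`; with the pattern-like tokenizer, both documents match because `+` is discarded. In Elasticsearch, a custom analyzer with `"tokenizer": "whitespace"` (optionally with `"filter": ["lowercase"]`) would keep `A+B` as one term in a similar way. Keep in mind that punctuation attached to words, such as a trailing comma, then also stays in the token. The phonetic analysis could live in a separate sub-field with its own analyzer.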

