Analyzer issue - terms not getting tokenized on whitespace

(Preeti Jain) #1


We are using Elasticsearch version 1.01.1. I did the following test for

PUT /test
"analysis": {
"analyzer": {
"type": "pattern",

GET /testallnext/_analyze?analyzer=whitespace&pretty=1&text='Preeti,Jain

I expected the text to be broken into 2 tokens: Preeti,Jain and test.
However, the result I got was:

"tokens": [
"token": "'preeti,jain test'",
"start_offset": 0,
"end_offset": 18,
"type": "word",
"position": 1

What is going wrong?
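
To make the expectation concrete, here is a minimal Python sketch of the whitespace splitting I had in mind. It is only an illustration, not how Elasticsearch implements its analyzers: `whitespace_tokenize` is a hypothetical helper, and it assumes the text is passed without the surrounding quotes.

```python
def whitespace_tokenize(text):
    """Split text on runs of whitespace, keeping case and punctuation intact.

    Hypothetical helper for illustration only; it mimics the shape of the
    _analyze output (token, offsets, position) but is not an Elasticsearch API.
    """
    tokens = []
    cursor = 0
    for position, word in enumerate(text.split(), start=1):
        # Locate the word at or after the previous token's end offset.
        start = text.index(word, cursor)
        end = start + len(word)
        tokens.append({
            "token": word,
            "start_offset": start,
            "end_offset": end,
            "position": position,
        })
        cursor = end
    return tokens


if __name__ == "__main__":
    for tok in whitespace_tokenize("Preeti,Jain test"):
        print(tok)
```

With this kind of splitting, the comma stays inside the first token and the case is preserved, which is different from the single lowercased token shown above.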

Our use case requires special characters to be indexed and searchable. For example,
if I index 2 documents, one containing the text "A+B" in one of the fields and
another containing "A B" in the same field, and then query ES for the term
"A+B", I should get only one result.
What kind of tokenizer should be used for my purpose? I can't use the
not_analyzed option on the fields, as they need to be analyzed for phonetics and


(system) #2