Analyzer issue - terms not getting tokenized on whitespace


(Preeti Jain) #1

Hi,

We are using Elasticsearch version 1.01.1. I ran the following test of a custom
analyzer:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace": {
          "type": "pattern",
          "pattern": "\\s+"
        }
      }
    }
  }
}

GET /testallnext/_analyze?analyzer=whitespace&pretty=1&text='Preeti,Jain test'

I expected the text to be split into two tokens, Preeti,Jain and test.
However, the result I got was:

{
  "tokens": [
    {
      "token": "'preeti,jain test'",
      "start_offset": 0,
      "end_offset": 18,
      "type": "word",
      "position": 1
    }
  ]
}
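
For comparison, the expected splitting can be sketched outside Elasticsearch. This is only an illustration in Python, not Elasticsearch code: `expected_tokens` is a hypothetical helper that splits on the same `\s+` pattern and lowercases each piece, on the assumption (suggested by the lowercased token above) that the pattern analyzer lowercases by default.

```python
import re

def expected_tokens(text, pattern=r"\s+"):
    """Split text on the separator pattern and lowercase each piece."""
    # re.split yields empty strings for leading/trailing separators; drop them.
    return [piece.lower() for piece in re.split(pattern, text) if piece]

tokens = expected_tokens("Preeti,Jain test")
print(tokens)  # ['preeti,jain', 'test']
```

The comma stays inside the first token because only whitespace is treated as a separator.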

What is going wrong?

Our use case requires special characters to be indexed and searchable. For example,
if I index two documents, one containing the text "A+B" in a field and
the other containing "A B" in the same field, and then query ES for the term
"A+B", I should get only one result.
What kind of tokenizer should I use for this? I can't use the
not_analyzed option on these fields because they need to be analyzed for phonetics and
synonyms.
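
The matching requirement can be illustrated with a toy inverted index. This is a rough Python sketch, not how Elasticsearch stores terms: `tokenize` and `build_index` are hypothetical helpers, and the lowercasing is an assumption. It only shows that splitting on whitespace alone keeps "A+B" as a single term, so a lookup for that term hits one document.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Whitespace-only splitting keeps "+" inside the token.
    return [t.lower() for t in re.split(r"\s+", text) if t]

def build_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

docs = {1: "A+B", 2: "A B"}
index = build_index(docs)

# Look up the exact term "a+b"; only document 1 produced that token.
matches = sorted(index.get("a+b", set()))
print(matches)  # [1]
```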

Thanks,
Preeti


