Indexing and searching for string '?'

Hello,

When I run a match query on a string field to match the string '???', I
get nothing back from Elasticsearch.

It seems the standard analyzer strips this string out when tokenizing,
probably because it treats a '?' as an end-of-word marker and filters it
out.

When I run '_analyze?analyzer=standard&pretty' -d 'this is a ??? test'

I get back the response below, which seems to confirm that. Is there any
way to keep filtering out '?' at the end of words, but not strip it when
there are multiple '??' in a row?

{
  "tokens" : [ {
    "token" : "this",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "is",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "a",
    "start_offset" : 8,
    "end_offset" : 9,
    "type" : "<ALPHANUM>",
    "position" : 3
  }, {
    "token" : "test",
    "start_offset" : 15,
    "end_offset" : 19,
    "type" : "<ALPHANUM>",
    "position" : 4
  } ]
}
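
To illustrate the behaviour seen in that response, here is a rough Python sketch. It is only an approximation and not Elasticsearch's actual tokenizer: the function name standard_like_tokens is made up for this example, and it assumes the standard analyzer can be modelled as "keep lowercased runs of letters and digits, drop everything else". Under that assumption, a token made only of '?' characters never survives.

```python
import re


def standard_like_tokens(text):
    """Approximate a word-oriented analyzer (hypothetical helper).

    Keeps lowercased runs of letters and digits and silently drops
    punctuation, which is why a punctuation-only string yields no token.
    """
    return [match.group(0).lower() for match in re.finditer(r"[^\W_]+", text)]


print(standard_like_tokens("this is a ??? test"))  # ['this', 'is', 'a', 'test']
print(standard_like_tokens("???"))                 # []
```

An empty token list for '???' would also explain why the match query finds nothing: the query text is analyzed too, so there is no term left to look up.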

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALdNed%2BCGeR_92B%3DH%2BnS3FY%3DuiXH0Q6ShJV_Jg_awbQ2bH3sbQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Exactly. The default standard analyzer is meant for text analysis. '???' is neither a word nor a number, so it is removed.
If you need to index and search that string, try another analyzer, such as the whitespace analyzer: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-whitespace-analyzer.html#analysis-whitespace-analyzer
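
As a sketch of what the whitespace approach changes, the Python below splits on whitespace only, so '???' survives as a token. It then trims trailing '?' from ordinary words, which is the behaviour asked for in the original question. The helper names (whitespace_tokens, trim_trailing_question_marks, analyze) are hypothetical and this is not Elasticsearch code; in Elasticsearch the trimming step would presumably need a custom analyzer built on the whitespace tokenizer with some pattern-based filter, which is an assumption here rather than something confirmed in this thread.

```python
def whitespace_tokens(text):
    """Split on whitespace only, keeping punctuation inside tokens."""
    return text.split()


def trim_trailing_question_marks(token):
    """Strip trailing '?' from a word, but keep tokens made only of '?'."""
    if token.strip("?") == "":
        # The token is nothing but question marks: leave it untouched.
        return token
    return token.rstrip("?")


def analyze(text):
    """Whitespace tokenization followed by the trailing-'?' trim."""
    return [trim_trailing_question_marks(t) for t in whitespace_tokens(text)]


print(whitespace_tokens("is this a ??? test?"))  # ['is', 'this', 'a', '???', 'test?']
print(analyze("is this a ??? test?"))            # ['is', 'this', 'a', '???', 'test']
```

Note that this sketch also keeps a lone '?' as a token and does not lowercase anything; whether either is wanted depends on the data.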

--
David Pilato | Technical Advocate | elasticsearch.com
david.pilato@elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 27 October 2014 at 12:53:01, Mike Topper (topper@gmail.com) wrote:

