Can't find a value that is the letter 'a' - Please Read

There is probably some reason for this. We don't understand what that
reason might be and it has caused some ES adoption "controversy" at our
shop.

The Problem:

A JSON document whose field value is the single letter 'a' cannot be found
by a search on that field, e.g. "visitorDisplayName" : "a"

Stay with me here...

The Recipe:

curl -XPUT http://localhost:9200/visitors/logEntry/_mapping -d'
{
  "logEntry": {
    "properties": {
      "deleted": {
        "type": "boolean"
      },
      "hostDisplayName": {
        "type": "string"
      },
      "isEmployee": {
        "type": "boolean"
      },
      "locationID": {
        "type": "string"
      },
      "passIdentifier": {
        "type": "string"
      },
      "timeIn": {
        "type": "date",
        "format": "yyyy-MM-dd'\''T'\''HH:mm:ssZZ"
      },
      "timeOut": {
        "type": "date",
        "format": "yyyy-MM-dd'\''T'\''HH:mm:ssZZ"
      },
      "visitorCompanyName": {
        "type": "string"
      },
      "visitorDisplayName": {
        "type": "string"
      },
      "visitorPhoneNumber": {
        "type": "string"
      }
    }
  }
}
'

No problems here. So let's put something in there:

curl -XPOST http://localhost:9200/visitors/logEntry -d'
{"timeOut":null,"visitorPhoneNumber":null,"timeIn":"2013-07-16T18:18:23-0700","isEmployee":true,"visitorDisplayName":"a","hostDisplayName":null,"logEntryID":null,"hostPhoneNumber":null,"visitorFirstName":null,"hostLastName":null,"hostEmailID":null,"hostFirstName":null,"visitorLastName":null,"locationID":"g9Xc-E-3TNmbyHBg8y_ctg","visitorCompanyName":null,"deleted":false,"passIdentifier":"10ED9802-F24C-4D42-A7E3-B78493C19DDC"}
'

All good. Standard stuff.

Here's the thing: no query can retrieve this document via
"visitorDisplayName":"a"... not via _search?q=visitorDisplayName:a

Nor this POSTed one:

{
  "query": {
    "term" : {
      "visitorDisplayName" : "a"
    }
  }
}

Now...before you tell me how ridiculous that is...hang on...because if
visitorDisplayName is "b" or "c" or "d" or...this actually does what one
expects.

The question is...why/how can this happen?

Any insight would be greatly appreciated and thanks in advance!

-K

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Try setting the visitorDisplayName field to "not_analyzed" in the mapping
(i.e. "index": "not_analyzed" on that field).
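
For reference, a minimal sketch of what that suggestion could look like, assuming the old `string` field type used in this thread, where `"index": "not_analyzed"` is a per-field mapping option. The Python below only builds and prints the JSON body; `build_mapping` is a made-up helper for illustration, and only two of the fields from the original recipe are shown. The printed JSON would go into the same `_mapping` call as in the recipe. An existing field's index option generally cannot be changed in place, so trying this probably means recreating the index and indexing the document again.

```python
import json

# Hypothetical helper, for illustration only: builds a mapping body in which
# the listed string fields are indexed verbatim (no analyzer is applied).
def build_mapping(exact_fields):
    properties = {
        "visitorDisplayName": {"type": "string"},
        "visitorCompanyName": {"type": "string"},
    }
    for name in exact_fields:
        # "not_analyzed" keeps the whole field value as a single term.
        properties[name]["index"] = "not_analyzed"
    return {"logEntry": {"properties": properties}}

mapping = build_mapping(["visitorDisplayName"])
print(json.dumps(mapping, indent=2, sort_keys=True))
```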

Hey,

the reason this happens: you haven't defined a custom analyzer on your
string fields, so Elasticsearch automatically picks the standard analyzer.
The standard analyzer contains a stop word filter, which removes common
words from a field before the field is indexed. Words like 'this', 'a',
'an', 'the' are removed in that process. That is exactly why you can no
longer search for them: they never make it into the index that is
searched. For more information about how to change your analysis process,
take a look at the documentation, starting here:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/
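
To make the effect concrete, here is a toy simulation in Python. It is not Elasticsearch or Lucene code: `analyze` and `term_matches` are made-up stand-ins, and the word list is assumed to be Lucene's default English stop word set. It only illustrates why a value of "a" leaves nothing behind to match, while "b" is indexed normally.

```python
import re

# Assumed to match Lucene's default English stop word set.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)

def analyze(text, stop_words=ENGLISH_STOP_WORDS):
    """Toy analyzer: lowercase, split into word tokens, drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]

def term_matches(indexed_value, term, stop_words=ENGLISH_STOP_WORDS):
    """A term query looks for the term as-is among the indexed tokens."""
    return term in analyze(indexed_value, stop_words)

print(analyze("a"))                                    # []
print(analyze("b"))                                    # ['b']
print(term_matches("a", "a"))                          # False
print(term_matches("b", "b"))                          # True
print(term_matches("a", "a", stop_words=frozenset()))  # True
```

The last call models an analyzer configured with an empty stop word list; the analysis documentation linked above covers how to configure that for a real index.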

Setting visitorDisplayName to not_analyzed will result in your field not
being tokenized or split at all, which is most likely not what you want,
but your mileage may vary...
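
A rough sketch of that trade-off, again as a toy model in Python rather than real Elasticsearch behaviour: the two functions are hypothetical, stop word removal is left out to keep the comparison focused, and "Jane Doe" is an invented sample value. With an analyzed field each word becomes its own lowercased term; with not_analyzed the whole value is one term, so only an exact match on the full string finds it.

```python
import re

def analyzed_terms(value):
    # Toy model of an analyzed string field: lowercase, split into word tokens.
    return re.findall(r"\w+", value.lower())

def not_analyzed_terms(value):
    # Toy model of a not_analyzed field: the whole value is a single term.
    return [value]

name = "Jane Doe"  # invented sample value
print(analyzed_terms(name))                    # ['jane', 'doe']
print(not_analyzed_terms(name))                # ['Jane Doe']
print("doe" in not_analyzed_terms(name))       # False
print("Jane Doe" in not_analyzed_terms(name))  # True
print("a" in not_analyzed_terms("a"))          # True
```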

--Alex

On Wed, Jul 17, 2013 at 5:57 AM, Kai Cherry rnsksoft@gmail.com wrote:
