Case sensitivity


(Christopher Burkey-3) #1

Do I need to lowercase my search terms before I send them to Elasticsearch? I
would have expected the search_analyzer to take care of that for text and
wildcard queries.

Here is a test that searches for "TR" in uppercase and gets no results.
Lowercasing the query by hand works, but I expected the search_analyzer to be
applied consistently.

set -o verbose #echo on
#set +o verbose #echo off

SERVER="http://localhost:9200/twitter"

curl -XDELETE $SERVER | python -mjson.tool

curl -XPOST $SERVER -d '
{"index":
  { "number_of_shards": 1,
    "analysis": {
      "filter": {
        "snowball": {
          "type" : "snowball",
          "language" : "English"
        }
      },
      "analyzer": { "a2" : {
          "type":"custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball"]
        }
      }
    }
  }
}' | python -mjson.tool

sleep 1

curl -XPUT $SERVER/tweet/_mapping -d '{
"tweet" : {
"properties" : {
"message" : {"type" : "string",
"analyzer":"a2","include_in_all":"true"},
"user": {"type":"string"}
}
}}' | python -mjson.tool

sleep 1

curl -XPUT $SERVER/tweet/1 -d '{ "user": "kimchy", "message": "Trying out searching teaching, so far so good?" }' | python -mjson.tool

sleep 1

curl -XGET "$SERVER/tweet/_search?q=message:teach" | python -mjson.tool

sleep 1

curl -XPOST $SERVER/tweet/_search -d '{
"query":
{
"wildcard" : {
"message" : "TR*"
}
}
}' | python -mjson.tool

echo "Should have a hit"


(Ivan Brusic) #2

According to the docs, a wildcard query is not analyzed:
http://www.elasticsearch.org/guide/reference/query-dsl/wildcard-query.html

Since your terms are lowercased at index time (your a2 analyzer applies the
lowercase filter), the wildcard term in your query must also be lowercase.

--
Ivan
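
As a minimal sketch of the workaround described above, the wildcard term can be lowercased on the client before the request body is built. The variable names RAW_TERM, LOWER_TERM and BODY are made up for illustration, and the curl call only runs if SERVER is set as in the original script.

```shell
# Hypothetical user input; wildcard queries are not analyzed,
# so the term is lowercased here instead of by Elasticsearch.
RAW_TERM="TR*"
LOWER_TERM=$(printf '%s' "$RAW_TERM" | tr '[:upper:]' '[:lower:]')

# Build the same wildcard query as in the test script, with the lowercased term.
BODY=$(printf '{"query": {"wildcard": {"message": "%s"}}}' "$LOWER_TERM")
echo "$BODY"

# Send the query only when SERVER is defined (assumes a local test index).
if [ -n "${SERVER:-}" ]; then
  curl -XPOST "$SERVER/tweet/_search" -d "$BODY" | python -mjson.tool
fi
```

Note that this only handles case. The indexed terms are also stemmed by the snowball filter, so a wildcard pattern has to match the stemmed form of the word.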


(Shay Banon) #3

Check the text family of queries. They are analyzed, so they might do what you
are after:
http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html.
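
A rough sketch of what that could look like against the test index from the first post, assuming the text query syntax described in the linked docs. The search word "TRYING" and the variable name TEXT_BODY are illustrative only. Because the text query is analyzed, the uppercase input should go through the field's analyzer before matching, so no manual lowercasing is done here.

```shell
# Text query with uppercase input; analysis is left to Elasticsearch.
# Query syntax is assumed from the text query docs of that era.
TEXT_BODY='{"query": {"text": {"message": "TRYING"}}}'
echo "$TEXT_BODY"

# Send the query only when SERVER is defined (assumes a local test index).
if [ -n "${SERVER:-}" ]; then
  curl -XPOST "$SERVER/tweet/_search" -d "$TEXT_BODY" | python -mjson.tool
fi
```

This matches whole analyzed words rather than wildcard patterns, so it is not a drop-in replacement for "TR*". The docs linked above also describe phrase-prefix variants that may be closer to prefix matching.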

