Indexing and searching on special characters?


(tsandstr) #1

I am having some trouble understanding how indexing works on strings. I use the default mapping for string fields and query_string for my queries.

I index the following:

curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{
"message": "A&B: This is just a test"
}'
curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
"message": "A+B: This is just a test"
}'
curl -XPUT http://localhost:9200/twitter/tweet/3 -d '{
"message": "A-B: This is just a test"
}'
curl -XPUT http://localhost:9200/twitter/tweet/4 -d '{
"message": "A*B: This is just a test"
}'
curl -XPUT http://localhost:9200/twitter/tweet/5 -d '{
"message": "A&C: This is just a test"
}'
curl -XPUT http://localhost:9200/twitter/tweet/6 -d '{
"message": "A+C: This is just a test"
}'
curl -XPUT http://localhost:9200/twitter/tweet/7 -d '{
"message": "A-C: This is just a test"
}'
curl -XPUT http://localhost:9200/twitter/tweet/8 -d '{
"message": "A*C: This is just a test"
}'
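As a rough way to see why several of these documents end up looking alike, here is a Python sketch that approximates what the default standard analyzer does to the message field: lowercase the text and split on anything that is not a letter, digit or underscore. This is a simplification (the real tokenizer follows Unicode word-boundary rules), and approx_standard_tokens is a hypothetical helper, not an Elasticsearch API.

```python
import re

def approx_standard_tokens(text):
    """Hypothetical approximation of the standard analyzer:
    lowercase, then keep runs of word characters, so punctuation
    such as & + - * : acts as a separator and is dropped."""
    return re.findall(r"\w+", text.lower())

messages = {
    1: "A&B: This is just a test",
    2: "A+B: This is just a test",
    3: "A-B: This is just a test",
    4: "A*B: This is just a test",
}

for doc_id, message in messages.items():
    # Each message yields ['a', 'b', 'this', 'is', 'just', 'a', 'test'].
    print(doc_id, approx_standard_tokens(message))
```

Under this approximation all four messages produce identical tokens, which fits the observation that A&B, A+B, A-B and A B behave the same way in the queries.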

With my queries, I get the following results:

{'query': {'query_string': {'query': 'A+B', 'default_operator': 'AND', 'default_field': 'message'}}}
{u'message': u'A&B: This is just a test'}
{u'message': u'A+B: This is just a test'}
{u'message': u'A-B: This is just a test'}
{u'message': u'A*B: This is just a test'}
--> I expected to get only: A+B: This is just a test

{'query': {'query_string': {'query': 'A&B', 'default_operator': 'AND', 'default_field': 'message'}}}
{u'message': u'A&B: This is just a test'}
{u'message': u'A+B: This is just a test'}
{u'message': u'A-B: This is just a test'}
{u'message': u'A*B: This is just a test'}
--> I expected to get only: A&B: This is just a test
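A sketch of why A+B and A&B return the same four documents, assuming (as a simplification) that the query term is analyzed the same way as the indexed text and that default_operator AND then requires every resulting token to be present. matches_all_tokens is hypothetical; it ignores scoring, token positions and the query parser's own operator handling.

```python
import re

def approx_standard_tokens(text):
    # Simplified stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())

def matches_all_tokens(query, message):
    # default_operator AND: every query token must occur somewhere in the message.
    doc_tokens = set(approx_standard_tokens(message))
    return all(tok in doc_tokens for tok in approx_standard_tokens(query))

messages = [
    "A&B: This is just a test", "A+B: This is just a test",
    "A-B: This is just a test", "A*B: This is just a test",
    "A&C: This is just a test", "A+C: This is just a test",
    "A-C: This is just a test", "A*C: This is just a test",
]

for query in ["A+B", "A&B", "A B"]:
    hits = [m for m in messages if matches_all_tokens(query, m)]
    print(query, "->", hits)
```

Every query reduces to the tokens a and b, so each one selects exactly the four "B" documents; the "C" documents contain a (from "just a test") but not b.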

{'query': {'query_string': {'query': 'A*B', 'default_operator': 'AND', 'default_field': 'message'}}}
--> Got no results at all, which is VERY confusing.
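One likely explanation for the empty result, sketched here under stated assumptions: in query_string syntax `*` is a wildcard, so A*B becomes a single wildcard pattern instead of being split into a and b. Wildcard patterns are compared with individual indexed terms and are typically not run through the full analyzer. Because the indexed text was already split into separate a and b tokens, no single term starts with a and ends with b. Python's fnmatch stands in for Lucene wildcard matching, and wildcard_hits is a hypothetical helper.

```python
import fnmatch
import re

def approx_standard_tokens(text):
    # Simplified stand-in for the standard analyzer.
    return re.findall(r"\w+", text.lower())

def wildcard_hits(pattern, messages):
    # A wildcard query is checked against each indexed term on its own,
    # never against the original message string as a whole.
    pattern = pattern.lower()
    return [
        m for m in messages
        if any(fnmatch.fnmatchcase(tok, pattern) for tok in approx_standard_tokens(m))
    ]

messages = [
    "A&B: This is just a test",
    "A*B: This is just a test",
    "A+C: This is just a test",
]

print(wildcard_hits("A*B", messages))  # → [] because no single term looks like 'a...b'
print(wildcard_hits("A*", messages))   # every message matches, via the term 'a'
```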

{'query': {'query_string': {'query': 'A-B', 'default_operator': 'AND', 'default_field': 'message'}}}
{u'message': u'A&B: This is just a test'}
{u'message': u'A+B: This is just a test'}
{u'message': u'A-B: This is just a test'}
{u'message': u'A*B: This is just a test'}
--> I expected to get only: A-B: This is just a test

{'query': {'query_string': {'query': 'A B', 'default_operator': 'AND', 'default_field': 'message'}}}
{u'message': u'A&B: This is just a test'}
{u'message': u'A+B: This is just a test'}
{u'message': u'A-B: This is just a test'}
{u'message': u'A*B: This is just a test'}

Why does the query A*B not return anything at all?
Is there a simple way to know how special characters are indexed by default?
It seems to me that A&B, A+B and A-B are all indexed the same way as A B. Am I correct?
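On the question of how special characters are indexed, Elasticsearch's _analyze API returns the tokens an analyzer produces for a given text. The sketch below only builds such a request with Python's standard library. It assumes a node at http://localhost:9200 (as in the curl commands) and the JSON-body form of _analyze with analyzer and text fields used by newer releases; older releases also accepted these as URL parameters, so check the documentation for your version. build_analyze_request is a hypothetical helper.

```python
import json
import urllib.request

def build_analyze_request(text, analyzer="standard", host="http://localhost:9200"):
    # Hypothetical helper: builds (but does not send) an _analyze request.
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        host + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_analyze_request("A+B: This is just a test")
print(req.get_method(), req.full_url)
print(json.loads(req.data))

# To run it against a live node and list the tokens:
# with urllib.request.urlopen(req) as resp:
#     for token in json.load(resp)["tokens"]:
#         print(token["token"])
```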

What about all the other special characters? How do they behave, and how should I construct my query to get the results I expect? Escaping them does not seem to help.
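A possible reason escaping does not help, shown as a sketch under simplifying assumptions: escaping only changes how the query parser reads the query string, and the escaped term is still passed through the field's analyzer, which drops the punctuation anyway. The code escapes characters from the query_string reserved list (leaving out `<` and `>`, which cannot be escaped) and then applies an approximate tokenizer. Both helpers are hypothetical, and the real analyzer may differ in edge cases.

```python
import re

# Subset of the query_string reserved characters; '<' and '>' are omitted
# because they cannot be escaped with a backslash.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')

def escape_query_string(text):
    # Hypothetical helper: prefix each reserved character with a backslash.
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)

def approx_standard_tokens(text):
    # Simplified stand-in for the standard analyzer. The parser is assumed
    # to consume the backslash escapes before the text is analyzed.
    unescaped = re.sub(r"\\(.)", r"\1", text)
    return re.findall(r"\w+", unescaped.lower())

for raw in ["A+B", "A&B", "A-B", "A*B"]:
    escaped = escape_query_string(raw)
    # Every case still ends up as the tokens ['a', 'b'].
    print(raw, "->", escaped, "->", approx_standard_tokens(escaped))
```

Under this model, escaping `*` turns A*B from a wildcard into an ordinary term, so it would be expected to return the same four documents as A+B rather than none. Distinguishing A+B from A&B would require changing how the field is analyzed at index time, not how the query is written.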

Any help would be appreciated.

Thank you!

  • Tommy
