Escaping Special Characters in Wildcard Query


(dawi) #1

Hi,

my question is how to escape special characters in a wildcard query.

The elasticsearch documentation says that "The wildcard query maps to
lucene WildcardQuery".
http://www.elasticsearch.org/guide/reference/query-dsl/wildcard-query.html

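The Lucene documentation says that there is the following list of special
characters:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

and that they can be escaped using the \ before the character.
http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Escaping%20Special%20Characters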

I have tried every form of escaping I can imagine but I was not able to
search for * and ? using a wildcard query.

Does anybody have a hint, or is it simply not possible?

Regards, Daniel


(Clinton Gormley) #2

Hi Dawi

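> The elasticsearch documentation says that "The wildcard query maps to
> lucene WildcardQuery".
> http://www.elasticsearch.org/guide/reference/query-dsl/wildcard-query.html
>
> The Lucene documentation says that there is the following list of
> special characters:
> + - && || ! ( ) { } [ ] ^ " ~ * ? : \
> and that they can be escaped using the \ before the character.
> http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Escaping%20Special%20Characters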

These special characters apply to the query_string/field query, not to
the wildcard query. The only special characters in the wildcard query
are * and ?

> I have tried every form of escaping I can imagine but I was not able
> to search for * and ? using a wildcard query.

I'm guessing that the field that you are trying to search against is
analyzed with the standard analyzer? In which case, most punctuation is
removed, so characters like * will not exist in your terms, and thus
won't be searchable.

Depending on what your data is, it may make sense to set your field to
{ index: not_analyzed}
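
As a minimal sketch of what such a mapping might look like: the block below assumes a node on localhost:9200, the legacy "string" field type of the Elasticsearch versions of this era, and placeholder names (index "index", type "type", field "name"). The helper names mapping_body and put_mapping are made up for illustration, and the mapping would have to be in place before any documents are indexed.

```shell
#!/bin/sh
# Sketch only: assumes the legacy "string" field type and its
# "index": "not_analyzed" option. All names are placeholders.

# Print the mapping body for a field that is indexed as a single,
# unmodified term.
mapping_body() {
    cat <<'EOF'
{
  "type": {
    "properties": {
      "name": { "type": "string", "index": "not_analyzed" }
    }
  }
}
EOF
}

# Send the mapping to an existing index. Defined but not called here,
# because it needs a running node.
put_mapping() {
    curl -XPUT 'http://localhost:9200/index/type/_mapping' -d "$(mapping_body)"
}
```

With a field mapped this way, a value such as 0*0 is stored as one term, asterisk included.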

clint


(dawi) #3

Hi Clint,

thanks for your answer!

I am not using the standard analyzer, instead I am using the
following analyzer configuration for the index:

index:
  analysis:
    analyzer:
      default:
        tokenizer: keyword
        filter: lowercase
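
For readers who prefer to set this per index rather than in the node configuration, the same analyzer can presumably be supplied as JSON settings when the index is created. This is a sketch under assumptions: a node on localhost:9200, an index called "index", and the made-up helper names settings_body and create_index.

```shell
#!/bin/sh
# Sketch only: JSON form of the analyzer configuration above.

# Print index settings with a default analyzer that keeps the whole
# field value as one token (keyword tokenizer) and lowercases it.
settings_body() {
    cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
EOF
}

# Create the index with these settings. Defined but not called here,
# because it needs a running node.
create_index() {
    curl -XPUT 'http://localhost:9200/index' -d "$(settings_body)"
}
```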

The following script may help to understand and reproduce my problems:

#!/bin/sh

# Index two documents: one plain, one containing a literal asterisk.
curl -XPUT 'http://localhost:9200/index/type/1' -d '{ "name": "010" }'
curl -XPUT 'http://localhost:9200/index/type/2' -d '{ "name": "0*0" }'

echo
echo "###############################################################"
echo "term-query: one result, ok, works as expected"
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "term" : { "name" : "0*0" } }
}'

echo "###############################################################"
echo "wildcard-query: two results, ok, works as expected"
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "wildcard" : { "name" : "0*" } }
}'

echo "???????????????????????????????????????????????????????????????"
echo "wildcard-query: expecting one result, how can this be achieved???"
# The backslash is doubled so that the JSON request body stays valid.
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "wildcard" : { "name" : "0\\**" } }
}'

I have tried nearly every form of escaping, and of course this could be a
problem of shell escape sequences.
But I don't think it is, because I have the same problems using the Java API
with wildcardQuery("name", "0*0").

I am afraid, but is it possible that the answer is that I cannot search for
* and ? using wildcard queries?

Best Regards,
Daniel



(Clinton Gormley) #4

Hiya

> I am afraid, but is it possible that the answer is that I cannot
> search for * and ? using wildcard queries?

Correct.

But you can use the query_string/field queries with an escaped \* to
achieve what you want.

Those queries DO understand Lucene query syntax.
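
A rough sketch of what that escaping involves: the query parser treats + - & | ! ( ) { } [ ] ^ " ~ * ? : \ as special, and a backslash in front of any of them makes it literal. The helper names below (escape_query_string, json_escape, query_body) are hypothetical, and the field name "name" is only a placeholder. In Java, Lucene's QueryParser.escape(String) serves the same purpose.

```shell
#!/bin/sh
# Put a backslash in front of every character that the Lucene query
# parser treats as special, so that it is matched literally.
escape_query_string() {
    printf '%s' "$1" | sed 's/[][+!(){}^"~*?:\\&|-]/\\&/g'
}

# Escape backslashes and double quotes so that the text can be
# embedded in a JSON string.
json_escape() {
    printf '%s' "$1" | sed 's/[\\"]/\\&/g'
}

# Build a query_string request body.
#   $1: text to match literally (escaped for the query parser)
#   $2: raw query syntax appended as-is, e.g. a trailing wildcard
query_body() {
    literal=$(escape_query_string "$1")
    printf '{ "query": { "query_string": { "default_field": "name", "query": "%s" } } }\n' \
        "$(json_escape "${literal}$2")"
}

# Literal "0*" followed by a real trailing wildcard.
query_body '0*' '*'
```

The last line prints a body whose query is "0\\**". The JSON parser decodes that to 0\**, which the query parser reads as a literal asterisk followed by a wildcard. The output can be passed to curl with -d.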

clint


(dawi) #5

Hi Clint,

thanks for this information. Then I will use the query_string query for my
purpose.

As long as I don't use a wildcard as the first character, this search behaves
exactly as I want.

A search for 0\*0 matches document 0*0.
A search for 0\** matches document 0*0.
A search for *10 delivers document 010.
But
A search for *\*0 delivers both documents 010 and 0*0.
A search for *\** delivers both documents 010 and 0*0.

Is this behavior intended? Or is this a bug? Or am I doing something wrong?

curl -XPUT 'http://localhost:9200/index/type/1' -d '{ "name": "010" }'
curl -XPUT 'http://localhost:9200/index/type/2' -d '{ "name": "0*0" }'

# Each backslash is doubled so that the JSON request body stays valid.

echo
echo "###############################################################"
echo "query_string: one result, ok, works as expected"
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "query_string" : {
    "default_field" : "name",
    "query" : "0\\*0"
  } }
}'

echo
echo "###############################################################"
echo "query_string: one result, ok, works as expected"
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "query_string" : {
    "default_field" : "name",
    "query" : "0\\**"
  } }
}'

echo
echo "###############################################################"
echo "query_string: one result, ok, works as expected"
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "query_string" : {
    "default_field" : "name",
    "allow_leading_wildcard" : true,
    "query" : "*10"
  } }
}'

echo
echo "???????????????????????????????????????????????????????????????"
echo "query_string: expecting one result, not ok, returns all documents"
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "query_string" : {
    "default_field" : "name",
    "allow_leading_wildcard" : true,
    "query" : "*\\*0"
  } }
}'

echo
echo "???????????????????????????????????????????????????????????????"
echo "query_string: expecting one result, not ok, returns all documents"
curl -XGET 'http://localhost:9200/index/type/_search?pretty=true' -d '{
  "query" : { "query_string" : {
    "default_field" : "name",
    "allow_leading_wildcard" : true,
    "query" : "*\\**"
  } }
}'

Best Regards,
Daniel


(dawi) #6

Hi clint,

in addition to the curl commands I have written a small Java test
class: https://gist.github.com/1351559

Best regards,
Daniel

