Searching Special Characters like &%*@()!{} etc. in 0.13


(B Rajan) #1

Hi,

How can we search for the special characters &%*@()!{} in Elastic Search
0.13?

For example:

{
  "query" : {
    "query_string" : {
      "default_field" : "",
      "query" : "elastic(",
      "default_operator" : "OR",
      "analyzer" : "Standard",
      "allow_leading_wildcard" : true,
      "lowercase_expanded_terms" : true,
      "enable_position_increments" : true,
      "fuzzy_prefix_length" : 0,
      "fuzzy_min_sim" : 0.5,
      "phrase_slop" : 0,
      "boost" : 1.0
    }
  },
  "from" : 0,
  "size" : 0,
  "explain" : false
}

Here I am trying to search for the word "elastic(" with an open
parenthesis at the end. I used the escape character \ but it still does
not work.

Please help.


(Lukáš Vlček) #2

Hi,

How you analyze text during indexing (and at query time as well) matters.
You can use the new Analyze API to check how your analyzers handle your
terms. This feature is currently in master, see
https://github.com/elasticsearch/elasticsearch/issues/issue/529 for more
details.

Assume that "&%*@()!{}" is URL encoded to "%26%25*%40()!%7b%7d".
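
As a side note, here is a minimal Python sketch of how the nine-character string from the question can be percent-encoded before it goes into the `text` parameter. It only uses the standard library; keeping "(", ")", "!" and "*" unescaped is a choice made here for readability, not something the Analyze API requires.

```python
from urllib.parse import quote, unquote

raw = "&%*@()!{}"

# Leave "(", ")", "!" and "*" literal; every other reserved
# character is percent-encoded.
encoded = quote(raw, safe="()!*")
print(encoded)

# Decoding gives back the original string, so nothing was lost.
assert unquote(encoded) == raw
```

The encoded value can then be appended to the `_analyze?text=` URL used in the calls that follow.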

Using the default analyzer (standard):

curl -XGET 'http://localhost:9200/myindex/_analyze?text=%26%*%40()!{}'
{"tokens":[]}

Using the keyword analyzer:

curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=keyword&text=%26%*%40()!{}'
{"tokens":[{"token":"&%*@()!{}","start_offset":0,"end_offset":9,"type":"word","position":1}]}

Typically you want to use the same analysis process for both indexing and
query time.

So imagine the following. Start ES with defaults, then run:

curl -XDELETE 'http://localhost:9200/myindex'
curl -XPUT 'http://localhost:9200/myindex/' -d '{"index":{"number_of_shards":1,"number_of_replicas":0}}'
curl -XPUT 'http://localhost:9200/myindex/mytype/1?refresh=true' -d '{"field1":"text("}'
curl -XGET 'http://localhost:9200/_search?refresh=true' -d '{"query":{"term":{"field1":"text("}}}'
curl -XGET 'http://localhost:9200/_search' -d '{"query":{"query_string":{"query":"\"text(\""}}}'

The above sequence of REST API calls does the following:

  1. delete the index
  2. create the index
  3. index one document (and refresh so that we can immediately search for it)
  4. use the term query (http://localhost:4000/docs/elasticsearch/rest_api/query_dsl/term_query/) to
    search for the exact match on "text("
  5. use the query_string query to search on "text(" (note the phrase query syntax;
    otherwise you get a Lucene query parser exception)

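Interestingly 4) does not find anything and 5) does find the document. Why?
Because "text(" is by default tokenized to "text". The term query is not
analyzed, so it looks for the literal term "text(", which was never indexed.
The query_string query is analyzed, so it ends up searching for "text".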

curl -XGET 'http://localhost:9200/myindex/_analyze?text=text('
{"tokens":[{"token":"text","start_offset":0,"end_offset":4,"type":"<ALPHANUM>","position":1}]}
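
To make the difference between the two queries concrete, here is a rough Python sketch. The two functions are simplified stand-ins written for this illustration (they are not the real Lucene analyzers): one splits on non-alphanumeric characters and lowercases, the other keeps the input as a single token.

```python
import re

def standard_like(text):
    """Toy stand-in for the standard analyzer: alphanumeric runs, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def keyword_like(text):
    """Toy stand-in for the keyword analyzer: the whole input is one token."""
    return [text] if text else []

# What ends up in the index when "text(" is analyzed at index time.
indexed_terms = set(standard_like("text("))

# Term query: the query text is NOT analyzed, so it must equal an indexed term.
term_hit = "text(" in indexed_terms

# query_string phrase: the query text is analyzed first, then looked up.
phrase_hit = all(t in indexed_terms for t in standard_like("text("))

print(indexed_terms, term_hit, phrase_hit)
print(standard_like("&%*@()!{}"), keyword_like("&%*@()!{}"))
```

With these stand-ins the term lookup misses and the analyzed lookup hits, which mirrors the behaviour of calls 4) and 5).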

So you have to set appropriate analyzers to be able to search on specific
characters. Maybe you want to use the standard analyzer on your text and put
specific terms into a new field which is not analyzed during indexing.
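
A possible shape for that two-field setup, sketched in Python. The field name "field1_raw" and the helper make_doc are hypothetical, and the mapping assumes the string field option "index": "not_analyzed"; please check it against the mapping docs for your version before relying on it.

```python
import json

# Hypothetical mapping for type "mytype": one analyzed field for normal
# full-text search, one untouched copy for exact matches on special characters.
mapping = {
    "mytype": {
        "properties": {
            "field1": {"type": "string", "analyzer": "standard"},
            # Assumption: "not_analyzed" stores the value as a single term.
            "field1_raw": {"type": "string", "index": "not_analyzed"},
        }
    }
}

def make_doc(text):
    """Copy the same value into the analyzed and the not-analyzed field."""
    return {"field1": text, "field1_raw": text}

print(json.dumps(mapping))
print(json.dumps(make_doc("text(")))
```

A term query on "field1_raw" for "text(" would then have a literal term to match, while "field1" keeps behaving as before.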

Also note that there are some issues in your query:

  1. default_field - why do you set it to an empty string?
  2. Query DSL is case sensitive, so note that you used the "Standard" analyzer
    instead of "standard"
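
For the escaping part of the question, here is a small Python sketch that builds a cleaned-up version of the query. The character set follows the Lucene query parser's list of special characters; the helper name escape_query_string and the default_field value "field1" are made up for this example. Note that json.dumps doubles the backslash, which is what a JSON body needs.

```python
import json

# Characters the Lucene query parser treats as syntax.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&')

def escape_query_string(text):
    """Prefix every Lucene special character with a backslash."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in text)

body = {
    "query": {
        "query_string": {
            "default_field": "field1",   # hypothetical field name
            "query": escape_query_string("elastic("),
            "default_operator": "OR",
            "analyzer": "standard",      # lowercase
        }
    },
    "from": 0,
    "size": 10,
}

payload = json.dumps(body)
print(payload)
```

Escaping only gets the query past the parser. With the standard analyzer the "(" is still dropped during analysis, so this matches documents containing "elastic", not the literal "elastic(".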

Regards,
Lukas

On Wed, Nov 24, 2010 at 8:22 AM, B Rajan boopsraj@gmail.com wrote:

