Searching for an asterisk using query_string did not work as expected

Hi all,

I have an index which has data like this:

{c101=https://mail.google.com/mail/?ui=2, timestamp=1211033401}

The column c101 uses the standard analyzer by default.

Now I am querying data on this 'c101' column using the following query:

{
  "query" : {
    "query_string" : {
      "c101" : {
        "query" : "\"mail.google.*com\""
      }
    }
  }
}

The above query returned zero hits, as expected.

But when I move the '*' to the start or the end, like this:

"query" : "\"mail.google.com*\"" or "query" : "\"*mail.google.com\""

I get hits, which I did not expect.

Can someone tell me if I am doing something wrong, or whether there is
another way to achieve what I want?

Thanks in advance.

Regards,
Rakesh Kumar Rakshit

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

And the question is exactly: "what do you want to achieve?"

It sounds like you need an analyzer that does not break your text into terms but keeps it as is.
Maybe you should use a keyword tokenizer, or not analyze that field at all.

On a side note, you should avoid using wildcards in queries. Prefer ngram filters at index time over wildcards at search time.
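
A minimal sketch of the "do not analyze that field" option, written as Python dicts that serialize to request bodies. It assumes the mapping syntax of 2013-era releases (a "string" field with "index": "not_analyzed"; newer versions use a "keyword" field type instead). The type name "doc" and both helper functions are hypothetical; only the field name c101 and the sample value come from this thread.

```python
import json

# Only "c101" comes from the thread; "doc" is a hypothetical mapping type name.
FIELD = "c101"


def not_analyzed_mapping(type_name="doc"):
    """Index-creation body that stores c101 as one unanalyzed term.

    Assumes the string / "not_analyzed" mapping syntax of 2013-era releases.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    FIELD: {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }


def exact_term_query(value):
    """Term query body: the value is compared as-is, without analysis."""
    return {"query": {"term": {FIELD: value}}}


if __name__ == "__main__":
    print(json.dumps(not_analyzed_mapping(), indent=2))
    print(json.dumps(exact_term_query("https://mail.google.com/mail/?ui=2"), indent=2))
```

With such a mapping the whole URL is indexed as a single term, so a term query only matches the complete stored value, special characters included.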

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 4 June 2013 at 11:11, rakesh rakshit ihavethepotential@gmail.com wrote:


Hi David,

Thanks for the quick reply.

And the question is exactly: "what do you want to achieve?"

I want to index and search data that may contain an asterisk (*). I read
somewhere that special characters can be searched for by wrapping the text
in escaped quotes, like "\"mytext\"".

So I have two questions:

  1. What kind of analyzers/tokenizers should I use so that the indexed data
    keeps the asterisk or any other special character?
  2. How should I search for data containing an asterisk (*) or other
    special characters?

Right now I am running the following query, assuming the data contains an
asterisk (*):

{
  "query" : {
    "query_string" : {
      "c101" : {
        "query" : "\"mail.google.*com\""
      }
    }
  }
}

This does not return a hit, as expected.

But if I query "\"mail.google.com*\"", it returns a hit.

I guess the search text is being tokenized around the asterisk. That is, if
I query for:

"query" : "\"mail.google.*com\""

it may be broken into two tokens, 'mail.google.' and 'com', and the search
for both tokens in 'mail.google.com' fails.

But when I query "\"mail.google.com*\"", it may be reduced to a single
token, 'mail.google.com', which matches.
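
The guess above can be checked with a toy model. This is only a sketch: the regular expression is a rough stand-in for the standard analyzer (the real one follows Unicode word-break rules), and it assumes that a '*' inside a quoted phrase is treated as plain text and dropped during analysis rather than acting as a wildcard. The names analyze and phrase_matches are hypothetical helpers, not part of any Elasticsearch API.

```python
import re

# Rough approximation of the standard analyzer: lowercase the text, keep runs
# of letters/digits, and allow a "." or "'" only between two such runs.
TOKEN = re.compile(r"[a-z0-9]+(?:[.'][a-z0-9]+)*")


def analyze(text):
    """Return the approximate token list for a piece of text."""
    return TOKEN.findall(text.lower())


def phrase_matches(doc_tokens, query_tokens):
    """True if query_tokens occur as one consecutive run inside doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


if __name__ == "__main__":
    doc = analyze("https://mail.google.com/mail/?ui=2")
    print("indexed tokens:", doc)
    for phrase in ("mail.google.*com", "mail.google.com*", "*mail.google.com"):
        tokens = analyze(phrase)
        print(phrase, "->", tokens, "match:", phrase_matches(doc, tokens))
```

Under these assumptions the first phrase splits into 'mail.google' and 'com', which do not appear next to each other in the indexed tokens, while the other two phrases reduce to the single token 'mail.google.com' and match. That is consistent with the hits reported above.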

I hope this explains my scenario.

Thanks,
Rakesh
