Confused about Analyzers and the Query String Query


(Curt Hu) #1

I think I have posted something related to this before, but I am still confused about how analyzers and the query_string query work.

I have an index with several million records, and I think all the fields are using the standard analyzer.

Q1. What is the exact command to get my ES system-level default analyzer, the index default analyzer, or the analyzer of a specific field?

Here is what I have tried so far:
curl -XGET 'balbalbla/index_name/_analyze?text=www.google.com'
The result is:
{"tokens":[{"token":"www.google.com","start_offset":0,"end_offset":14,"type":"","position":1}]}
This means the analyzer for "index_name" treats "www.google.com" as a single token. Please correct me if I'm wrong.
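For Q1, a minimal sketch of the requests that could be compared. It only builds the URLs and does not contact a cluster. The host "localhost:9200" is an assumption, as is the use of the "analyzer" and "field" query parameters on _analyze, which older Elasticsearch releases accept (newer releases expect them in a JSON request body instead). The index's _settings and _mapping endpoints show which analyzers are configured.

```python
from urllib.parse import urlencode

# Assumptions: a local node and the index name used in this post.
BASE = "http://localhost:9200"
INDEX = "index_name"


def analyze_url(text, analyzer=None, field=None):
    """Build an _analyze URL for the index default, a named analyzer, or a field."""
    params = {"text": text}
    if analyzer is not None:
        params["analyzer"] = analyzer
    if field is not None:
        params["field"] = field
    return f"{BASE}/{INDEX}/_analyze?{urlencode(params)}"


# Endpoints that return the index's analysis settings and field mappings.
settings_url = f"{BASE}/{INDEX}/_settings"
mapping_url = f"{BASE}/{INDEX}/_mapping"

print(analyze_url("www.google.com"))                         # index default analyzer
print(analyze_url("www.google.com", analyzer="whitespace"))  # a named analyzer
print(analyze_url("www.google.com", field="url"))            # analyzer mapped to a field
```

Comparing the tokens returned for the same text by each of the three URLs would show whether the field, the index, and a named analyzer agree.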

The index has a field called "url" with values such as "www.google.com" and "www.facebook.com".
I am trying to use a query_string query to search it:

If I run the following (with "analyzer" set to either "whitespace" or "standard"):
{
  "query": {
    "query_string": {
      "default_field": "url",
      "analyzer": "whitespace",
      "query": "ipo.loyal3"
    }
  }
}

I get 2 hits, which is good: the url field of both hits is "ipo.loyal3.com", regardless of whether I choose the "whitespace" or the "standard" analyzer above.

But if I change "query" to "ipo.loyal3.com" and try again, I get the following results, which I cannot understand:

With the "standard" analyzer: I get around 200k hits, but only the top 2 hits are the real ones, "ipo.loyal3.com".
With the "whitespace" analyzer: the query returns very quickly with 0 hits.

I cannot really understand this behavior. Is the problem related to the ".com" suffix, to the analyzer, or to something specific to the query_string query?
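One way to narrow this down is to compare how each analyzer splits the indexed value and the query text. The sketch below is a rough stand-in, not Elasticsearch's tokenizer: it assumes the standard analyzer follows Unicode word-break rules, under which a "." is kept inside a token only between two letters or between two digits, and it assumes the url field was indexed with that analyzer. The function names are hypothetical.

```python
def standard_like_tokens(text):
    """Rough ASCII-only approximation of Unicode word-break tokenization.

    A "." stays inside a token only between two letters or between two
    digits; every other non-alphanumeric character ends the current token.
    """
    tokens, current = [], ""
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        keeps_dot = ch == "." and (
            (prev.isalpha() and nxt.isalpha()) or (prev.isdigit() and nxt.isdigit())
        )
        if ch.isalnum() or keeps_dot:
            current += ch
        else:
            if current:
                tokens.append(current.lower())
            current = ""
    if current:
        tokens.append(current.lower())
    return tokens


def whitespace_tokens(text):
    """Whitespace analysis: split on whitespace only, no other changes."""
    return text.split()


# Terms assumed to be in the index for the two matching documents.
indexed_terms = set(standard_like_tokens("ipo.loyal3.com"))

for query in ("ipo.loyal3", "ipo.loyal3.com"):
    for name, tokenize in (("standard", standard_like_tokens),
                           ("whitespace", whitespace_tokens)):
        terms = tokenize(query)
        matched = [t for t in terms if t in indexed_terms]
        print(f"{query!r} via {name}: terms={terms} matched={matched}")
```

Under these assumptions, "www.google.com" stays a single token, while "ipo.loyal3.com" is indexed as "ipo.loyal3" and "com" because the second "." sits between a digit and a letter. The whitespace analyzer then produces the single query term "ipo.loyal3.com", which matches nothing, and the standard analyzer produces "ipo.loyal3" and "com", which are OR-ed by default and so would match every document containing a "com" term.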

Thanks very much!

