I started using the explain API for query_string, and in the process I think I found a bug (I don't know whether it really is a bug or intended behaviour of query_string). This is going to be a long post, please be patient with me.
I'm indexing a doc: {"name": "new delhi to goa", "st": "goa"}
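To see how it gets indexed, I ran the analyze API with my index analyzer. The call looked roughly like this (a sketch; myindex and my_index_analyzer are placeholders for my actual index and analyzer names):

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=my_index_analyzer&pretty' -d 'new delhi to goa'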
It gave me these tokens:
{
"tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
}, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
}, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
}, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
}, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
}, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
}, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi to ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi to g",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi to go",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "new delhi to goa",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
}, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
}, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
}, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "delhi ",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "delhi t",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "delhi to",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi ",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi t",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi to",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi to ",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi to g",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi to go",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "delhi to goa",
"start_offset" : 4,
"end_offset" : 16,
"type" : "word",
"position" : 2
}, {
"token" : "to ",
"start_offset" : 10,
"end_offset" : 16,
"type" : "word",
"position" : 3
}, {
"token" : "to g",
"start_offset" : 10,
"end_offset" : 16,
"type" : "word",
"position" : 3
}, {
"token" : "to go",
"start_offset" : 10,
"end_offset" : 16,
"type" : "word",
"position" : 3
}, {
"token" : "to goa",
"start_offset" : 10,
"end_offset" : 16,
"type" : "word",
"position" : 3
}, {
"token" : "goa",
"start_offset" : 13,
"end_offset" : 16,
"type" : "word",
"position" : 4
} ]
}
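Now, suppose I query with the string "delhi to goa". I ran that through my search analyzer with the same analyze API (again a sketch; my_search_analyzer is a placeholder for my actual analyzer name):

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=my_search_analyzer&pretty' -d 'delhi to goa'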
It gave me these tokens:
{
"tokens" : [ {
"token" : "del",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 1
}, {
"token" : "delh",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 1
}, {
"token" : "delhi",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 1
}, {
"token" : "del",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
}, {
"token" : "delh",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
}, {
"token" : "delhi",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
}, {
"token" : "delhi ",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
}, {
"token" : "delhi t",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
}, {
"token" : "delhi to",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 1
}, {
"token" : "del",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delh",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi t",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi to",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi to ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi to g",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi to go",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "delhi to goa",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
}, {
"token" : "to ",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "to g",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "to go",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "to goa",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 2
}, {
"token" : "goa",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
} ]
}
I then used the explain API. The request looked roughly like this (a sketch; myindex and mytype are placeholder names, 1003990 is the doc id from the output below, and I have left out the custom_score/script wrapper that you can see at the top of the explanation):
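curl -XGET 'localhost:9200/myindex/mytype/1003990/_explain?pretty' -d '{
  "query": {
    "query_string": {
      "query": "delhi to goa^20",
      "default_field": "text"
    }
  }
}'

It gave me the following: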
{text=new delhi to goa,boostFactor=9.820192307,po=9.82}
510.39673 = custom score, product of:
510.39673 = script score function: composed of:
510.39673 = sum of:
371.12375 = max of:
371.12375 = sum of:
104.61707 = weight(text:del in 1003990) [PerFieldSimilarity], result of:
104.61707 = score(doc=1003990,freq=5.0 = termFreq=5.0
), product of:
0.43576795 = queryWeight, product of:
5.368244 = idf(docFreq=53067, maxDocs=4187328)
0.08117513 = queryNorm
240.0752 = fieldWeight in 1003990, product of:
2.236068 = tf(freq=5.0), with freq of:
5.0 = termFreq=5.0
5.368244 = idf(docFreq=53067, maxDocs=4187328)
20.0 = fieldNorm(doc=1003990)
133.24011 = weight(text:delh in 1003990) [PerFieldSimilarity], result of:
133.24011 = score(doc=1003990,freq=5.0 = termFreq=5.0
), product of:
0.49178073 = queryWeight, product of:
6.058268 = idf(docFreq=26616, maxDocs=4187328)
0.08117513 = queryNorm
270.934 = fieldWeight in 1003990, product of:
2.236068 = tf(freq=5.0), with freq of:
5.0 = termFreq=5.0
6.058268 = idf(docFreq=26616, maxDocs=4187328)
20.0 = fieldNorm(doc=1003990)
133.26657 = weight(text:delhi in 1003990) [PerFieldSimilarity], result of:
133.26657 = score(doc=1003990,freq=5.0 = termFreq=5.0
), product of:
0.49182954 = queryWeight, product of:
6.0588694 = idf(docFreq=26600, maxDocs=4187328)
0.08117513 = queryNorm
270.96088 = fieldWeight in 1003990, product of:
2.236068 = tf(freq=5.0), with freq of:
5.0 = termFreq=5.0
6.0588694 = idf(docFreq=26600, maxDocs=4187328)
20.0 = fieldNorm(doc=1003990)
139.27298 = max of:
139.27298 = weight(text:goa^20.0 in 1003990) [PerFieldSimilarity], result of:
139.27298 = score(doc=1003990,freq=3.0 = termFreq=3.0
), product of:
0.5712808 = queryWeight, product of:
20.0 = boost
7.037633 = idf(docFreq=9995, maxDocs=4187328)
0.004058757 = queryNorm
243.79076 = fieldWeight in 1003990, product of:
1.7320508 = tf(freq=3.0), with freq of:
3.0 = termFreq=3.0
7.037633 = idf(docFreq=9995, maxDocs=4187328)
20.0 = fieldNorm(doc=1003990)
1.0 = queryBoost
The explain above shows scores only for these tokens:
del
delh
delhi
goa
But I am not getting scores for the other tokens that my search analyzer generated. Why is that?
I have read that query_string uses the Lucene query parser by default. So my guess is that query_string splits my query string on whitespace before it ever reaches my search analyzer, so the analyzer only sees one word at a time; am I correct? How can I make query_string calculate a score for all the tokens generated by the search analyzer? Please correct me if I am wrong.
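If that guess is right, would a match query behave differently? As far as I understand, match feeds the whole string through the search analyzer in one pass, so something like this (field name text taken from the explain output above) should produce clauses for all the tokens:

{
  "query": {
    "match": {
      "text": "delhi to goa"
    }
  }
}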
There is one more thing I noticed. I'm using a query-time boost on one of my doc fields, but it is not working the way I thought it would. In the explain output above you can see that there is a boost associated with goa but not with delhi, even though both goa and delhi are present in the original doc. My guess is that query_string applies the boost only to the whitespace-separated token it is attached to: goa is kept as it is, but delhi is expanded by the analyzer into del/delh/delhi and those terms get no boost. Am I correct?
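If so, is the workaround to attach the boost to every whitespace token myself? A sketch of what I mean (I have not verified whether the parser carries a token-level boost over to the sub-terms the analyzer emits):

{
  "query": {
    "query_string": {
      "query": "delhi^20 to^20 goa^20",
      "default_field": "text"
    }
  }
}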
Awaiting a reply!
Thanks