Different results querying the same field

Hello there! We are trying to tune our ES setup to get more relevant results.
I've been reading through the Lucene scoring process and noticed a few
things, some of which I think I understand:

Our _all field is composed of three fields (artistName, albumName, songName)

When we search for something like:

{
  "query": {
    "query_string": {
      "query": "bad religion"
    }
  }
}

versus

{
  "explain": true,
  "query": {
    "query_string": {
      "query": "bad religion",
      "fields": ["songName", "artistName", "albumName"]
    }
  }
}

We get very different results. The _all field returns much more relevant
results. I assume the field norms and IDFs work out better with one field
containing everything than with multiple fields. I'm a bit worried that the
results differ so much, but in the end, if we have to use _all, I don't
mind.

Now this is a bit worrisome:

{
  "explain": true,
  "query": {
    "query_string": {
      "query": "bad religion"
    }
  }
}

vs

{
  "explain": true,
  "query": {
    "query_string": {
      "query": "bad religion",
      "fields": ["_all"]
    }
  }
}

return different results. As evidence, I'm pasting the explain output of
both queries below. They are pretty much the same, but they differ in
docFreq and maxDocs (aren't we searching the exact same field in both?), and
the results are affected as well: I'm getting different albums from Bad
Religion depending on which query I use.

Is this expected?

Regards

{
  value: 7.253157
  description: sum of:
  details: [
    {
      value: 2.589271
      description: weight(_all:bad in 1093577), product of:
      details: [
        {
          value: 0.5974825
          description: queryWeight(_all:bad), product of:
          details: [
            {
              value: 6.933816
              description: idf(docFreq=14942, maxDocs=5642368)
            }
            {
              value: 0.08616936
              description: queryNorm
            }
          ]
        }
        {
          value: 4.333635
          description: fieldWeight(_all:bad in 1093577), product of:
          details: [
            {
              value: 1
              description: tf(termFreq(_all:bad)=1)
            }
            {
              value: 6.933816
              description: idf(docFreq=14942, maxDocs=5642368)
            }
            {
              value: 0.625
              description: fieldNorm(field=_all, doc=1093577)
            }
          ]
        }
      ]
    }
    {
      value: 4.663886
      description: weight(_all:religion in 1093577), product of:
      details: [
        {
          value: 0.80188185
          description: queryWeight(_all:religion), product of:
          details: [
            {
              value: 9.3058815
              description: idf(docFreq=1393, maxDocs=5642368)
            }
            {
              value: 0.08616936
              description: queryNorm
            }
          ]
        }
        {
          value: 5.816176
          description: fieldWeight(_all:religion in 1093577), product of:
          details: [
            {
              value: 1
              description: tf(termFreq(_all:religion)=1)
            }
            {
              value: 9.3058815
              description: idf(docFreq=1393, maxDocs=5642368)
            }
            {
              value: 0.625
              description: fieldNorm(field=_all, doc=1093577)
            }
          ]
        }
      ]
    }
  ]
}
}

===============================================

{
  value: 7.190833
  description: sum of:
  details: [
    {
      value: 2.5919292
      description: weight(_all:bad in 890369), product of:
      details: [
        {
          value: 0.6003741
          description: queryWeight(_all:bad), product of:
          details: [
            {
              value: 6.907504
              description: idf(docFreq=18253, maxDocs=6713585)
            }
            {
              value: 0.086916216
              description: queryNorm
            }
          ]
        }
        {
          value: 4.31719
          description: fieldWeight(_all:bad in 890369), product of:
          details: [
            {
              value: 1
              description: tf(termFreq(_all:bad)=1)
            }
            {
              value: 6.907504
              description: idf(docFreq=18253, maxDocs=6713585)
            }
            {
              value: 0.625
              description: fieldNorm(field=_all, doc=890369)
            }
          ]
        }
      ]
    }
    {
      value: 4.5989037
      description: weight(_all:religion in 890369), product of:
      details: [
        {
          value: 0.7997193
          description: queryWeight(_all:religion), product of:
          details: [
            {
              value: 9.201036
              description: idf(docFreq=1841, maxDocs=6713585)
            }
            {
              value: 0.086916216
              description: queryNorm
            }
          ]
        }
        {
          value: 5.7506475
          description: fieldWeight(_all:religion in 890369), product of:
          details: [
            {
              value: 1
              description: tf(termFreq(_all:religion)=1)
            }
            {
              value: 9.201036
              description: idf(docFreq=1841, maxDocs=6713585)
            }
            {
              value: 0.625
              description: fieldNorm(field=_all, doc=890369)
            }
          ]
        }
      ]
    }
  ]
}
}
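
For anyone following the numbers, here is a minimal Python sketch that recomputes both scores from the values in the two explain outputs. It assumes Lucene's classic TF-IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(termFreq). The function names are made up for illustration and are not part of any ES or Lucene API.

```python
import math


def idf(doc_freq, max_docs):
    # Classic Lucene TF-IDF: rarer terms get a higher inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(term_freq):
    # Classic Lucene TF-IDF: square root of the raw term frequency.
    return math.sqrt(term_freq)


def term_score(term_freq, doc_freq, max_docs, field_norm, query_norm):
    # weight = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(term_freq) * term_idf * field_norm
    return query_weight * field_weight


# First explain output (doc 1093577): terms "bad" and "religion".
first = (term_score(1, 14942, 5642368, 0.625, 0.08616936)
         + term_score(1, 1393, 5642368, 0.625, 0.08616936))

# Second explain output (doc 890369): same terms, different index statistics.
second = (term_score(1, 18253, 6713585, 0.625, 0.086916216)
          + term_score(1, 1841, 6713585, 0.625, 0.086916216))

print(round(first, 3), round(second, 3))
```

The results should land close to the 7.253157 and 7.190833 reported above (Lucene uses 32-bit floats, so the last digits can differ). termFreq and fieldNorm are the same in both outputs, so under these assumptions the whole score difference comes from docFreq and maxDocs.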

--

These queries should be the same. Judging from the explain output, it looks
like these two queries are searching two different sets of documents. Did
you continue indexing while running these searches? If not, could you
repeat the test by running the first query several times to make sure that
it returns the same result every time? Could you create a repro for this
issue?
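
When repeating the test, one rough way to check whether two runs searched the same set of documents is to compare the docFreq/maxDocs values in their explain trees. The Python sketch below assumes each explanation has already been parsed into nested dicts with "description" and "details" keys, mirroring the output above. The helper names and the trimmed sample trees are hypothetical.

```python
import re

# Matches descriptions such as "idf(docFreq=14942, maxDocs=5642368)".
IDF_PATTERN = re.compile(r"idf\(docFreq=(\d+), maxDocs=(\d+)\)")


def index_stats(explanation):
    """Collect every (docFreq, maxDocs) pair found anywhere in an explain tree."""
    stats = set()
    match = IDF_PATTERN.search(explanation.get("description", ""))
    if match:
        stats.add((int(match.group(1)), int(match.group(2))))
    for child in explanation.get("details", []):
        stats |= index_stats(child)
    return stats


def same_document_set(explanations):
    """True when all explain trees report identical index statistics."""
    seen = {frozenset(index_stats(e)) for e in explanations}
    return len(seen) <= 1


# Trimmed versions of the two explain outputs from the original post.
run_a = {"description": "sum of:", "details": [
    {"description": "idf(docFreq=14942, maxDocs=5642368)"},
    {"description": "idf(docFreq=1393, maxDocs=5642368)"},
]}
run_b = {"description": "sum of:", "details": [
    {"description": "idf(docFreq=18253, maxDocs=6713585)"},
    {"description": "idf(docFreq=1841, maxDocs=6713585)"},
]}

print(same_document_set([run_a, run_a]))  # identical statistics
print(same_document_set([run_a, run_b]))  # statistics differ
```

If the same query reports different maxDocs from one run to the next, the searches are not hitting the same set of documents.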

On Tuesday, November 27, 2012 1:13:16 PM UTC-5, Vinicius Carvalho wrote:

--