Elasticsearch/Lucene scoring broken?

I am seeing the exact same index and the exact same query return different
results. Could someone please help me understand?

curl -X DELETE http://localhost:9200/search
curl -X POST http://localhost:9200/search -d '{
"mappings": {
"document": {
"properties": {
"first": {
"type": "string"
},
"last": {
"type": "string",
"boost": 2.0
}
}
}
}
}'

curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Kimberly","last":"Leighton"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Barbara","last":"Leighton"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"John","last":"Leighton"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Nancy","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Lawrence","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Louis","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Leighton","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Leighton","last":"Sweet"}'
curl -X POST "http://localhost:9200/search/_refresh"

curl -X GET 'http://localhost:9200/search/_search?pretty' -d '{
"query": {
"query_string": {
"query": "Mark Leighton"
}
}
}'

RESULTS: ["Leighton Mark", "Louis Mark", "Kimberly Leighton", "Barbara
Leighton", "John Leighton", "Nancy Mark", "Lawrence Mark", "Leighton Sweet"]

I run the exact same code again and the results are different:

curl -X DELETE http://localhost:9200/search
curl -X POST http://localhost:9200/search -d '{
"mappings": {
"document": {
"properties": {
"first": {
"type": "string"
},
"last": {
"type": "string",
"boost": 2.0
}
}
}
}
}'

curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Kimberly","last":"Leighton"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Barbara","last":"Leighton"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"John","last":"Leighton"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Nancy","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Lawrence","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Louis","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Leighton","last":"Mark"}'
curl -X POST "http://localhost:9200/search/document/" -d '{"first":"Leighton","last":"Sweet"}'
curl -X POST "http://localhost:9200/search/_refresh"

curl -X GET 'http://localhost:9200/search/_search?pretty' -d '{
"query": {
"query_string": {
"query": "Mark Leighton"
}
}
}'

RESULTS: ["Nancy Mark", "Leighton Mark", "Lawrence Mark", "Louis Mark",
"Leighton Sweet", "Kimberly Leighton", "Barbara Leighton", "John Leighton"]

The first time, the top result was "Leighton Mark", as it should be, because
it matches both terms. The same query seconds later returns a different
search result.

Is scoring broken in Elasticsearch/Lucene?

Thank you.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hard to say without seeing the explanation/scores. How many shards does
your index have? TF-IDF statistics are calculated per shard, which skews
scores greatly when you only have a few documents. You can experiment with
the dfs_query_then_fetch search type:
http://www.elasticsearch.org/guide/reference/api/search/search-type.html
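
To make the per-shard effect concrete, here is a minimal sketch in Python. It assumes Lucene's classic IDF formula, 1 + ln(numDocs / (docFreq + 1)), and applies it to the eight "last" values from the original post. The two-shard split is hypothetical, chosen only to show that the same term can get a different IDF on each shard than it would from global statistics.

```python
import math

def classic_idf(doc_freq, num_docs):
    """Lucene's classic TF-IDF inverse document frequency: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def idf_for(term, docs):
    """IDF of `term` computed only from the documents visible in `docs`."""
    doc_freq = sum(1 for value in docs if value.lower() == term)
    return classic_idf(doc_freq, len(docs))

# The eight "last" field values from the original post.
last_names = ["Leighton", "Leighton", "Leighton",
              "Mark", "Mark", "Mark", "Mark", "Sweet"]

# Global statistics: what a single shard (or a DFS search type) would see.
global_idf = idf_for("leighton", last_names)

# Hypothetical placement of the same eight documents on two shards.
shard_a = ["Leighton", "Leighton", "Mark"]
shard_b = ["Leighton", "Mark", "Mark", "Mark", "Sweet"]

print(f"global : {global_idf:.3f}")
print(f"shard A: {idf_for('leighton', shard_a):.3f}")
print(f"shard B: {idf_for('leighton', shard_b):.3f}")
```

With so few documents, moving a single document between shards changes the IDF noticeably, which is enough to reorder results.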

--
Ivan

Does the output of explain help?

http://www.elasticsearch.org/guide/reference/api/search/explain.html

--
Nick Zadrozny

Cofounder, One More Cloud

websolr.com https://websolr.com/home • bonsai.io http://bonsai.io/home

Hassle-free hosted full-text search,
powered by Apache Solr and ElasticSearch.

As Ivan pointed out, the 8 documents you're using aren't enough to give an
even distribution of terms across shards. Since IDF is computed on a
per-shard basis, your results will vary depending on how the documents have
been hashed across the shards.

If you want to run small tests like this then just use one shard. The other
option, as Ivan also pointed out, is to change the query mode (search type)
so that a global IDF is calculated.
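
A rough Python sketch of why re-running the same script gives a different layout each time: documents POSTed without an ID get a freshly generated one, and the shard is chosen as hash(id) modulo the number of primary shards. The MD5 hash, the random hex IDs, and the five-shard count below are stand-ins for illustration, not what Elasticsearch actually uses internally.

```python
import hashlib
import random

NUM_SHARDS = 5  # assumption: the index has five primary shards

DOCS = [
    ("Kimberly", "Leighton"), ("Barbara", "Leighton"), ("John", "Leighton"),
    ("Nancy", "Mark"), ("Lawrence", "Mark"), ("Louis", "Mark"),
    ("Leighton", "Mark"), ("Leighton", "Sweet"),
]

def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard as hash(id) % num_shards (MD5 is a stand-in hash)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def index_run(seed):
    """Simulate one indexing run where every document gets a fresh random ID."""
    rng = random.Random(seed)
    placement = {}
    for first, last in DOCS:
        doc_id = "%032x" % rng.getrandbits(128)  # hypothetical auto-generated ID
        placement.setdefault(shard_for(doc_id), []).append(f"{first} {last}")
    return placement

# Two runs of the "same" script: same documents, different IDs.
for seed in (1, 2):
    print(f"run {seed}:")
    for shard, names in sorted(index_run(seed).items()):
        print(f"  shard {shard}: {names}")
```

Each run scatters the eight names differently, so the per-shard document frequencies (and therefore the scores) differ from run to run.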

-Eric

If I lower the shard count to 2, it's much more accurate, almost 100% of
the time. With 5 shards it falls down to about 75%.

The reason I even started looking at such a small dataset is that I was
seeing inconsistent results when using match. query_string seems more
accurate as far as the scoring is concerned.

Using query_string, I get the desired result order:

curl -X GET 'http://localhost:9200/search/_search?pretty' -d '{
"query": {
"query_string": {
"query": "mark leighton"
}
}
}'

["Leighton Mark", "Nancy Mark", "Lawrence Mark", "Louis Mark", "Kimberly
Leighton", "Barbara Leighton", "John Leighton", "Leighton Sweet"]

Using multi_match:

curl -X GET 'http://localhost:9200/search/_search?pretty' -d '{
"query": {
"multi_match": {
"query": "mark leighton",
"fields": [
"last",
"first"
]
}
}
}'

["Kimberly Leighton", "Barbara Leighton", "John Leighton", "Nancy Mark", "Lawrence
Mark", "Louis Mark", "Leighton Mark", "Leighton Sweet"]

I get a totally wrong order.

I suppose my first issue was using match when I should have been using
query_string, and the second is that the dataset is too small for anything
over 2 shards.

Any ideas?


Hello,

You should try to use the search type dfs_query_then_fetch. It should make
the scoring much better on a small dataset.

More details:
http://www.elasticsearch.org/guide/reference/api/search/search-type.html

Regards

Benjamin

Hey,

On Friday, March 1, 2013 6:29:02 AM UTC+1, Bruno Miranda wrote:

If I lower the shard count to 2, it's much more accurate, almost 100% of
the time. With 5 shards it falls down to about 75%.

The reason I even started looking at such small dataset is because I was
seeing inconsistent results when using match. Query_string seems more
accurate as far as the scoring is concerned.

using query_string I get the desired result order:

If you just use query_string without a field, you search on _all, and you
will get the right matches since both clauses ("mark" & "leighton") of the
boolean query match.

curl -X GET 'http://localhost:9200/search/_search?pretty' -d '{
"query": {
"query_string": {
"query": "mark leighton"
}
}
}'

["Leighton Mark", "Nancy Mark", "Lawrence Mark", "Louis Mark", "Kimberly
Leighton", "Barbara Leighton", "John Leighton", "Leighton Sweet"]

Using match:

If you use multi_match, it will automatically build a dismax query which
looks like this:

dismax(
boolean(last:mark, last:leighton),
boolean(first:mark, first:leighton),
)

which uses the score of the highest-scoring boolean clause in the dismax,
plus a tiebreaker.
Try setting use_dis_max to false in the multi_match query to make the top
level a boolean query; this should bring up better results.
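
A toy illustration in Python of the difference between the two top-level queries. The per-field scores are made up (1.0 per matching term, doubled for the boosted "last" field), the tie_breaker of 0 is an assumption, and Lucene's coord factor and norms are left out. It is only a sketch of how the clause scores are combined, not of real Lucene scores.

```python
def dis_max(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the other clauses."""
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)

def bool_sum(clause_scores):
    """Sum of all clause scores (Lucene's coord factor is ignored here)."""
    return sum(clause_scores)

# Each tuple is (score of the "last" clause, score of the "first" clause).
docs = {
    "Leighton Mark": (2.0, 1.0),      # "mark" in last, "leighton" in first
    "Kimberly Leighton": (2.0, 0.0),  # "leighton" in last only
    "Leighton Sweet": (0.0, 1.0),     # "leighton" in first only
}

for name, scores in docs.items():
    print(f"{name:18} dis_max={dis_max(scores):.1f} bool={bool_sum(scores):.1f}")
```

Under dismax with no tiebreaker, "Leighton Mark" ties with "Kimberly Leighton" because only the best field counts. Summing the clauses rewards the document that matches in both fields.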

simon

Thank you for the answers.

I found that even at 5 shards, the query_and_fetch was very accurate.

What made by far the biggest difference in the accuracy of the ordering was
setting use_dis_max to false.

{
"query": {
"multi_match": {
"query": "Leighton Mark",
"use_dis_max": false,
"fields": [
"first",
"last"
]
}
},
"size": 10
}

On Friday, March 1, 2013 1:18:24 AM UTC-8, simonw wrote:

Hey,

On Friday, March 1, 2013 6:29:02 AM UTC+1, Bruno Miranda wrote:

If I lower the shard count to 2, it's much more accurate, almost 100% of
the time. With 5 shards it falls down to about 75%.

The reason I even started looking at such small dataset is because I was
seeing inconsistent results when using match. Query_string seems more
accurate as far as the scoring is concerned.

using query_string I get the desired result order:

if you just use query_sting without a field you search on _all and you
will get the right matches since both clauses ("mark" & "leighton") of the
boolean query match

curl -X GET 'http://localhost:9200/search/_search?pretty' -d '{
"query": {
"query_string": {
"query": "mark leighton"
}
}
}'

["Leighton Mark", "Nancy Mark", "Lawrence Mark", "Louis Mark", "Kimberly
Leighton", "Barbara Leighton", "John Leighton", "Leighton Sweet"]

Using match:

if you do multi_match it will automatically build a dismax query which
looks like this:

dismax(
boolean(last:mark, last:leighton),
boolean(first:mark, first:leighton),
)

which will use the score of the max-scoring boolean in the dismax, plus
some tiebreaker.
Try use_dis_max = false in the multi_match query to make the top level a
boolean query; this should bring up better results.
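
To make the difference concrete, here is a toy sketch of the two top-level combinations. It is not Lucene's actual scoring: every matching term simply contributes its field's boost, idf, norms and coord are ignored, and the tie_breaker of 0.0 is an assumption. The function names are hypothetical.

```python
# Toy model: each query term found in a field contributes that field's boost.
BOOSTS = {"first": 1.0, "last": 2.0}  # mirrors the boost on "last" in the mapping

def field_score(doc, field, terms):
    # Score of boolean(field:term1, field:term2) for one document.
    tokens = doc[field].lower().split()
    return BOOSTS[field] * sum(1.0 for t in terms if t in tokens)

def dismax_score(doc, terms, tie_breaker=0.0):
    # Best field wins; other fields only count through the tie breaker.
    scores = [field_score(doc, f, terms) for f in BOOSTS]
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)

def boolean_score(doc, terms):
    # Every matching field adds to the total.
    return sum(field_score(doc, f, terms) for f in BOOSTS)

docs = [
    {"first": "Leighton", "last": "Mark"},
    {"first": "Nancy", "last": "Mark"},
    {"first": "Kimberly", "last": "Leighton"},
]
terms = ["mark", "leighton"]

for doc in docs:
    print(doc["first"], doc["last"], dismax_score(doc, terms), boolean_score(doc, terms))
```

Under this simplification dismax gives all three documents the same score, so "Leighton Mark" has no advantage, while the boolean sum puts it ahead because the match in "first" is added to the match in "last".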

simon

curl -X GET 'http://localhost:9200/search/_search?pretty' -d '{
"query": {
"multi_match": {
"query": "mark leighton",
"fields": [
"last",
"first"
]
}
}
}'

["Kimberly Leighton", "Barbara Leighton", "John Leighton", "Nancy Mark", "Lawrence
Mark", "Louis Mark", "Leighton Mark", "Leighton Sweet"]

I get the totally wrong order.

I suppose my first issue was using match when I should be using query
string, and the second is the dataset is too small for anything over 2
shards.

Any ideas?


I can almost assure you that eventually you will not want query_and_fetch
once you start to have real data.

I was so focused on scoring that I missed the fact that you were using the
_all field. As you discovered, Simon had the best suggestion so far.

--
Ivan


Makes me wonder why use_dis_max=false isn’t the default for multi_match.
Seems like most use cases would want that.

If I search for "blue sky" in fields color and location, I would always
expect the doc that has both blue in color and sky in location to come
first.


On Friday, March 1, 2013 7:49:12 PM UTC+1, Bruno Miranda wrote:

Makes me wonder why use_dis_max=false isn’t the default for multi_match.
Seems like most use cases would want that.

If I search for "blue sky" in fields color and location, I would always
expect the doc that has both blue in color and sky in location to come
first.

Well, this is a tricky question. If you use dismax, then you are discounting
matches in fields that are "partial" or where the terms are not
significant. Here is an example:

2 fields: artist, song
doc 1: artist: foo fighters, song: generator
doc 2: artist: the fighters, song: foo

In this case dismax does a very good job, while in your case it doesn't.
What you really want, as an additional feature, is to parse the query into
cross-field dismax queries, where your example becomes:

boolean(
dismax(color:blue, loc:blue),
dismax(color:sky, loc:sky)
)

That would make both possible, but it is kind of tricky since it requires
that both fields produce roughly the same tokens. I think dismax is an
OK default, but boolean might be more intuitive for new users.
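
A small sketch of the two query shapes for the "blue sky" case, under a deliberately simplified model where a term matching a field scores 1.0 and everything else (idf, norms, boosts, tie breaker) is ignored. The field name "location", the sample documents and the function names are assumptions made for illustration.

```python
FIELDS = ("color", "location")

def matches(doc, field, term):
    # 1.0 if the term occurs in the field, else 0.0 (toy scoring).
    return 1.0 if term in doc[field].lower().split() else 0.0

def per_field_dismax(doc, terms):
    # dismax(boolean(color:blue, color:sky), boolean(location:blue, location:sky))
    return max(sum(matches(doc, f, t) for t in terms) for f in FIELDS)

def per_term_dismax(doc, terms):
    # boolean(dismax(color:blue, location:blue), dismax(color:sky, location:sky))
    return sum(max(matches(doc, f, t) for f in FIELDS) for t in terms)

both = {"color": "blue", "location": "sky"}  # matches both terms across fields
one = {"color": "blue", "location": "sea"}   # matches only "blue"
terms = ["blue", "sky"]

print(per_field_dismax(both, terms), per_field_dismax(one, terms))
print(per_term_dismax(both, terms), per_term_dismax(one, terms))
```

In this toy model the per-field dismax cannot tell the two documents apart, while the per-term form ranks the document matching both terms first. How the artist/song example behaves depends on idf and norms, which this sketch leaves out.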

simon
