Different results on search with several nodes


(Byakuya) #1

Hello!
Here is our problem. We have an ElasticSearch cluster with two nodes (20
shards, 2 replicas). It holds an index fed from a CouchDB data source
(approx. 2 million documents) via a river. When we run the same search
request several times, limited to the first 100 results, we receive two
different sets of results, alternating from one request to the next. Some
hits are dropped and others appear in their place. Each set is sorted
correctly and the total number of results is the same, but some elements
differ. When we disable one node so that only one is running, the set of
search results we get is stable.

Query example (the query is built from data received from an HTML
form):

curl -XGET 'http://192.168.0.248:9200/tenderinfo_index/_search' -d '{
  "sort": {
    "publishDate.value": "desc"
  },
  "from": 0,
  "fields": [
    "_id",
    "orderName.value",
    ... and other fields we need to retrieve
  ],
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "match_all": {} }
          ]
        }
      },
      "query": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "query_string": {
                      "query": "разработка сайта",
                      "default_operator": "AND",
                      "default_field": "orderName.value",
                      "analyzer": "russian"
                    }
                  },
                  {
                    "nested": {
                      "path": "lots",
                      "score_mode": "avg",
                      "query": {
                        "query_string": {
                          "query": "разработка сайта",
                          "default_operator": "AND",
                          "default_field": "lots.subject.value",
                          "analyzer": "russian"
                        }
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "size": 100
}'

Our goal is to compute a hash of the search results so that we can detect
new data and act on it. But because the result sets alternate, the hash
differs between runs of the same query.
How can this problem be solved? We appreciate any suggestions.
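
For reference, the hashing step described above could look roughly like the following. This is only a sketch: the thread does not say how the hash is computed, so the function name `results_fingerprint`, the choice of SHA-256, and the sample document IDs are all assumptions. It assumes the search response carries its hits under `hits.hits`, each with an `_id`.

```python
import hashlib
import json


def results_fingerprint(response):
    """Hash the ordered list of hit IDs from a search response (assumed shape)."""
    ids = [hit["_id"] for hit in response["hits"]["hits"]]
    # Serialize the IDs compactly so the same list always yields the same bytes.
    payload = json.dumps(ids, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical responses: "after" contains one new document, t3.
before = {"hits": {"total": 2, "hits": [{"_id": "t1"}, {"_id": "t2"}]}}
after = {"hits": {"total": 3, "hits": [{"_id": "t3"}, {"_id": "t1"}, {"_id": "t2"}]}}

print(results_fingerprint(before) == results_fingerprint(before))  # True
print(results_fingerprint(before) == results_fingerprint(after))   # False
```

A fingerprint like this is only useful for detecting new data if repeated identical searches return the same hit list, which is exactly what breaks when the result sets alternate.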


(Shay Banon) #2

It might be that some documents have the same sort value. Then, when
one search hits one set of shards and another search hits a different
set of shards (copies of the same data), you can get different
results (each correctly sorted).

You have the option to specify a "preference" when searching (see
http://www.elasticsearch.org/guide/reference/api/search/preference.html,
specifically the "custom string value"). This can ensure that two
searches use the same shards.
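
To illustrate the explanation above, here is a small simulation in plain Python. It is not Elasticsearch code: it only models two copies of a shard as lists that hold the same documents in a different internal order (an assumption of the sketch), and the document IDs and dates are made up.

```python
from operator import itemgetter

# Hypothetical documents as (doc_id, publish_date); a, b and c tie on the date.
DOCS = [("a", "2012-03-27"), ("b", "2012-03-27"), ("c", "2012-03-27"), ("d", "2012-03-26")]

# Two copies of the same shard: identical documents, different internal order.
primary = [DOCS[0], DOCS[1], DOCS[2], DOCS[3]]
replica = [DOCS[2], DOCS[1], DOCS[0], DOCS[3]]


def top_ids(shard_copy, size):
    """Sort by date, newest first; tied documents keep the copy's internal order."""
    ranked = sorted(shard_copy, key=itemgetter(1), reverse=True)
    return [doc_id for doc_id, _ in ranked[:size]]


print(top_ids(primary, 2))  # ['a', 'b']
print(top_ids(replica, 2))  # ['c', 'b']
```

Both lists are correctly sorted by date, yet they contain different documents because the tie is broken differently in each copy. Asking the same copy twice always gives the same answer, which is the effect a fixed preference value aims for.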

(In reply to Byakuya's post of Wed, Mar 28, 2012, 8:11 AM.)


(Byakuya) #3

Thank you very much!

Worked like a charm.

(In reply to Shay Banon's post of Mar 28, 14:58.)


(Shaun) #4

Thanks for this answer, as I've just been sitting here puzzling over
the same thing: I click refresh and the hits I get back in the first
size=25 alternate between two different sets of 25.
Based on your advice I've tried using the session ID as the preference,
but I have no idea whether I've done it right. I stuck it in the query DSL:

{
  "query": {
    "filtered": {
      "query": {"query_string": {"query": "ProductionSchool:british", "default_operator": "OR"}},
      "filter": {"and": [{"exists": {"field": "ObjectNumber"}}, {"term": {"Category": "painting"}}]}
    },
    "preference": "b8164e5587009099581f89faa5bde211"
  },
  "from": 0,
  "size": "25",
  "facets": {
    "Dept": {"terms": {"field": "Dept"}},
    "Category": {"terms": {"field": "Category", "size": "50"}},
    "Name": {"terms": {"field": "Name", "size": "50"}},
    "Material": {"terms": {"field": "Material", "size": "50"}},
    "Technique": {"terms": {"field": "Technique", "size": "50"}},
    "ProductionPlaceName": {"terms": {"field": "ProductionPlaceName", "size": "50"}},
    "RRFlag": {"terms": {"field": "RRFlag"}},
    "Maker": {"terms": {"field": "Maker.asterm", "size": "50"}}
  }
}

I have two ES instances running, and the only configuration I have done is
to name the group and set the master; ES is looking after everything else.

Thanks for any help
Shaun

(In reply to Shay Banon's post of Mar 28, 11:58 am.)


(Shay Banon) #5

The preference should be passed as part of the request URI, i.e.
/_search?preference=xxx. That's because the node receiving the search
request needs it in order to distribute the request properly.
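
As a rough sketch of what that means for client code, the helper below puts the preference into the URL query string and strips a "preference" key that was mistakenly placed inside the query DSL. The helper name `build_search_request`, the host `http://localhost:9200`, the index name `collection`, and the shortened request body are all hypothetical; only the `preference` URL parameter itself comes from the advice above.

```python
import copy
from urllib.parse import urlencode


def build_search_request(base_url, index, body, preference):
    """Return (url, body) with the preference in the URL, not in the body."""
    clean_body = copy.deepcopy(body)  # leave the caller's dict untouched
    # Drop a "preference" key misplaced inside the query DSL, if present.
    if isinstance(clean_body.get("query"), dict):
        clean_body["query"].pop("preference", None)
    query_string = urlencode({"preference": preference})
    return f"{base_url}/{index}/_search?{query_string}", clean_body


# Shortened, hypothetical body with the preference in the wrong place.
body = {
    "query": {
        "query_string": {"query": "ProductionSchool:british"},
        "preference": "b8164e5587009099581f89faa5bde211",
    },
    "from": 0,
    "size": 25,
}

url, clean = build_search_request(
    "http://localhost:9200", "collection", body, "b8164e5587009099581f89faa5bde211"
)
print(url)
# http://localhost:9200/collection/_search?preference=b8164e5587009099581f89faa5bde211
```

Using the same string (for example a session ID) on every request from one user keeps that user's searches on the same shard copies.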

(In reply to Shaun's post of Fri, Apr 6, 2012, 12:52 PM.)


(Shaun) #6

Oh, doh me!
I didn't try the simplest way to do it, did I? :)
Thanks so much (and, by the way, for elasticsearch too).
It's now driving our new collections explorer at:
http://www.fitzmuseum.cam.ac.uk/explorer/
Shaun Osborne

(In reply to Shay Banon's post of Apr 8, 7:08 pm.)

