Same document repeated in search results


(Pir Abdul Rasool Qureshi) #1

Hi, I am getting the same document (with the same "_id") repeated more than
once in search results.

My query looks like this:

POST http://XXXX:9200/_search/
{
  "query": {
    "multi_match": {
      "fields": [
        "en_text_keywords_1^12",
        "en_text_keywords_2^8",
        "en_text_keywords_3^6",
        "en_text_keywords_4^4",
        "en_text_keywords_5^2",
        "en_text_title^12"
      ],
      "query": "animals"
    }
  },
  "size": 1000
}

Is this a bug, or am I missing something?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8c1c3f24-eb33-4035-bc9a-dc7fa26b6a87%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Pir Abdul Rasool Qureshi) #2

I deleted the document and re-inserted it, which solved the issue for that
particular document. But now another document has the same problem. Our
results contain:
...
{
  "_index": "twitter",
  "_type": "tweet",
  "_id": "1739753",
  "_score": 8.071245,
  "fields": {
    "en_text_title": [
      "Young tiger portrait"
    ]
  }
},
{
  "_index": "twitter",
  "_type": "tweet",
  "_id": "1739753",
  "_score": 8.071245,
  "fields": {
    "en_text_title": [
      "Young tiger portrait"
    ]
  }
},
...
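
To see which documents are duplicated in a large response like the one
above, one option is to count the (_index, _type, _id) tuples among the
hits. The following is only a minimal sketch: the function name
find_duplicate_hits and the third, unique hit are made up for illustration,
and the hits list stands in for the "hits" array of a real search response.

```python
from collections import Counter


def find_duplicate_hits(hits):
    """Return {(index, type, id): count} for every hit that occurs more than once."""
    counts = Counter((h["_index"], h.get("_type"), h["_id"]) for h in hits)
    return {key: n for key, n in counts.items() if n > 1}


# Two hits shaped like the ones in the listing above,
# plus one hypothetical unique hit for contrast.
hits = [
    {"_index": "twitter", "_type": "tweet", "_id": "1739753", "_score": 8.071245},
    {"_index": "twitter", "_type": "tweet", "_id": "1739753", "_score": 8.071245},
    {"_index": "twitter", "_type": "tweet", "_id": "42", "_score": 1.0},  # hypothetical
]

duplicates = find_duplicate_hits(hits)
print(duplicates)  # {('twitter', 'tweet', '1739753'): 2}
```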

One more thing worth mentioning: we have a multithreaded application
sending the documents to Elasticsearch, so there is a chance that we send
the same document more than once.

Thanks
Pir.



(Pir Abdul Rasool Qureshi) #3

We have 3 nodes in the Elasticsearch cluster (with number_of_replicas = 2
and number of shards = 8). I stopped both replicas and executed the same
query, but the problem was still there.



(Binh Ly) #4

May I ask which version of ES? And also are you using the REST API to index
the documents with an explicit ID?



(Hannes Korte) #5

Hi,

Do you specify a custom routing parameter when indexing the documents?
If so, you might have documents with the same ID in different shards:

"When indexing documents specifying a custom _routing, the uniqueness of
the _id is not guaranteed throughout all the shards that the index is
composed of. In fact, documents with the same _id might end up in
different shards if indexed with different _routing values."

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-routing-field.html
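
The quoted passage can be illustrated with a toy model. This is only a
sketch under stated assumptions, not Elasticsearch code: it assumes the
shard is chosen as hash(routing) % number_of_shards, with the _id used as
the routing key when no custom routing is given, and it uses a simple DJB2
string hash as a stand-in for whatever hash the cluster really uses. The
names ToyIndex, shard_for and the routing value "a" are made up.

```python
NUM_SHARDS = 8  # the shard count mentioned in this thread


def djb2(text):
    """Toy 32-bit DJB2 string hash (a stand-in, assumed for illustration)."""
    h = 5381
    for ch in text:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Pick a shard from the routing key: hash(routing) % number_of_shards."""
    return djb2(routing_value) % num_shards


class ToyIndex:
    """Each shard is a dict keyed by _id, so _id is unique only per shard."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, doc_id, routing):
        # Without custom routing, the _id itself is the routing key.
        key = routing if routing is not None else doc_id
        return self.shards[shard_for(key, len(self.shards))]

    def index(self, doc_id, source, routing=None):
        # Same _id in the same shard overwrites the previous copy.
        self._shard(doc_id, routing)[doc_id] = source

    def get(self, doc_id, routing=None):
        # GET looks at exactly one shard.
        return self._shard(doc_id, routing).get(doc_id)

    def search_all(self):
        # Search fans out to every shard and merges the hits.
        return [doc_id for shard in self.shards for doc_id in shard]


doc = {"en_text_title": "Young tiger portrait"}
idx = ToyIndex()
idx.index("1739753", doc)               # routed by _id
idx.index("1739753", doc)               # same shard, overwrites: still one copy
idx.index("1739753", doc, routing="a")  # different routing key, different shard
print(idx.search_all())                 # ['1739753', '1739753']
```

In this model a GET without routing only ever sees the copy in the shard
derived from the _id, while a search across all shards returns both copies.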

Best regards,
Hannes



(Pir Abdul Rasool Qureshi) #6

We are using Elasticsearch version 1.0, the official Elasticsearch PHP client
(http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/index.html)
version 1.0, and the Bulk API.
We are not specifying any custom routing. Our request looks like this:

{"index":{"_index":"twitter","_type":"tweet","_id":"1739753"}}
{"field1":"value1" ... "field n":"value"}
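
For readers unfamiliar with the format: a bulk request body is
newline-delimited JSON, one action line followed by one source line per
document, with a trailing newline at the end. A minimal sketch of building
such a body follows. The helper name bulk_body is made up, and the source
field is only a placeholder taken from the search results earlier in the
thread, since the real fields are elided above.

```python
import json


def bulk_body(actions):
    """Build a newline-delimited bulk body from (metadata, source) pairs."""
    lines = []
    for meta, source in actions:
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(source))           # document source line
    return "\n".join(lines) + "\n"                 # body must end with a newline


body = bulk_body([
    (
        {"_index": "twitter", "_type": "tweet", "_id": "1739753"},
        {"en_text_title": "Young tiger portrait"},  # placeholder source fields
    ),
])
print(body, end="")
```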

I need help understanding the following:
Why do we find only one document when using the GET API, whereas the same
document appears twice when searching?
Why does overwriting the same document fix the issue for that document?
Is there any way to ensure _id uniqueness throughout the index?



(David Pilato) #7

They are unique.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


