Duplicate documents in query results


(Georgi Ivanov) #1

Hi,
I have a strange issue.
I get duplicate documents when querying:

GET track_2011*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2011-08-30T00:00:00Z",
              "lte": "2011-08-31T23:59:00Z"
            }
          }
        },
        {
          "term": {
            "entity_id": {
              "value": "298082"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "ts": {
        "order": "asc"
      }
    }
  ],
  "size": 90
}

Result (there are more, just showing duplicates):
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [
        103.694783333,
        1.23463333333
      ]
    }
  },
  "sort": [
    1314758608000
  ]
},
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [
        103.694783333,
        1.23463333333
      ]
    }
  },
  "sort": [
    1314758608000
  ]
}

But if I get the document directly:

curl -s es01.host.com:9200/track_201108/position/298082_1314758608000_1302 | json_pp
{
  "found" : true,
  "_version" : 1,
  "_type" : "position",
  "_index" : "track_201108",
  "_source" : {
    "hourly" : false,
    "loc" : {
      "type" : "point",
      "coordinates" : [
        103.694783333,
        1.23463333333
      ]
    },
    "ts" : 1314758608000,
    "entity_id" : 298082
  },
  "_id" : "298082_1314758608000_1302"
}

So I have only one document (and it was never updated, as its version is 1).

I don't understand what is going on here.

No special routing, no parent/child relations.

Any ideas?
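
One way to confirm the duplication programmatically is to group the hits of a search response by (_index, _type, _id) and report any key that appears more than once. A minimal Python sketch, assuming the response has already been parsed into a dict; the function name find_duplicate_hits is only illustrative:

```python
from collections import Counter


def find_duplicate_hits(response):
    """Return {(index, type, id): count} for hits that appear more than once."""
    hits = response.get("hits", {}).get("hits", [])
    counts = Counter(
        (hit["_index"], hit.get("_type"), hit["_id"]) for hit in hits
    )
    return {key: n for key, n in counts.items() if n > 1}


# Trimmed-down version of the two hits shown in the post above.
sample = {
    "hits": {
        "hits": [
            {
                "_index": "track_201108",
                "_type": "position",
                "_id": "298082_1314758608000_1302",
            },
            {
                "_index": "track_201108",
                "_type": "position",
                "_id": "298082_1314758608000_1302",
            },
        ]
    }
}

print(find_duplicate_hits(sample))
```

Running the same check on the full response would show whether other documents are affected as well.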

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/abf4a5a9-495f-4480-b326-0d9562c696b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

Which Elasticsearch version are you running?

--
David Pilato - Developer | Evangelist
elastic.co
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs



(Georgi Ivanov) #3

1.5.2



(David Pilato) #4

What do you get with: curl -XGET 'http://localhost:9200/track_2011*/'




(David Pilato) #5

Also, could you try:
curl -XGET 'localhost:9200/track_201108/_search_shards'

And then search each shard in turn using:
?preference=_shards:0;_primary
?preference=_shards:1;_primary
?preference=_shards:2;_primary

And so on.

Try to locate which shard holds the duplicates.
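
The shard-by-shard search can be scripted. The Python sketch below only builds the request URLs. The host, the index name track_201108 and the shard count of 5 are assumptions (take the real count from the _search_shards output). It uses the `_shards:N;_primary` form that, as far as I know, the 1.x search preference documentation describes, and percent-encodes the value so the `;` is not mistaken for a query-string separator.

```python
from urllib.parse import quote


def shard_search_urls(base_url, index, num_shards):
    """Build one _search URL per shard, each restricted to that shard's primary."""
    urls = []
    for shard in range(num_shards):
        # safe="" makes quote() encode ':' and ';' as well.
        preference = quote("_shards:%d;_primary" % shard, safe="")
        urls.append(
            "%s/%s/_search?preference=%s"
            % (base_url.rstrip("/"), index, preference)
        )
    return urls


# Hypothetical host and shard count, for illustration only.
for url in shard_search_urls("http://localhost:9200", "track_201108", 5):
    print(url)
```

Sending the original query body to each URL and comparing the hits should show which shard, or shards, return the document.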

Are you sure you never used a routing key when indexing one of your docs?
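
Why the routing question matters: Elasticsearch picks the shard for a document as hash(routing) % number_of_primary_shards, with the routing value defaulting to the _id. If the same _id was ever indexed once with a custom routing key and once without, the two copies can sit on different shards. A search fans out to all shards and can return both, while a GET by _id without routing consults only one shard. This is one possible explanation, not a diagnosis. The Python sketch below illustrates the idea with a simple DJB2-style hash as a stand-in; it is not guaranteed to reproduce the shard numbers a real cluster computes, and the shard count of 5 and the routing values are hypothetical.

```python
def djb2(value):
    """Stand-in string hash (DJB2 style), truncated to 32 bits."""
    h = 5381
    for ch in value:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def shard_for(routing, num_primary_shards):
    """Shard chosen for a routing value: hash(routing) % number of primaries."""
    return djb2(routing) % num_primary_shards


def lands_on_different_shards(doc_id, routing_key, num_primary_shards):
    """True if indexing with routing_key targets a different shard than the _id alone."""
    return shard_for(doc_id, num_primary_shards) != shard_for(
        routing_key, num_primary_shards
    )


# Hypothetical routing values "a" and "b" with 5 primary shards.
print(shard_for("a", 5), shard_for("b", 5))
print(lands_on_different_shards("a", "b", 5))
```

If a document indexed with routing "b" shares its _id with one indexed without routing, each shard keeps its own copy at version 1, which would match the symptoms described above.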



