Get document by ID does not work for some docs, but the docs are there

Hi!

I could not find anyone else reporting this issue, and I am totally
baffled by it. The problem is pretty straightforward. I have an index
with multiple mappings where I use parent/child associations. The
parent is topic, the child is reply. I noticed that some topics were not
being found via the has_child filter, even though they had exactly the
same information as topics that were found, just a different topic ID.
That is how I went down the rabbit hole and ended up noticing that I
cannot get a topic by its ID.

What is even stranger is that I have a script that recreates the index
from a SQL source, and every time the same IDs are not found by Elasticsearch.

For example if I do this:

curl -XGET 'http://localhost:9200/topics/topic_en/173' | prettyjson
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 80 100 80 0 0 26143 0 --:--:-- --:--:-- --:--:-- 40000
_index: topics_20131104211439
_type: topic_en
_id: 173
exists: false

Nothing, but the doc is there:

curl -XGET 'http://127.0.0.1:9200/topics/topic_en/_search' -d '{"query":{"term":{"id":"173"}}}' | prettyjson
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2127 100 2096 100 31 894k 13543 --:--:-- --:--:-- --:--:-- 1023k
took: 1
timed_out: false
_shards:
  total: 5
  successful: 5
  failed: 0
hits:
  total: 1
  max_score: 1
  hits:
    -
      _index: topics_20131104211439
      _type: topic_en
      _id: 173
      _score: 1
      _source:

This is a sample dataset. The IDs that are not found follow no obvious
pattern; in fact, most are not found. And again, if I drop and rebuild the
index, the same documents cannot be found via the GET API and the same IDs
that ES likes are found.

I can't think of anything I am doing wrong here. Any ideas?

Cheers

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I guess it's due to routing. Children are routed to the same shard as the parent.

So here Elasticsearch picks a shard based on the doc ID (not the routing / parent key), and that shard does not have your child doc.
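
To make the routing explanation concrete, here is a minimal sketch, not Elasticsearch's actual implementation. It assumes a shard is picked as hash(key) modulo the number of primary shards, uses `cksum` purely as a stand-in hash, and borrows the 5 shards, the ID 147, and the routing value 4 that appear elsewhere in this thread. `shard_for` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical stand-in for the routing computation: cksum is NOT the hash
# Elasticsearch uses, it only makes this sketch deterministic.
shard_for() {
  # $1 = routing key, $2 = number of primary shards
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 ))
}

# Assumption: document 147 was indexed with an explicit routing key of 4.
echo "indexed on shard:                 $(shard_for 4 5)"
# A plain GET by ID hashes the document ID instead of the routing key.
echo "GET without routing checks shard: $(shard_for 147 5)"
# A GET with ?routing=4 repeats the indexing computation.
echo "GET with routing=4 checks shard:  $(shard_for 4 5)"
```

Whenever the first two lines print different shard numbers, the plain GET looks in the wrong place, while the routed GET always matches the shard used at index time. A search without routing fans out to every shard, which would explain why the search finds the document and the GET does not.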

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Right, if I provide the routing when fetching the parent, it does work.

curl -XGET 'http://localhost:9200/topics/topic_en/147?routing=4'

So what's wrong with my search query? It works for the children of some parents, but not here:

curl -XGET 'http://127.0.0.1:9200/topics/topic_en/_search?routing=4' -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"matra","fields":["topic.subject"]}},{"has_child":{"type":"reply_en","query":{"query_string":{"query":"matra","fields":["reply.content"]}}}}]}},"filter":{"and":{"filters":[{"term":{"community_id":4}}]}}}},"sort":[],"from":0,"size":25}'
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}


Francisco Viramontes
a.k.a. PAco

twitter.com/kidpollo
linkedin.com/in/fviramontes

Hm. A full curl recreation would help, as I don't have a clear overview here.
Basically, I'd say that you are searching for parent docs but on the child index/type REST endpoint.

Are you sure your search should run on topic_en/_search?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

It seems I failed to specify the _routing field in the bulk indexing PUT call. ¬¬

I would never have figured that out from the documentation.
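
For anyone hitting the same thing, here is a rough sketch of what a bulk body with explicit routing could look like. It assumes the `_routing` key in the bulk action line as accepted by Elasticsearch releases of this era (newer releases spell it `routing`). The index and type names come from this thread; the IDs, subjects, the `bulk_line` helper, and the choice of community ID 4 as the routing value are illustrative assumptions, not the actual indexing script.

```shell
#!/bin/sh
# Build a bulk body where every action line carries an explicit _routing value.
# bulk_line is a hypothetical helper; field names and values are assumptions.
bulk_line() {
  # $1 = document ID, $2 = routing value, $3 = subject (no quotes or backslashes)
  printf '{"index":{"_index":"topics","_type":"topic_en","_id":"%s","_routing":"%s"}}\n' "$1" "$2"
  printf '{"id":%s,"community_id":%s,"subject":"%s"}\n' "$1" "$2" "$3"
}

body=$(bulk_line 147 4 "first topic"; bulk_line 173 4 "second topic")
printf '%s\n' "$body"

# Sending it needs a running node, so the call is left commented out:
# printf '%s\n' "$body" | curl -XPOST 'http://localhost:9200/_bulk' --data-binary @-
```

The point is that the same routing value has to be supplied both at index time and on later GET requests (for example `?routing=4`), otherwise the two operations can resolve to different shards.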


Francisco Viramontes
a.k.a. PAco

twitter.com/kidpollo
linkedin.com/in/fviramontes
