Problem in elasticsearch with Parent/Child relations


(Juan Díaz González) #1

Hi,

Currently, I am creating two types inside the same index. The first type is a parent type, and it is also a child of another type. Its mapping is:

{
"company": {
"mappings": {
"lic_server": {
"_parent": {
"type": "customer"
},
"_routing": {
"required": true
},
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"updated_at": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}

We have documents of this type in the index, for example:

{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 4.4812403,
"hits": [
{
"_index": "company",
"_type": "lic_server",
"_id": "oid:asset:3",
"_score": 4.4812403,
"_source": {
"updated_at": "2015-07-29T10:48:12.696465",
"name": "lic.server ",
"id": "oid:3"
}
}
]
}
}

Its child type has the following mapping:

{
"company": {
"mappings": {
"event_lic_server": {
"_parent": {
"type": "lic_server"
},
"_routing": {
"required": true
},
"properties": {
"application": {
"type": "string",
"index": "not_analyzed"
},
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"updated_at": {
"type": "date",
"format": "dateOptionalTime"
},
"value": {
"type": "integer"
}
}
}
}
}
}

And there are documents that are children of the parent with "id": "oid:3", for example:

{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6241,
"max_score": 3.6967838,
"hits": [
{
"_index": "company",
"_type": "event_lic_server",
"_id": "AU7ZLHEABnc3vxnAccdb",
"_score": 3.6967838,
"fields": {
"_parent": "oid:3"
}
},
{
"_index": "company",
"_type": "event_lic_server",
"_id": "AU7ZLHEDBnc3vxnAccdc",
"_score": 3.6967838,
"fields": {
"_parent": "oid:3"
}
},
{
"_index": "company",
"_type": "event_lic_server",
"_id": "AU7ZLHEGBnc3vxnAccdd",
"_score": 3.6967838,
"fields": {
"_parent": "oid:3"
}
} ......

My problem is that when I search the children by parent id using the has_parent query, it doesn't work correctly. For example:

GET company/event_lic_server/_search
{
"query": {
"has_parent": {
"parent_type": "lic_server",
"query": {
"term": {
"id": {
"value": "oid:3"
}
}
}
}
}
}

And the result is:

{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
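For reference, the has_parent request body used above can also be built in Python with only the standard json module. This is a minimal sketch: the helper name has_parent_body is hypothetical, and the type and field names simply mirror the mappings in this post. The resulting JSON is what gets sent to GET company/event_lic_server/_search.

```python
import json


def has_parent_body(parent_type, field, value):
    """Return a has_parent query matching children whose parent (of
    type `parent_type`) has `field` exactly equal to `value`."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                # term = exact match, which suits not_analyzed string fields
                "query": {"term": {field: {"value": value}}},
            }
        }
    }


body = has_parent_body("lic_server", "id", "oid:3")
print(json.dumps(body, indent=2))
```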

Could somebody please tell me where the problem is, or give me any idea about this? I am not able to make it work.

The first relation works correctly:

Grandparent --> Child/Parent works

But:
Child/Parent --> Child/GrandChild doesn't work.

I have more than one grandchild.

Thanks in advance, and sorry for the inconvenience.


(Gabriel Tessier) #2

Hi,
As I understand it, the parent/child relation is OK, but the deeper grandparent/parent/child relation is not working?
How did you index your data?
If you can't find your data, it's usually a routing problem.
Can you "get" your data this way: 127.0.0.1:9200/your_index/your_type/child_id?parent=parent_id&routing=grand_father_id
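The check suggested above can be scripted. The sketch below only builds the URL with the standard library; the helper name get_child_url and the placeholder ids are hypothetical, and sending the request is left to whichever HTTP client you use. A hit with routing set to the grandparent id means the document is stored on the grandparent's shard.

```python
from urllib.parse import urlencode


def get_child_url(host, index, doc_type, child_id, parent_id, grandparent_id):
    """Build the GET URL for a grandchild document.

    `parent` is the direct parent; `routing` is the grandparent id,
    because the routing value is what selects the shard.
    """
    # urlencode percent-encodes reserved characters such as ':' in ids
    query = urlencode({"parent": parent_id, "routing": grandparent_id})
    return f"http://{host}/{index}/{doc_type}/{child_id}?{query}"


print(get_child_url("127.0.0.1:9200", "your_index", "your_type",
                    "child_id", "parent_id", "grand_father_id"))
```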

Hope it can help.


(Juan Díaz González) #3

Thanks, but I am more interested in searching by the last relation:

Parent => child

That is, searching the children (grandchildren) by a field of their parent, but the has_parent query doesn't work correctly:

{
"query": {
"has_parent": {
"parent_type": "lic_server",
"query": {
"term": {
"id": {
"value": "oid:3"
}
}
}
}
}
}

Thanks


(Juan Díaz González) #4

However, if I use this query, for example:

GET company/event_lic_server/_search
{
"query": {
"term": {
"event_lic_server._parent": {
"value": "oid:asset:3"
}
}
}
}

That is, if I do a term query like "term" : { "child_type._parent" : "parent_id" }, it works correctly, but the has_parent query doesn't.


(Colin Goodheart-Smithe) #5

Are you indexing your documents as @gabriel_tessier mentioned above? If not, this is probably the cause of your issues: your child documents will not be located on the same shard as their parents and grandparents, which is required for this to work.
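The shard requirement can be illustrated with a toy model of routing: a document goes to shard hash(routing) % number_of_primary_shards, where the routing value is the explicit routing parameter if given, otherwise the parent id, otherwise the document's own _id. This is a sketch under a loud assumption: Elasticsearch uses its own routing hash, so zlib.crc32 below is only a stand-in and the printed shard numbers will not match a real cluster. The customer id customer_1 is hypothetical.

```python
import zlib


def shard_for(routing, num_shards=5):
    # ASSUMPTION: zlib.crc32 stands in for Elasticsearch's own routing hash,
    # so real shard numbers differ; only the principle is the same.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def routing_value(doc_id, parent=None, routing=None):
    # Default rules: explicit routing wins, then the parent id, then the _id.
    if routing is not None:
        return routing
    if parent is not None:
        return parent
    return doc_id


customer_id = "customer_1"  # hypothetical grandparent (customer) id
asset_id = "oid:asset:3"
event_id = "AU7ZLHEABnc3vxnAccdb"

# lic_server indexed with ?parent=customer_1 -> routed by the customer id
parent_shard = shard_for(routing_value(asset_id, parent=customer_id))
# event_lic_server indexed with only ?parent=oid:asset:3 -> routed by the asset id
child_shard = shard_for(routing_value(event_id, parent=asset_id))
# same event indexed with ?parent=oid:asset:3&routing=customer_1
fixed_shard = shard_for(routing_value(event_id, parent=asset_id, routing=customer_id))

print(parent_shard, child_shard, fixed_shard)
```

Only the explicitly routed child is guaranteed to share its parent's shard; the child routed by the asset id lands there only by coincidence. With num_shards=1 every routing value maps to shard 0, which is how such a bug can stay hidden on a single-shard development machine.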


(Juan Díaz González) #6

I tested what @gabriel_tessier suggested, and the result is:

{"_index":"gompute","_type":"event_lic_server","_id":"child_id","found":false}

I think this is the problem: my parents and children are on different shards. But how can I put them all on the same shard?

When I index the documents, I only set the _parent meta field, as the documentation says:

In pseudocode:

e = event_lic_server(application=api)
e.meta.parent = asset
e.save()

and the _parent meta field is mapped as:

"_parent": {
"type": "lic_server"
},

But I never considered shards when putting parents and children together. I think there are too many children to put them all on the same shard, because:

GET index/child_type/_count

the result is:

{
"count": 209675,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
}
}

How can I solve this? Can I put everything on the same shard? And if so, what is the query to do it?

Thanks, and sorry for all the questions; I am new to Elasticsearch.


(Colin Goodheart-Smithe) #7

This is what the routing value is for when you index all your documents. You need to make sure that when you index your parent documents you use:

127.0.0.1:9200/your_index/your_type/parent_id?parent=grand_parent_id

Here the parent_id document will be routed to the same shard as the grand_parent_id document. When you index the child documents you specify the parent pointing to the parent_id and the routing points to the grand_parent_id:

127.0.0.1:9200/your_index/your_type/child_id?parent=parent_id&routing=grand_parent_id

This will ensure the child_id document is also located on the same shard as the grand_parent_id.

You will need to re-index all your documents to do this but it will ensure that your grand-parent, parent, and child documents are always located on the same shard as the relevant grand-parent doc.
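The rule above (parent = direct parent, routing = top-level ancestor) fits in a small helper. This is a hypothetical sketch, not part of any client library: it only computes the query-string parameters for each level of the hierarchy. In elasticsearch-dsl-py this presumably corresponds to setting a routing meta field next to e.meta.parent before e.save(); that is an assumption, so check the documentation for your version.

```python
def index_params(ancestor_ids):
    """Return the query-string parameters for indexing one document.

    `ancestor_ids` lists the document's ancestors from the top down:
    [] for a grandparent, [grand_parent_id] for a parent, and
    [grand_parent_id, parent_id] for a child.
    """
    if not ancestor_ids:
        return {}  # top-level document: routed by its own _id
    params = {"parent": ancestor_ids[-1]}  # always the direct parent
    if len(ancestor_ids) > 1:
        # Route every deeper level by the top-level ancestor so that all
        # levels end up on the grandparent's shard.
        params["routing"] = ancestor_ids[0]
    return params


print(index_params([]))
print(index_params(["grand_parent_id"]))
print(index_params(["grand_parent_id", "parent_id"]))
```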


(Juan Díaz González) #8

How could I ensure that my grand-parent, parent, and child documents are located on the same shard?

Another question: if everything is on the same shard, isn't it less safe in case of failures?


(Gabriel Tessier) #9

Hi,
By setting parent and routing, you can be sure that your document is on the same shard as its parent and grandparent.

I'm using parent/child relations, and since setting the parent and routing correctly I have never had a document lose its route.

It may seem obvious, but here are some things you need to take care of:

  • On my dev machine I had one shard and on Jenkins three shards; my unit tests failed randomly on Jenkins but worked on my dev machine, and it took me a while to figure out why. So make sure you test with at least 3 shards.
  • You need to set the routing both when saving single documents and in bulk requests, depending on how your software is written.

This is just a checklist of the problems I encountered.
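For the bulk case mentioned in the checklist, the parent and routing go into each action's metadata line. This minimal sketch builds an NDJSON body for the _bulk endpoint using the _parent and _routing metadata keys of the Elasticsearch 1.x/2.x bulk API (later versions changed these keys); the document id evt-1 and the customer id customer_1 are hypothetical.

```python
import json


def bulk_lines(index, doc_type, docs):
    """Yield NDJSON lines for the _bulk endpoint.

    Each item in `docs` is (doc_id, source, parent_id, routing_id);
    parent_id and routing_id may be None for top-level documents.
    """
    for doc_id, source, parent_id, routing_id in docs:
        meta = {"_index": index, "_type": doc_type, "_id": doc_id}
        if parent_id is not None:
            meta["_parent"] = parent_id
        if routing_id is not None:
            # for grandchildren this must be the grandparent id
            meta["_routing"] = routing_id
        yield json.dumps({"index": meta})  # action/metadata line
        yield json.dumps(source)           # document source line


# The bulk body must end with a newline.
body = "\n".join(bulk_lines("company", "event_lic_server", [
    ("evt-1", {"application": "api", "value": 1}, "oid:asset:3", "customer_1"),
])) + "\n"
print(body)
```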

Hope it can help.


(Juan Díaz González) #10

OK, thanks. I will check this and reread the elasticsearch-dsl-py documentation to find where the problem is.

Thanks a lot for your help, I am very grateful. I will let you know how it goes.

Regards


(Juan Díaz González) #11

Hi,

I tested everything you suggested. My index has 5 shards, with one replica per shard.

Then I tried the following again:

GET company/event_lic_server/AU7aAVedBnc3vxnAdO_O?parent=oid:asset:2

It works, and the result is:

{
"_index": "company",
"_type": "event_lic_server",
"_id": "AU7aAVedBnc3vxnAdO_O",
"_version": 1,
"found": true,
"_source": {
my_fields .....
}
}

If I run the following query by parent, I obtain all the results:

GET company/event_lic_server/_search
{
"query": {
"has_parent": {
"parent_type": "lic_server",
"query": {
"term": {
"id": {
"value": "oid:asset:2"
}
}
}
}
}
}

However, if I ask for another asset, for example asset:1:

GET gompute/event_lic_server/AU7aBcT5Bnc3vxnAdPhf?parent=oid:asset:1

the request also succeeds and returns a result:

{
"_index": "gompute",
"_type": "event_lic_server",
"_id": "AU7aBcT5Bnc3vxnAdPhf",
"_version": 1,
"found": true,
"_source": {
my_fields ......

}
}

But if I now query by parent:

GET gompute/event_lic_server/_search
{
"query": {
"has_parent": {
"parent_type": "lic_server",
"query": {
"term": {
"id": {
"value": "oid:asset:1"
}
}
}
}
}
}

The result is wrong:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

I cannot understand it, because both were inserted the same way: when I save a document I set the parent id, and the mapping declares the parent type.

My conclusion is that they are not on the same shard, but I do not know how to solve it.

Thanks

