The Python client drops the _parent field when it reindexes

Hi,

When we run the following code:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

if __name__ == "__main__":
    es = Elasticsearch()
    reindex(es, source_index='2014_03', target_index='2014_03_new',
            chunk_size=500, scroll='5m')

We get the 2014_03_new index with every field copied correctly except
_parent. The _parent field, which most documents in the original index
have, is consistently missing from all documents in the new index.

It looks like a bug in the client's code.

We would appreciate your help.

Regards,
Costya, Totango Metrics.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b52dc918-4529-40ab-97e0-2a3b0176ec59%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Costya,

The code actually looks for _parent and should carry it over. Are you
sure the mappings for the new index are set up correctly to include
the parent/child relationship?

Thanks



Hi Kral,

I think that the mapping is fine. Here it is:

"mappings":{
  "account":{
    "properties":{
      "name":{
        "type":"string",
        "index":"not_analyzed"
      }

      ... ANOTHER ACCOUNT FIELDS ...

    },
    "_routing":{
      "required":true,
      "path":"name"
    }
  },
  "event":{
    "properties":{

      ... SOME EVENT FIELDS ...

      "account":{
        "type":"object",
        "properties":{

          ... SOME MORE FIELDS ...

          "name":{
            "type":"string",
            "index":"no"
          }
        }
      }
    },
    "_parent":{
      "type":"account"
    },
    "_timestamp":{
      "enabled":true,
      "store":true
    },
    "_routing":{
      "required":true,
      "path":"account.name"
    }
  }
}

Do you see anything wrong here?
Thx


Hi,

This is indeed a problem with Elasticsearch itself; sorry it took me so
long to realize it. The search API (and therefore the scan helper, which
reindex uses) doesn't return the value of _parent by default. I created
an issue in elasticsearch to change that (0).

The workaround is to use the scan and bulk helpers directly (look at how
reindex is implemented, copy-paste and modify): add fields=['_source',
'_parent'] to the scan call and write your own version of the
expand_action callback that extracts the _parent value from the
_fields dictionary.
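
A minimal sketch of that workaround, under assumptions: it targets the 1.x-era client and server discussed in this thread, where scan forwards extra keyword arguments such as fields to the search API. It assumes the requested _parent value arrives under a "fields" key in each hit (the dictionary referred to above). Instead of a custom expand_action callback, it reshapes each hit before handing it to bulk, which is a variation on the suggestion rather than the exact approach. The names hit_to_action and reindex_with_parent are hypothetical.

```python
def hit_to_action(hit, target_index):
    """Turn one search hit into a bulk index action aimed at target_index."""
    action = {
        "_index": target_index,
        "_type": hit["_type"],
        "_id": hit["_id"],
        "_source": hit.get("_source", {}),
    }
    # ASSUMPTION: metadata requested via fields=[...] is returned under
    # hit["fields"]; documents without a parent simply lack the key.
    fields = hit.get("fields", {})
    if "_parent" in fields:
        parent = fields["_parent"]
        # Defensive: unwrap the value if it comes back as a one-element list.
        if isinstance(parent, list) and len(parent) == 1:
            parent = parent[0]
        action["_parent"] = parent
    return action


def reindex_with_parent(client, source_index, target_index,
                        chunk_size=500, scroll="5m"):
    """Copy documents between indices while keeping their _parent value."""
    # Imported here so hit_to_action stays usable without the client installed.
    from elasticsearch.helpers import bulk, scan

    # Ask the search API for _parent explicitly, alongside the document body.
    hits = scan(client, index=source_index, scroll=scroll,
                fields=["_source", "_parent"])
    actions = (hit_to_action(hit, target_index) for hit in hits)
    return bulk(client, actions, chunk_size=chunk_size)
```

With the indices from the original post, the call would look like reindex_with_parent(Elasticsearch(), '2014_03', '2014_03_new'). The target index still needs its mappings (including _parent) created beforehand.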

This is very far from ideal, so I also created an issue for
elasticsearch-py (1) to enable doing this (maybe even by default) and
to describe the problem in the docs.

Thanks for reporting this and sorry for the oversight,
Honza

0 - Return _parent value by default for child documents · Issue #8068 · elastic/elasticsearch · GitHub
1 - Reindex doesn't support _parent and _routing · Issue #140 · elastic/elasticsearch-py · GitHub
