Setting id of document with elasticsearch-hadoop that is not in source document

Brian_Thomas_2 · July 7, 2014, 7:48pm

I am trying to update an Elasticsearch index using elasticsearch-hadoop. I
am aware of the es.mapping.id setting, which names the document field to
use as the id, but in my case the source document does not contain the id
(I relied on Elasticsearch's autogenerated ids when indexing the
documents). Is it possible to specify the id to update without having
to add a new field to the MapWritable object?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ce6161ad-d442-4ffb-9162-114cb8cd76dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

costin · July 10, 2014, 9:48pm

You need to specify the id of the document you want to update somehow. Since es-hadoop is batch-focused, each
doc needs to carry its own id, hence the use of 'es.mapping.id' to indicate which field holds it.
Is there a reason why this approach does not work for you? Any alternatives that you thought of?
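
To make the point about per-document ids concrete, here is a minimal Python sketch of the idea, not es-hadoop's actual code. It assumes a flat document and shows how a field named by a setting like es.mapping.id could be turned into the action line of an Elasticsearch bulk update. The function name build_update_action and the field name doc_id are hypothetical, chosen only for illustration.

```python
import json

def build_update_action(doc, id_field):
    """Return the two bulk-API lines for a partial update of `doc`.

    `id_field` plays the role of es.mapping.id: it names the field whose
    value becomes the document id. The field itself stays in the body.
    """
    if id_field not in doc:
        raise KeyError(f"document has no field {id_field!r} to use as id")
    action = {"update": {"_id": str(doc[id_field])}}
    body = {"doc": doc}
    return json.dumps(action), json.dumps(body)

# Hypothetical document; "doc_id" is an illustrative field name.
action_line, body_line = build_update_action({"doc_id": "abc123", "views": 7}, "doc_id")
print(action_line)
print(body_line)
```

Note that in this sketch the id field is still part of the body that gets stored, which is exactly the side effect the original question is trying to avoid.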

Cheers,

--
Costin

Brian_Thomas_2 · July 11, 2014, 6:31pm

I was just curious whether there was a way to do this without adding the
field; I can add it if necessary.

As an alternative, what if, in addition to es.mapping.id, there were another
property, like es.mapping.id.include.in.src, where you could
specify whether the id field actually gets included in the source
document? In Elasticsearch, you can create and update documents without
having to include the id in the source document, so I think it would make
sense to be able to do that with elasticsearch-hadoop as well.

Brian_Thomas_2 · July 11, 2014, 6:36pm

I was just curious whether there was a way to do this without adding the
field; I can add it if necessary.

As an alternative, what if, in addition to es.mapping.id, there were another
property, like es.mapping.id.exclude, that keeps the id field out of
the source document? In Elasticsearch, you can create and
update documents without having to include the id in the source document,
so I think it would make sense to be able to do that with
elasticsearch-hadoop as well.

costin · July 11, 2014, 8:39pm

Hi,

I've opened issue #230 to address your use case. Rather than offering a
dedicated field for the ID, I opted to introduce "include" and "exclude"
options to select (or remove) certain fields from a document before it is
saved to ES. This will basically allow documents to be filtered, and thus
exclude the 'metadata' or fields that are not needed in ES, directly through
es-hadoop.
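
As a rough illustration of the include/exclude idea, the following Python sketch reads the id from a field and then filters that field out of the body before it is stored. It is only a sketch under assumptions: the thread does not spell out the option names or exact semantics planned for issue #230, so the function names (filter_fields, split_id_and_source), the field name doc_id, the flat-document handling, and the choice to read the id before filtering are all hypothetical.

```python
def filter_fields(doc, include=None, exclude=None):
    """Keep only `include` fields (all fields if None), then drop `exclude` fields."""
    kept = doc if include is None else {k: v for k, v in doc.items() if k in include}
    dropped = set(exclude or ())
    return {k: v for k, v in kept.items() if k not in dropped}

def split_id_and_source(doc, id_field, exclude=None):
    """Read the id from `id_field` first, then filter the body that gets stored."""
    doc_id = doc[id_field]
    return doc_id, filter_fields(doc, exclude=exclude)

# Hypothetical document; "doc_id" exists only to carry the id.
doc = {"doc_id": "abc123", "views": 7}
doc_id, source = split_id_and_source(doc, "doc_id", exclude=["doc_id"])
print(doc_id, source)
```

With a filter like this, the field that carries the id can travel with the record through the job and still be left out of what ends up in _source.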

Cheers,

James_Campbell · July 13, 2014, 12:27pm

Thanks for suggesting this option. I would also definitely like to have an
exclude option for fields that I currently have to include only to set the
_id, _type, and index, resulting in unnecessary fields in _source.
