Setting id of document with elasticsearch-hadoop that is not in source document


(Brian Thomas-2) #1

I am trying to update an Elasticsearch index using elasticsearch-hadoop. I
am aware of the es.mapping.id setting, which specifies the field in the
document to use as the id, but in my case the source document does not
contain the id (I used Elasticsearch's autogenerated ids when indexing the
documents). Is it possible to specify the id to update without having to
add a new field to the MapWritable object?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ce6161ad-d442-4ffb-9162-114cb8cd76dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Costin Leau) #2

You need to specify the id of the document you want to update somehow. Since es-hadoop is batch-focused, each
doc needs to carry its own id, hence the use of 'es.mapping.id' to indicate which field holds its value.
Is there a reason why this approach does not work for you? Are there any alternatives that you thought of?
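
To make the point above concrete, here is a minimal sketch of the settings involved, written as a plain dict of the kind that could be handed to a job configuration. The index name "myindex/mytype", the field name "doc_id", and the helper resolve_id are made up for illustration; the helper only mimics the idea of reading the id from a field of each document and is not es-hadoop code.

```python
# Settings as they might be passed to an es-hadoop job. The index/type and
# the field name "doc_id" are assumptions made for this example.
es_conf = {
    "es.resource": "myindex/mytype",   # hypothetical target index/type
    "es.write.operation": "update",    # update existing documents
    "es.mapping.id": "doc_id",         # field that holds each document's id
}


def resolve_id(doc, conf):
    """Return the id found in the field named by es.mapping.id.

    Illustrative helper only: it shows why every document in the batch has
    to carry its own id when the id is read from a field.
    """
    id_field = conf["es.mapping.id"]
    if id_field not in doc:
        raise KeyError("document has no field %r to use as id" % id_field)
    return doc[id_field]


# The id field has to be present in each document being written.
doc = {"doc_id": "AUcX1abc", "status": "processed"}
print(resolve_id(doc, es_conf))
```

With this approach, a document that lacks the configured field cannot be matched to an existing document, which is why the field has to be added before writing.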

Cheers,

On 7/7/14 10:48 PM, Brian Thomas wrote:

I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the es.mapping.id
configuration where you can specify that field in the document to use as an id, but in my case the source document does
not have the id (I used elasticsearch's autogenerated id when indexing the document). Is it possible to specify the id
to update without having to add a new field to the MapWritable object?

--
Costin



(Brian Thomas-2) #3

I was just curious whether there was a way to do this without adding the
field; I can add it if necessary.

As an alternative, what if, in addition to es.mapping.id, there were another
property, such as es.mapping.id.include.in.src, that specifies whether the
id field actually gets included in the source document? In Elasticsearch,
you can create and update documents without including the id in the source
document, so I think it would make sense to be able to do that with
elasticsearch-hadoop as well.

On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:

You need to specify the id of the document you want to update somehow.
Since in es-hadoop things are batch focused, each
doc needs its own id specified somehow hence the use of 'es.mapping.id'
to indicate its value.
Is there a reason why this approach does not work for you - any
alternatives that you thought of?



(Brian Thomas-2) #4

I was just curious whether there was a way to do this without adding the
field; I can add it if necessary.

As an alternative, what if, in addition to es.mapping.id, there were another
property, such as es.mapping.id.exclude, that keeps the id field out of the
source document? In Elasticsearch, you can create and update documents
without including the id in the source document, so I think it would make
sense to be able to do that with elasticsearch-hadoop as well.



(Costin Leau) #5

Hi,

I've opened up issue #230 to address your use case. Rather than offering a
dedicated field for the ID, I opted to introduce "include" and "exclude"
options that select (or remove) certain fields from a document before it is
saved to ES. This will basically allow documents to be filtered, so that
'metadata' or other fields that are not needed in ES can be excluded
directly through es-hadoop.
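
As a rough illustration of the filtering idea described above, the sketch below reads the id from a field and then drops the excluded fields before the rest of the document would be stored. The function split_id_and_source, its parameters, and the sample field names are hypothetical; this is not the es-hadoop implementation, only a sketch of the behaviour under discussion.

```python
def split_id_and_source(doc, id_field, excludes=()):
    """Return (doc_id, source) for one document.

    The id is read from id_field first; any field listed in excludes is then
    left out of the returned source. The input document is not modified.
    """
    doc_id = doc[id_field]
    source = {key: value for key, value in doc.items() if key not in excludes}
    return doc_id, source


# "doc_id" is carried only so the document can be addressed; excluding it
# keeps it out of what would end up in _source.
doc = {"doc_id": "AUcX1abc", "status": "processed", "count": 3}
doc_id, source = split_id_and_source(doc, "doc_id", excludes={"doc_id"})
print(doc_id, source)
```

A general exclude list like this also covers other fields that exist only to carry metadata, not just the id.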

Cheers,

On Fri, Jul 11, 2014 at 9:36 PM, Brian Thomas brianjthomas85@gmail.com
wrote:

I was just curious whether there was a way to do this without adding the
field; I can add it if necessary.

As an alternative, what if, in addition to es.mapping.id, there were another
property, such as es.mapping.id.exclude, that keeps the id field out of the
source document? In Elasticsearch, you can create and update documents
without including the id in the source document, so I think it would make
sense to be able to do that with elasticsearch-hadoop as well.



(James Campbell) #6

Thanks for suggesting this option. I would also definitely like to have an
exclude option for fields that I currently have to include only to set the
_id, _type, and index, which results in unnecessary fields in _source.

On Fri, Jul 11, 2014 at 4:39 PM, Costin Leau costin.leau@gmail.com wrote:

Hi,

I've opened up issue #230 to address your use case. Rather than offering a
dedicated field for the ID, I opted to introduce an "include", "exclude"
option to select (or remove) certain fields from a document before being
saved to es. This will basically allow documents to be filtered and thus
exclude the 'metadata' or fields that are not needed in ES directly through
es-hadoop.

Cheers,


