I am trying to update an Elasticsearch index using elasticsearch-hadoop. I
am aware of the es.mapping.id configuration, which lets you specify which
field in the document to use as the id, but in my case the source document
does not contain the id (I used Elasticsearch's autogenerated ids when
indexing the documents). Is it possible to specify the id to update without
having to add a new field to the MapWritable object?
You need to specify the id of the document you want to update somehow. Since es-hadoop is batch focused, each
doc needs its own id, hence the use of 'es.mapping.id' to indicate which field holds its value.
Is there a reason why this approach does not work for you? Are there any alternatives that you have thought of?
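For readers following along, here is a rough sketch of what this looks like in practice. It is only an illustration under stated assumptions: plain java.util types stand in for Hadoop's Configuration and MapWritable so the snippet runs on its own, and the index name "myindex/mytype", the field name "docId", and the id value are made up. The setting names es.resource, es.write.operation and es.mapping.id are, as far as I know, the ones es-hadoop documents, but check the configuration reference for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Sketch only: java.util types stand in for Hadoop's Configuration and MapWritable. */
class UpdateByIdSketch {

    /** Builds the settings for an update job; the target index/type is hypothetical. */
    static Properties updateSettings(String idField) {
        Properties settings = new Properties();
        settings.setProperty("es.resource", "myindex/mytype"); // hypothetical target
        settings.setProperty("es.write.operation", "update");  // update existing docs
        settings.setProperty("es.mapping.id", idField);        // field that carries the id
        return settings;
    }

    /** Returns a copy of the document with the id stored under the configured field. */
    static Map<String, Object> withId(Map<String, Object> doc, Properties settings, String id) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.put(settings.getProperty("es.mapping.id"), id);
        return copy;
    }

    public static void main(String[] args) {
        Properties settings = updateSettings("docId");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("status", "processed");
        // The id has to travel inside the document so each record in the batch carries its own.
        System.out.println(withId(doc, settings, "some-autogenerated-id"));
    }
}
```

The point of the sketch is simply that the id has to be added to each record before it is written, which is the extra field the original question was hoping to avoid.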
Cheers,
On 7/7/14 10:48 PM, Brian Thomas wrote:
I am trying to update an Elasticsearch index using elasticsearch-hadoop. I am aware of the es.mapping.id
configuration, which lets you specify which field in the document to use as the id, but in my case the source document
does not contain the id (I used Elasticsearch's autogenerated ids when indexing the documents). Is it possible to specify
the id to update without having to add a new field to the MapWritable object?
I was just curious whether there was a way of doing this without adding the
field; I can add it if necessary.
For alternatives, what if, in addition to es.mapping.id, there were another
property, something like es.mapping.id.include.in.src, where you could
specify whether the id field actually gets included in the source
document? In Elasticsearch, you can create and update documents without
having to include the id in the source document, so I think it would make
sense to be able to do that with elasticsearch-hadoop as well.
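To make the last point concrete: in Elasticsearch's bulk format an update is two lines, an action line carrying the metadata (including _id) and a body line carrying the partial document, so the id never has to appear in the source itself. The sketch below just builds those two lines as strings. The class and method names are invented for illustration, the id and JSON are not escaped or validated, and this is not a claim about how es-hadoop serializes its requests internally.

```java
/** Sketch only: builds the two lines of one bulk "update" action by hand. */
class BulkUpdateLinesSketch {

    /**
     * Returns { actionLine, bodyLine }. The id lives in the action metadata,
     * not in the partial document. No JSON escaping is done here.
     */
    static String[] updateAction(String id, String partialDocJson) {
        String action = "{\"update\":{\"_id\":\"" + id + "\"}}";
        String body = "{\"doc\":" + partialDocJson + "}";
        return new String[] { action, body };
    }

    public static void main(String[] args) {
        // Hypothetical id and field, for illustration only.
        for (String line : updateAction("some-autogenerated-id", "{\"status\":\"processed\"}")) {
            System.out.println(line);
        }
    }
}
```

This assumes the index and type are given in the request URL; otherwise they would also go into the action line.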
On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:
You need to specify the id of the document you want to update somehow.
Since es-hadoop is batch focused, each doc needs its own id, hence the use
of 'es.mapping.id' to indicate which field holds its value.
Is there a reason why this approach does not work for you? Are there any
alternatives that you have thought of?
Cheers,
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
I was just curious whether there was a way of doing this without adding the
field; I can add it if necessary.
For alternatives, what if, in addition to es.mapping.id, there were another
property, something like es.mapping.id.exclude, that keeps the id field out
of the source document? In Elasticsearch, you can create and update
documents without having to include the id in the source document, so I
think it would make sense to be able to do that with elasticsearch-hadoop
as well.
I've opened issue #230 to address your use case. Rather than offering a
dedicated field for the ID, I opted to introduce "include" and "exclude"
options to select (or remove) certain fields from a document before it is
saved to ES. This will basically allow documents to be filtered, and thus
exclude the 'metadata' or fields that are not needed in ES, directly through
es-hadoop.
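As a rough illustration of the idea (not the actual es-hadoop implementation), the filtering could look something like the sketch below: the id is read from the document first, then the listed fields are dropped before the document is written. The class, method, and field names are made up, and matching is by exact field name only. I believe later es-hadoop releases document settings named es.mapping.include and es.mapping.exclude for this; treat that as an assumption and check the configuration reference for your version.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch only: exact-name include/exclude filtering over a document map. */
class FieldFilterSketch {

    /**
     * Keeps a field if it is included (an empty include set means "everything")
     * and not excluded. Returns a new map and leaves the input untouched.
     */
    static Map<String, Object> filter(Map<String, Object> doc,
                                      Set<String> includes,
                                      Set<String> excludes) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            boolean included = includes.isEmpty() || includes.contains(e.getKey());
            if (included && !excludes.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("docId", "some-autogenerated-id"); // hypothetical metadata field
        doc.put("status", "processed");

        Object id = doc.get("docId"); // read the id before the field is filtered out
        Map<String, Object> source =
                filter(doc, Collections.<String>emptySet(), Collections.singleton("docId"));
        System.out.println("id=" + id + " source=" + source);
    }
}
```

With something like this, the id field can ride along in the MapWritable for es.mapping.id to pick up without ending up in _source.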
Thanks for suggesting this option. I would also definitely like to have an
exclude option for fields that I currently have to include only to set the
_id, _type, and index, which results in unnecessary fields in _source.