Update single field of a document

Updatable fields in a highly distributed, searchable datastore are a use case that key-value style NoSQL databases address.

Examples are Riak, Voldemort, and Cassandra.

Hi,
I wanted to know if ES has a solution for updatable fields.

I googled a little but could not find anything recent on this. (Though
the latest in ES is pretty exciting :) )

I have two questions, in case you have already thought about them:

1) Can we fully implement an updatable field using Lucene codecs?
Without getting into details: we tried this by writing a custom postings
format that puts the field in a key-value store. Our postings consumer
writes directly to the key-value store. We could write directly to the store
without buffering anything in RAM because Lucene's indexing chain invokes the
PerFieldPostingsFormat only at flush time (ref: the
FreqProxTermsWriterPerField flush method). However, Lucene also invokes the
custom PostingsConsumer/TermsConsumer at merge time, and both merge and
flush use the same methods of PostingsConsumer and TermsConsumer
(startDoc, startTerm, finishDoc, finishTerm, etc.). Since we did not buffer
anything in these methods and wrote directly to the key-value store, we
also wrote the new merged state to the store directly. But a merged segment
is only checkpointed, not yet committed (or fsynced), so we got into
inconsistencies when copying data to other nodes, and
search would fail if the IndexReader had not reopened on the new merged segment.

I tried to rectify this by putting the new merged info (document number
remapping) into an in-memory structure (searchable by PartialProducer), but
we did not have any good event on which to flush this in-memory merge info to
the key-value store, so we did it at the next flush. However, Lucene can commit
a checkpointed merged segment without flushing anything. So when Lucene
committed the merged segment without passing any signal to the custom
PostingsFormat, we got into an inconsistency again.
There were more problems, such as how to update a document in the same
segment, because fields with the custom PostingsFormat are available for
update only after the segment has been flushed.
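
To make the failure mode above concrete, here is a toy model in Python. It is only a sketch under assumptions: the store layout (term mapped to doc numbers), the class `ExternalPostings`, and especially the `on_commit` hook are hypothetical and are not Lucene APIs. The missing commit signal is exactly what the post says a custom PostingsFormat does not receive.

```python
class ExternalPostings:
    """Toy key-value store holding postings outside the index (hypothetical)."""

    def __init__(self):
        self.live = {}     # term -> doc numbers visible to searchers
        self.staged = {}   # term -> doc numbers written by a merge, not yet committed

    def write_direct(self, term, doc_ids):
        self.live[term] = list(doc_ids)      # no buffering, as described in the post

    def write_staged(self, term, doc_ids):
        self.staged[term] = list(doc_ids)    # hold merge output back

    def on_commit(self):
        self.live.update(self.staged)        # publish only once the merge is committed
        self.staged.clear()


# Committed segment as the open reader sees it: doc number -> stored id.
# Doc 0 was deleted, so a merge renumbers the surviving doc 1 to doc 0.
committed_segment = {0: "deleted-doc", 1: "doc-about-lucene"}
merged_segment = {0: "doc-about-lucene"}

direct = ExternalPostings()
direct.write_direct("lucene", [1])           # flush
direct.write_direct("lucene", [0])           # merge overwrites before any commit
stale_hit = [committed_segment[d] for d in direct.live["lucene"]]

staged = ExternalPostings()
staged.write_direct("lucene", [1])           # flush
staged.write_staged("lucene", [0])           # merge output is buffered
before_commit = [committed_segment[d] for d in staged.live["lucene"]]
staged.on_commit()
after_commit = [merged_segment[d] for d in staged.live["lucene"]]
```

With direct writes, a reader still on the old commit resolves the remapped doc number against the old segment and lands on the wrong document. The staged variant stays consistent, but only because something calls `on_commit` at the right moment.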

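I am trying something more using a DirectoryWrapper and a
SegmentInfosFormat, but I have some doubts about the whole approach of
providing updatable fields using codecs.

2) I am sure you are aware of this patch: [LUCENE-5189] Numeric DocValues
Updates (ASF JIRA), which adds updatable fields for NumericDocValues fields.
We haven't tried this patch but just wanted to know if ES has considered it
to provide numeric updatable fields.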

On Tuesday, 16 August 2011 06:38:24 UTC+5:30, kimchy wrote:

Otis, are you referring to this:
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html?
And you think it's the same..., really? Are you sure you understand what it
means to provide updatable fields, and then take them to a distributed
system? What I would love is to really think about "comparable" "features"
before throwing them out here (similar to the "update processor" suggestion
for notifications), with or without smilies.

On Tue, Aug 16, 2011 at 3:19 AM, Otis Gospodnetic <otis.gos...@gmail.com> wrote:

Andy,

In Solr land, ExternalFileField is designed for your use case (see
http://search-lucene.com/?q=ExternalFileField )
I think there is nothing like that in ES, but I'd love for somebody
to point out that I'm wrong about this! :)

Otis

On Aug 15, 12:38 am, Andy selforgani...@gmail.com wrote:

I vote for this feature as well.

I have a "popularity" field that holds the number of user votes a
document has received. I use it to influence result ranking. It is
frequently updated. Right now every time a user votes on a document
I'd need to reindex the entire document which is obviously very
inefficient.

It'd be great to have a way to update certain fields without
reindexing the entire document. Solr has an ExternalFileField field
type for this purpose but it's not very user friendly.

Don't know if it's possible to implement such an "update certain field
without reindexing the whole document" feature in ES but if it's
possible it'd be very useful.

On Aug 13, 4:56 pm, Ridvan Gyundogan ridva...@gmail.com wrote:

To be more concrete, this is my use case, or the use case I expect to
have shortly:
I have 1 million documents in Elasticsearch, and only in Elasticsearch,
because I do performance tests and the documents are more or less random.
Now, for the new functionality, I want to add a random
"sellPrice" field to each document.

What I started to write is code that takes all the documents out, adds
a random sellPrice to them, and imports them back again, but this does not
look very efficient.
We very often have use cases where we add new fields for search.

I do not see how the versioning helps me in this case.

On Aug 13, 5:05 pm, Clinton Gormley cl...@traveljury.com wrote:

On Sat, 2011-08-13 at 15:55 +0200, David Pilato wrote:

I think you can easily handle it on your side.

  • Ask ES to get your document (GET /index/doc/1)
  • Then modify your field
  • Then send the new version of the document back to ES
    (PUT /index/doc/1)

To add to what David said, you can use Elasticsearch's versioning
feature to make sure that you don't overwrite any changes that have
been made while you are updating a document.

clint
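
The get / modify / put cycle with a version check can be sketched roughly as follows. This is a minimal illustration only: `VersionedStore` is an in-memory stand-in invented for this example, not the Elasticsearch client, and the names `update_field`, `expected_version`, and `max_retries` are assumptions.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version."""


class VersionedStore:
    """In-memory stand-in for a versioned document store (hypothetical)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source dict)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)  # copy, so callers cannot mutate the store

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


def update_field(store, doc_id, field, fn, max_retries=5):
    """Get the document, change one field, and put it back, retrying on conflict."""
    for _ in range(max_retries):
        version, source = store.get(doc_id)
        source[field] = fn(source.get(field))
        try:
            return store.put(doc_id, source, expected_version=version)
        except VersionConflict:
            continue  # someone else wrote first; re-read and try again
    raise VersionConflict(f"gave up after {max_retries} attempts")


store = VersionedStore()
store.put("1", {"title": "some document", "popularity": 0})
new_version = update_field(store, "1", "popularity", lambda v: (v or 0) + 1)
```

Note that the whole document is still rewritten on every update; the version check only protects against lost writes, it does not make the update cheaper.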

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fad9967d-cf7a-47d8-b0aa-8bed713e9d5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

The thread you are quoting here is nearly 4 years old. It might be better
to start a new thread, as the info contained in this one may
be out of date.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Thanks Mark,
I will start a new thread on this with a better description of the problem.
