Updating fields in an index

Hi all,
To give a general idea of the problem I am facing, here is a very simple
example of indexing a document, taken from the tutorial.

Let's say I have indexed a document with some fields:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "admin",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'

I want to update only the "user" field so that its value becomes
"arien". To do that, I have to do something like this:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "arien",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'

The problem is that I want to keep the values of "post_date" and
"message" unchanged without supplying them again when I update the
document under the same id. For example, something like this:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "arien"
}'

When querying (through the search API) I should then still get all three
fields of the document. Instead, after the update I get only the "user"
field, because the new document has overwritten the old one.

I am dealing with very large chunks of data at indexing time, using the
attachment mapper, and it is a real overhead to resend every field
each time even a small attribute changes. Please suggest how to
handle this.

Thank you,
arien

On 24 August 2011 at 09:09, arien wrote:

Lucene, the underlying library used by Elasticsearch, doesn't support updating a single field, and probably won't anytime soon. There has been some discussion about a workaround in Elasticsearch, but it is not there yet, and it would really be just a workaround.

If it fits your case, try to separate the fields that need to be indexed from the ones that don't. Let Elasticsearch handle the fields that need to be indexed, and store the others in another tool, such as a classical database that can update a single field of a document. For the indexed fields, you'll have no choice but to push the entire document on each update. For the others, just update what you need. When you query, search in Elasticsearch and reconcile the results with the external data store.

Nicolas
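
A minimal sketch of the split Nicolas describes, not a definitive implementation. It assumes that "user" never needs to be searched, so it can live outside the index. The `search_index` dict is a hypothetical in-memory stand-in for Elasticsearch, the `tweet_meta` table and all function names are made up for illustration, and SQLite plays the role of the "classical database".

```python
import sqlite3

# Hypothetical stand-in for the search index: id -> indexed (searchable) fields.
search_index = {}

# External store for fields that change often and are never searched on.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tweet_meta (id TEXT PRIMARY KEY, user TEXT)")


def index_document(doc_id, indexed_fields):
    """Whole-document replace: the only kind of write the index accepts."""
    search_index[doc_id] = dict(indexed_fields)


def set_user(doc_id, user):
    """Single-field update in the external store; the index is not touched."""
    db.execute(
        "INSERT OR REPLACE INTO tweet_meta (id, user) VALUES (?, ?)",
        (doc_id, user),
    )


def search(term):
    """Search the index, then reconcile each hit with the external store."""
    results = []
    for doc_id, fields in search_index.items():
        if term.lower() in fields["message"].lower():
            row = db.execute(
                "SELECT user FROM tweet_meta WHERE id = ?", (doc_id,)
            ).fetchone()
            results.append(dict(fields, id=doc_id, user=row[0] if row else None))
    return results


index_document("1", {"post_date": "2009-11-15T14:12:12",
                     "message": "trying out Elastic Search"})
set_user("1", "admin")
set_user("1", "arien")  # cheap update, nothing is reindexed
print(search("elastic"))
```

The cost of this design is the extra lookup per hit at query time, and the fact that fields kept outside the index cannot be used for searching or sorting.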

Hi,
Thanks for your suggestion. I will consider it as one of the main
ways to solve my issue.

2011/8/24 Nicolas Lalevée nicolas.lalevee@hibnet.org

The simplest way to solve this is to get the document, update the relevant
field, and then index the document again. You can use versioning to make
sure no other update has "sneaked" in while you were doing the update.
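
The get, modify, reindex cycle with a version check can be sketched roughly as follows. `FakeIndex`, `VersionConflict`, `update_field` and the `expected_version` parameter are hypothetical names for an in-memory stand-in, not the Elasticsearch client API; how a real cluster accepts the expected version is not shown in this thread, so check the documentation for your release.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the expected one."""


class FakeIndex:
    """In-memory stand-in for a versioned document store (not a real client)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

    def index(self, doc_id, source, expected_version=None):
        """Replace the whole document; reject the write if the version moved."""
        current = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected {expected_version}, found {current}")
        self._docs[doc_id] = (current + 1, dict(source))
        return current + 1


def update_field(store, doc_id, field, value, retries=3):
    """Read the document, change one field, and reindex it with a version check."""
    for _ in range(retries):
        version, source = store.get(doc_id)
        source[field] = value
        try:
            return store.index(doc_id, source, expected_version=version)
        except VersionConflict:
            continue  # another writer got in first: reread and try again
    raise VersionConflict(f"gave up after {retries} attempts")


store = FakeIndex()
store.index("1", {"user": "admin",
                  "post_date": "2009-11-15T14:12:12",
                  "message": "trying out Elastic Search"})
new_version = update_field(store, "1", "user", "arien")
print(new_version, store.get("1")[1])
```

The whole document still travels over the wire on every update; the version check only guarantees that a concurrent change is not silently overwritten.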

On Wed, Aug 24, 2011 at 10:09 AM, arien ajaypadvi@gmail.com wrote:

Let's say I save content in two mappings, of which I normally update
(replace) only one. In both mappings I would use the same id. If, for
example, both documents contain the same string in at least one field,
is it possible to return only *one* hit, as if the string had been found in
only one mapping (the MySQL equivalent of "group by id")?

Thx
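
One way to get the "group by id" effect is to collapse the hits on the client side after the search returns. This is only a sketch of that idea and makes no claim about what Elasticsearch can do on the server. The mapping names "attachment" and "comments" and the function `collapse_by_id` are made up for illustration.

```python
def collapse_by_id(hits):
    """Keep the first (highest-ranked) hit per id, preserving result order."""
    seen = set()
    unique = []
    for hit in hits:
        if hit["_id"] not in seen:
            seen.add(hit["_id"])
            unique.append(hit)
    return unique


# Hits as they might come back when both mappings match the same string.
hits = [
    {"_type": "attachment", "_id": "1", "_score": 1.4},
    {"_type": "comments", "_id": "1", "_score": 0.9},
    {"_type": "attachment", "_id": "2", "_score": 0.5},
]
collapsed = collapse_by_id(hits)
print([(h["_type"], h["_id"]) for h in collapsed])
```

Collapsing after the fact means a page of results can come back shorter than requested, so the caller may need to fetch extra hits to fill it.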

The Autonomy IDOL search engine has a ReplaceFieldValue action that can update an indexed item. In fact, I am facing the same problem as you. Extracting the attachment file is a heavy job, and reindexing the item makes the work quite painful, especially when you have a lot of items to reindex. For example, you may want to update a document's view count or replay count. If you want to order search results by these numbers, you have to put them in the index rather than store them only in the database.

Under the hood, Autonomy IDOL seems to implement the update by creating a new item as a copy of the indexed item, replacing the values given in the parameters, and deleting the older item some time later.

Maybe the versioning feature of Elasticsearch could be useful in this scenario.
Just a suggestion; I am a newcomer to ES.

Could it be possible to use Parent/Child? From my point of view that seems
to be feasible, but a little slow as there are always two queries to be
made. I'd use one mapping for the attachment and another for the comments.

Yes, that looks OK. I had overlooked a post by kimchy.