Suggestion for updating documents?


(Ivan Ji) #1

Hi all,

Assume I already have a lot of documents in ES, and each document represents
one file.

But now I want to update some files' fields, so I need to find the
document, get its id, and then apply the _update operation.

If I have n documents to update this way and there are m documents in ES,
the cost of searching for each document to get its id is O(n*m), right?
Each find operation would need to scan all the documents in the index. Is
there any way to find the desired document by a unique key and return as
soon as it is found?

If so, it's really not a good option for updating a document's field. I am
wondering what the suggested workflow is for updating a file without
knowing its id first.

Ideas?

Cheers,

Ivan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8c91ab7e-1ae5-4f97-abad-68afec17aa76%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Luca Cavanna) #2

How do you find the documents you need to update? I guess by executing a
query? In that case, the search won't scan all the documents; this is the
whole point of Elasticsearch and Lucene. There is an inverted index which
makes it easy to find matches based on the terms in your queries.
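
To make the inverted-index point concrete, here is a minimal sketch in plain Python. It is not Elasticsearch or Lucene code, and the names build_inverted_index and find are made up for illustration. It maps each term to the set of document ids containing it, so looking up a term is a dictionary access rather than a scan over every document.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def find(index, term):
    """Look up one term directly; no pass over the full document set."""
    return index.get(term.lower(), set())

# Hypothetical sample data: document id -> text of one field.
docs = {
    "1": "report draft for january",
    "2": "holiday photos",
    "3": "final report",
}
index = build_inverted_index(docs)
matches = find(index, "report")  # ids of documents containing "report"
```

The index is built once at indexing time; each later lookup only touches the entry for the queried term, which is why n lookups do not cost O(n*m).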

Still, updating a lot of documents can be quite expensive. The problem is
not really the query part (i.e. finding the documents to update) but the
update itself, as it needs to get each document back, delete it and reindex
it internally (that's how updates work in Lucene). This is why the update
by query feature has not been exposed yet.
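
The get / delete / reindex cycle described above can be sketched with a toy in-memory index. This is only an illustration under simplifying assumptions, not how Lucene is implemented; the TinyIndex class and its methods are hypothetical. It shows that changing a single field still means removing the old document from the index and indexing the whole new version.

```python
from collections import defaultdict

class TinyIndex:
    """Toy in-memory index; hypothetical, for illustration only."""

    def __init__(self):
        self.docs = {}                 # doc id -> dict of fields
        self.terms = defaultdict(set)  # term -> ids of docs containing it

    def index(self, doc_id, doc):
        # Store the document and add every term of every field to the index.
        self.docs[doc_id] = doc
        for value in doc.values():
            for term in str(value).lower().split():
                self.terms[term].add(doc_id)

    def delete(self, doc_id):
        # Remove the document and all of its term entries.
        doc = self.docs.pop(doc_id)
        for value in doc.values():
            for term in str(value).lower().split():
                self.terms[term].discard(doc_id)

    def update(self, doc_id, changes):
        old = self.docs[doc_id]        # 1. get the document back
        new = {**old, **changes}       #    merge the changed fields
        self.delete(doc_id)            # 2. delete the old version
        self.index(doc_id, new)        # 3. reindex the whole document

    def search(self, term):
        return self.terms.get(term.lower(), set())

idx = TinyIndex()
idx.index("1", {"name": "report.txt", "status": "draft"})
idx.update("1", {"status": "final"})
```

Even though only the status field changed, the unchanged name field is processed again in step 3, which is where the cost of bulk updates comes from.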

(In reply to Ivan Ji's post of Thursday, February 6, 2014, 12:25:54 PM UTC+1.)



(Ivan Ji) #3

Hi Luca,

Thanks for your reply. In fact, I am not familiar with the internal
algorithms of ElasticSearch.
It seems I need to catch up on this background, such as Lucene. Thanks a
lot for clearing up this question.

Ivan

(In reply to Luca Cavanna's post of Thursday, February 6, 2014, 8:14:22 PM UTC+8.)


