Does ES have anything that makes it better than Lucene when one
needs to modify a large number of docs (e.g. modify a "tags" field for
50K documents in the result set) and see the changes reflected in real
time?
With straight Lucene, one could get the NRT part, but modifying 50K
docs would trigger 50K doc deletes and 50K doc adds for just slightly
modified documents.
No, ES does not handle this; you will need to do a full document update to
change a certain field (like renaming a tag). There are hacks to do it on top
of Lucene (for example, by maintaining two parallel indices), but they are not
really manageable. I'm not really sure how NRT fits into this. Because of the
deletes and the cloning? If that is the case, note that the NRT reader is only
reopened on a schedule (though there is an API for doing it explicitly).
The nice thing is that the update will be much faster, since you go
distributed and basically spread the load.
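To make the "full update" point concrete, here is a minimal sketch in plain Python. ToyIndex, rename_tag and the counters are hypothetical names made up for illustration; this is a toy in-memory model, not the Lucene or ES API. It only shows that renaming a tag means rewriting every matching document as a delete plus an add.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for an index; not a Lucene or ES API."""
    docs: dict = field(default_factory=dict)
    deletes: int = 0
    adds: int = 0

    def add(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.adds += 1

    def delete(self, doc_id):
        del self.docs[doc_id]
        self.deletes += 1

    def update(self, doc_id, new_doc):
        # A "field update" is modeled as a delete of the old document
        # followed by an add of the complete new document.
        self.delete(doc_id)
        self.add(doc_id, new_doc)


def rename_tag(index, old, new):
    """Rename a tag by rewriting every document that carries it."""
    matching = [doc_id for doc_id, doc in index.docs.items() if old in doc["tags"]]
    for doc_id in matching:
        doc = index.docs[doc_id]
        # Build the whole replacement document; only the tags differ.
        updated = dict(doc, tags=[new if t == old else t for t in doc["tags"]])
        index.update(doc_id, updated)
    return len(matching)
```

In this model, renaming a tag carried by 50K documents records 50K deletes and 50K adds, which is the cost described in the question.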
Could you please expand on this a bit? Since I didn't mention
distributed search, I wonder what you are referring to.
Are you saying that IF I were to involve multiple shards (and thus
multiple nodes/servers), batch doc updates would be faster because
the docs would be spread over multiple nodes, so updates of subsets
of docs would happen in parallel and the overall time needed to
update a large batch of docs would be shorter?
Yep, I meant that the indexing or update process would be spread
across several nodes and would thus be faster. Sadly, there is no simple
solution for this in the Lucene world as far as I know.
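As a rough illustration of "spreading the load", the sketch below partitions document IDs into shards and updates each shard's subset concurrently. Everything here is an assumption made for illustration: shard_for, partition and parallel_update are hypothetical helpers, the CRC32-based routing is not ES's actual routing function, and threads in one process merely stand in for separate nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(doc_id, num_shards):
    # Assumed routing rule for this sketch: stable hash of the ID modulo shard count.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs by the shard they are routed to."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


def update_shard(doc_ids, update_fn):
    """Apply the update to every document in one shard; return how many were updated."""
    for doc_id in doc_ids:
        update_fn(doc_id)
    return len(doc_ids)


def parallel_update(doc_ids, num_shards, update_fn):
    """Update each shard's subset concurrently, one worker per shard."""
    shards = partition(doc_ids, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        return list(pool.map(lambda ids: update_shard(ids, update_fn), shards))
```

Each worker still pays the full delete-plus-add cost per document; the gain in this model comes only from the subsets being processed at the same time.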