Updating documents without affecting search


(vijayaraj) #1

Hello Guys,
I have 500,000 documents in my index, which are currently being queried through a search API.
Of those, there are 100,000 documents that I must update every day.
But my updates shouldn't affect the current searches.
The previous day's documents should be replaced with the current day's documents without any delay.
How can I achieve this?


(David Pilato) #2

Would it be possible to actually reindex all 500k documents into a new index?
And then use an alias to switch from the old index to the new one instantly?
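A minimal sketch of the alias switch, assuming hypothetical index names `products_v1` (current) and `products_v2` (freshly loaded) behind an alias `products` that the search API queries. It only builds the JSON body for `POST /_aliases`; both actions in one request are applied atomically, so searches hit either the old index or the new one, never neither.

```python
def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Elasticsearch applies all actions in a single _aliases request atomically.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

Usage: send `alias_swap_actions("products", "products_v1", "products_v2")` as the body of `POST /_aliases` once `products_v2` is fully loaded, then delete `products_v1`.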


(vijayaraj) #3

Well, that's the problem: the 500,000 may grow to 10,000,000, and I might be getting updates around 10 times a day.
In that situation, is it a good idea to reindex the entire index every single time? Will it be fast enough?
Thanks


(David Pilato) #4

The only way to switch from one version of the index to a brand-new version within a millisecond is to have two indices.

But maybe describe your problem in more detail, and we can think of other solutions.
For example, why won't on-the-fly indexing work for you?


(vijayaraj) #5

Hello,
My scenario: I have around 10,000,000 documents in my index.
When I receive a batch update, which might be around 700,000 documents for a particular advertiser, I wipe out that advertiser's documents that I loaded the previous day and load the fresh feed. I get updates like this for various advertisers at any time throughout the day.
While I perform the wipe-and-reload operation, any active searches against my Elasticsearch index may miss those 700,000 products.
So I am just wondering if there is any solution to overcome this.


(David Pilato) #6

Why not an index per advertiser?

Or

Can't you just update the existing documents on the fly?
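A minimal sketch of the index-per-advertiser idea, assuming a hypothetical naming scheme: each feed goes into its own physical index `products-<advertiser>-<batch>`, and searches go through a per-advertiser alias `products-<advertiser>`. The function only plans the REST calls as (method, path, body) tuples; mappings, the bulk payload, and the HTTP client are left out. Index names must be lowercase, so the advertiser key is assumed to be lowercase already.

```python
from typing import Any, Optional


def plan_advertiser_reload(
    advertiser: str, old_batch: Optional[str], new_batch: str
) -> list[tuple[str, str, Any]]:
    """Ordered REST calls that replace one advertiser's feed without a gap.

    Hypothetical scheme: alias products-<advertiser> points at the live
    index products-<advertiser>-<batch>.
    """
    alias = f"products-{advertiser}"
    new_index = f"{alias}-{new_batch}"
    steps: list[tuple[str, str, Any]] = [
        # 1. Create the new index (settings/mappings omitted in this sketch).
        ("PUT", f"/{new_index}", {}),
        # 2. Bulk-load the fresh feed; searches are unaffected because the
        #    alias does not point at this index yet. Payload omitted here.
        ("POST", f"/{new_index}/_bulk", None),
    ]
    actions = [{"add": {"index": new_index, "alias": alias}}]
    if old_batch is not None:
        actions.insert(0, {"remove": {"index": f"{alias}-{old_batch}", "alias": alias}})
    # 3. Repoint the alias; all actions in one request are applied atomically.
    steps.append(("POST", "/_aliases", {"actions": actions}))
    if old_batch is not None:
        # 4. Drop the previous batch once nothing reads from it anymore.
        steps.append(("DELETE", f"/{alias}-{old_batch}", None))
    return steps
```

Only the reloaded advertiser's index is rebuilt, so a 700,000-document feed never touches the other advertisers' data.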


(vijayaraj) #7

You mean the update query?
Because deletion takes around 30 seconds and insertion takes another 30 seconds,
but my requirement is zero downtime.


(David Pilato) #8

You don't need to delete anything. Just index the new version of each document again, using the same document ID.
Elasticsearch will automatically replace the old version, deleting and indexing in one operation.
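A minimal sketch of re-indexing in place with the Bulk API, assuming each product carries a stable identifier in a hypothetical field (`sku` in the usage below). Because every `index` action sets an explicit `_id`, re-sending a product overwrites the stored version instead of creating a duplicate, so there is no window where the product is missing.

```python
import json


def bulk_index_body(index: str, docs: list[dict], id_field: str) -> str:
    """NDJSON body for POST /_bulk with one `index` action per document.

    An `index` action with an existing _id replaces that document.
    `id_field` names a (hypothetical) field holding a stable product ID.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Usage: `bulk_index_body("products", feed, "sku")` produces the request body for `POST /_bulk` (with `Content-Type: application/x-ndjson`).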


(vijayaraj) #9

Oh, that's beautiful. Thanks dadoonet


(vijayaraj) #10

I found this in the Elasticsearch documentation:

Setting version_type to external will cause Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination index than they do in the source index:

But I also want to delete documents if they don't exist in the source index.
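For reference, the quoted option corresponds to a `_reindex` request body like the one built below (index names are hypothetical). `"conflicts": "proceed"` lets the reindex continue when a destination document already has an equal or newer version, instead of aborting. As the quote says, this only creates and updates: documents that exist only in the destination index are left untouched, which is exactly the gap raised above.

```python
def reindex_external_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex that keeps source versions (version_type=external).

    Missing documents are created and older ones updated; nothing is deleted
    from dest_index.
    """
    return {
        # Skip version conflicts (dest already newer or equal) instead of aborting.
        "conflicts": "proceed",
        "source": {"index": source_index},
        "dest": {"index": dest_index, "version_type": "external"},
    }
```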


(David Pilato) #11

Then why not this?

Why not an index per advertiser?


(vijayaraj) #12

If I have too many indices, how do I perform search operations?
I also want to know how to delete documents while reindexing.


(David Pilato) #13

You can search across multiple indices.
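A minimal sketch, with hypothetical index names: the search endpoint accepts a comma-separated list of indices or aliases, and each entry may use a `*` wildcard, so a single request can cover every advertiser's index.

```python
def multi_index_search_path(targets: list[str]) -> str:
    """Path for one search over several indices/aliases, e.g. /a,b/_search.

    Entries may contain * wildcards such as "products-*".
    """
    if not targets:
        raise ValueError("at least one index, alias, or pattern is required")
    return "/" + ",".join(targets) + "/_search"
```

Usage: `GET /products-*/_search` (from `multi_index_search_path(["products-*"])`) searches every index whose name starts with `products-`.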

About reindexing: if you mean reindexing into another index, then you don't have to delete old documents. Just don't copy them.
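A minimal sketch of "just don't copy them", assuming a hypothetical `advertiser` field mapped as `keyword` and hypothetical index names: a `source.query` on the `_reindex` request copies everything except one advertiser's stale documents into the new index, and that advertiser's fresh feed is then bulk-loaded into the new index before switching over.

```python
def reindex_excluding_advertiser(source_index: str, dest_index: str, advertiser: str) -> dict:
    """Body for POST /_reindex that copies all documents except one advertiser's.

    Assumes a keyword field named `advertiser` (hypothetical).
    """
    return {
        "source": {
            "index": source_index,
            # Only documents matching this query are copied.
            "query": {"bool": {"must_not": [{"term": {"advertiser": advertiser}}]}},
        },
        "dest": {"index": dest_index},
    }
```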