Hi Clint,
Great. I am basically case 1, but I saw that even though I don't specify
any ID in the indexing request, ES automatically creates a new ID, e.g. the
version is increased to 2. But even after this, I saw that version 1 can
still be accessed. So do you mean that it won't always be accessible, i.e.
version 1 will be visible for some time but will be removed automatically
later?
Hari
On Thu, Jun 23, 2011 at 2:33 PM, Clinton Gormley clinton@iannounce.co.uk wrote:
Hi Hari
How do you delete older versions of documents? Suppose I always need
only the latest version, I wouldn't want to waste space storing the
earlier versions as well.

One of three scenarios here:
- You are using the same ID for each version of the doc
In this case, each time you index a doc with the same ID, the old
version will be marked as deleted, and will no longer be visible.

These 'deleted' docs will be automatically removed at some point in the
future when the "segments" in your index are merged. This happens in
the background, but can be manually triggered using the optimize API.
- You are using different IDs for each version of the doc, or no ID, in
which case ES generates one for you.

In this case, you are out of luck. ES can only identify that one doc
is a different version of another doc by looking at the ID.

Instead, you would have to have some way of identifying which docs are
current, and which are old, and you can use the delete_by_query API to
mark the old ones as deleted.
- You have a rolling window
For instance, you're indexing log statements, and you want to have the
last week's data available to you, but automatically clear out anything
older.

Easiest thing here would be to create a new index every day, e.g.
"logs_2011-06-23", and only insert into today's index.

For querying, you could create an alias, e.g. "logs_week", which points to
"logs_2011-06-23", "logs_2011-06-22", etc.

Then each day you would create a new index, update the alias, and delete
the index for $today-8 days.

clint
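
A rough sketch of the daily rollover from the third scenario, in Python. It only computes index names and the alias changes for a given day; it does not talk to a cluster. The helper names (index_name, window_indices, daily_rollover) and the KEEP_DAYS constant are made up for illustration. KEEP_DAYS = 8 is one reading of "delete the index for $today-8 days": the alias covers today plus the previous seven days. The "actions" structure follows the add/remove shape of the ES alias API, but treat that as an assumption and check it against the docs for your version.

```python
from datetime import date, timedelta

INDEX_PREFIX = "logs_"   # taken from the example name "logs_2011-06-23"
ALIAS = "logs_week"      # alias name used in the example above
KEEP_DAYS = 8            # assumption: today + previous 7 days stay searchable


def index_name(day):
    """Name of the daily index for a given date, e.g. logs_2011-06-23."""
    return INDEX_PREFIX + day.isoformat()


def window_indices(today):
    """All daily indices the alias should point to, newest first."""
    return [index_name(today - timedelta(days=n)) for n in range(KEEP_DAYS)]


def daily_rollover(today):
    """Describe the three daily steps: create, update alias, delete."""
    new_index = index_name(today)
    expired_index = index_name(today - timedelta(days=KEEP_DAYS))
    return {
        # 1. create today's index and send all new docs there
        "create": new_index,
        # 2. alias update (assumed request-body shape, not verified here)
        "alias_actions": {
            "actions": [
                {"add": {"index": new_index, "alias": ALIAS}},
                {"remove": {"index": expired_index, "alias": ALIAS}},
            ]
        },
        # 3. drop the index that has fallen out of the window
        "delete": expired_index,
    }
```

For 2011-06-23, daily_rollover(date(2011, 6, 23)) says to create "logs_2011-06-23" and delete "logs_2011-06-15". Dropping a whole index avoids marking individual docs as deleted and waiting for a segment merge.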