Index Size explosion (17 GB -> 840 GB)

Not saying this happened to you, but I've had bugs in update scripts before
that recursively included the source in the update. When a document was
updated, I accidentally included the old _source as a field. The next time
the doc was updated, the _source was included again (and it now contained two
copies of the old source), and so on.
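To make the mechanism concrete, here is a small Python simulation. It is only a sketch and does not talk to Elasticsearch: it assumes a hypothetical buggy update script that, besides setting the intended field, appends a snapshot of the whole old _source to a field called `history` (the field name and document contents are made up). JSON length stands in for _source size.

```python
import json


def buggy_update(source: dict, field: str, value) -> dict:
    """Hypothetical buggy update script: besides setting `field`, it also
    appends a snapshot of the entire old _source to a "history" field."""
    new_source = dict(source)
    new_source[field] = value
    # Bug: the snapshot already contains every earlier snapshot, so the old
    # history ends up stored twice (once directly, once inside the snapshot).
    new_source["history"] = source.get("history", []) + [source]
    return new_source


def correct_update(source: dict, field: str, value) -> dict:
    """What the script should do: change only the targeted field."""
    new_source = dict(source)
    new_source[field] = value
    return new_source


def json_size(doc: dict) -> int:
    """Size in bytes of the document serialized as JSON (a rough proxy for _source size)."""
    return len(json.dumps(doc).encode("utf-8"))


def simulate(n_updates: int) -> tuple[list[int], list[int]]:
    """Apply n_updates updates with each script and record the size after each one."""
    start = {"title": "some feed", "body": "x" * 1000, "counter": 0}
    buggy, good = start, start
    buggy_sizes, good_sizes = [], []
    for i in range(1, n_updates + 1):
        buggy = buggy_update(buggy, "counter", i)
        good = correct_update(good, "counter", i)
        buggy_sizes.append(json_size(buggy))
        good_sizes.append(json_size(good))
    return buggy_sizes, good_sizes


if __name__ == "__main__":
    buggy_sizes, good_sizes = simulate(10)
    for i, (b, g) in enumerate(zip(buggy_sizes, good_sizes), start=1):
        print(f"update {i:2d}: buggy={b:>9,} bytes  correct={g:,} bytes")
```

With this kind of bug each update at least doubles the document, so ten updates turn a roughly 1 KB document into roughly 1 MB, while the correct script keeps it flat.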

If you do that enough, the size quickly spirals out of control. Definitely
check your script to make sure it is doing what you think it is.
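One quick way to check is to fetch a few updated documents, load their _source JSON and look for nested objects that carry fields which should only exist at the top level. The helper below is a hypothetical heuristic, not part of any Elasticsearch tool; you pass it the keys you expect only at the root (e.g. `title`, `body`) and it reports every nested path that contains all of them.

```python
import json
from typing import Any


def embedded_copies(source: dict, top_level_keys: set[str]) -> list[str]:
    """Return paths of nested objects that contain every key in
    `top_level_keys`, a hint that the document was copied into itself."""
    found: list[str] = []

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict):
            # Skip the root itself; only nested objects count as copies.
            if path and top_level_keys <= set(value):
                found.append(path)
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for i, child in enumerate(value):
                walk(child, f"{path}[{i}]")

    walk(source, "")
    return found


if __name__ == "__main__":
    # Example _source as it might look after two buggy updates (made-up data).
    doc = json.loads(
        '{"title": "a", "body": "b", "history": [{"title": "a", "body": "b"}]}'
    )
    print(embedded_copies(doc, {"title", "body"}))
```

An empty list for a sample of updated documents suggests the script is not embedding old copies; any path it prints points at the field to inspect.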

-Zach

On Monday, March 25, 2013 9:59:45 AM UTC-4, Vineeth Mohan wrote:

File size dumps of the previous and present indices:

Previous - gist:5237294 · GitHub (single shard, single machine)
Present - gist:5237236 · GitHub (4 shards, 4 machines)

On Mon, Mar 25, 2013 at 5:52 PM, Vineeth Mohan <vineet...@algotree.com> wrote:

Adding attachments.

PFA

Thanks
Vineeth

On Mon, Mar 25, 2013 at 5:51 PM, Vineeth Mohan <vineet...@algotree.com> wrote:

Adding some more info.
The number of replicas is 0.

Also, please find the before and after images of the head plugin attached.
Kindly note that the number of feeds is the same for both.
Also, the only operation I performed on the index after migration was
update requests on all the feeds.

Thanks
Vineeth

On Mon, Mar 25, 2013 at 4:43 PM, Vineeth Mohan <vineet...@algotree.com> wrote:

Hi,

I had an index that was initially 17 GB. It had a single shard and
ran on a single machine.
Last week I migrated the data to a 4-shard index with the same mapping.
These 4 shards are distributed among 4 machines.
After the migration, I ran a script that updates one of the fields of
every feed.
Now, after the migration and the updates, the total index size is around
840 GB, with each shard holding around 250 GB of data.

I am not able to understand what happened.
Kindly shed some light on this issue.

Thanks
Vineeth
