I'm quite new to ES and I'm currently doing some experiments to
The experiment consisted of the following:
1. Only one ES node up, using a shared fs gateway dir
(the only config I did).
2. A process doing live Twitter indexing (using its
filtered streaming API).
3. Win7 and only one hard disk.
After ~6h of indexing, I did a hard reboot of my computer and
In the end, I noticed the following:
1. The gateway data dir had ~7000 files totaling ~9GB.
2. It took ~2h for the ES node to become available.
3. There were ~45k tweets indexed (not all tweets were
indexed due to applied filters).
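
In case it helps anyone reproduce the numbers, here is a minimal sketch of how the file count and total size of a directory can be measured. The function name `dir_stats` and the path passed on the command line are my own placeholders and have nothing to do with ES itself.

```python
import os
import sys


def dir_stats(root):
    """Return (file_count, total_bytes) for all files found under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    # Hypothetical usage: pass the gateway dir as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count, size = dir_stats(root)
    print(f"{count} files, {size / 1024**3:.2f} GiB")
```

Running it against the gateway dir before and after a restart would show whether the file count keeps growing.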
So, with so few documents indexed, why did the cluster recovery
take so long? What configuration affects this behavior? And finally,
why were there so many files in the gateway dir? Is there any way to
compact them? (Maybe this slowed down recovery.)
Thanks in advance,