I'm quite new to ES and I'm currently running some experiments to
evaluate it.
The experiment consisted of the following:
1. Only one ES node up, using a shared fs gateway dir
(the only config I changed).
2. A process doing live Twitter indexing (using its
filtered streaming API).
3. Win7 and only one hard disk.
After ~6h of indexing, I did a hard reboot of my computer and
restarted ES.
In the end, I noticed the following:
1. The gateway data dir had ~7000 files totaling ~9GB.
2. It took ~2h for the ES node to become available.
3. There were ~45k tweets indexed (not all tweets were
indexed due to the applied filters).
So, with so few documents indexed, why did the cluster recovery
take so long? What configuration affects this behavior? And finally,
why were there so many files in the gateway dir? Is there any way to compact them?
(Maybe this slowed down recovery.)
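For anyone wanting to reproduce the file count and size figures above, here is a minimal Python sketch that walks a directory and totals them. The function name `dir_stats` and the example path are made up for illustration; point it at whatever directory your fs gateway is configured to use.

```python
import os


def dir_stats(root):
    """Return (file_count, total_bytes) for every file below `root`."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file vanished or is unreadable (ES may be rewriting it); skip it.
                continue
            count += 1
    return count, total


if __name__ == "__main__":
    # Hypothetical location; replace with your own gateway directory.
    files, size = dir_stats(r"C:\es\gateway")
    print("%d files, %.2f GB" % (files, size / 1024.0 ** 3))
```

Running it before and after a restart would show whether the number of files keeps growing or settles.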
Is that fs location also on the local drive? Have you deleted the work dir
in elasticsearch before restarting (you shouldn't, for fast recovery)? Oh,
and one more thing: when you say it took 2 hours for the nodes to become
available, what does available mean for you? The fact that you can search on
them, or a GREEN status in the cluster health API?
One more thing: if you can recreate it, you can use the indices status
API, and it will give you details on how long the recovery took and what data
was reused from the local work dir. It would be great if you could gist it.
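To put a number on "available", one option is to poll the cluster health API after the restart and time how long it takes to reach a given status. The following is only a rough Python sketch: it assumes the node serves HTTP on `http://localhost:9200` and that the health document has a `status` field, and the helper names `fetch_health` and `wait_for_status` are invented here. Note that a single node whose index is configured with replicas may never get past yellow, so you may want to accept yellow as well.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:9200"):
    """Fetch the cluster health document as a dict (assumed address and port)."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_status(fetch=fetch_health, wanted=("green",), poll_seconds=5.0,
                    max_polls=2000, sleep=time.sleep, clock=time.monotonic):
    """Poll until the reported status is in `wanted`.

    Returns (last_status, elapsed_seconds). last_status is None if the node
    never answered, and may differ from `wanted` if max_polls ran out.
    """
    start = clock()
    status = None
    for _ in range(max_polls):
        try:
            status = fetch().get("status")
        except OSError:
            # Connection refused or timed out: the node is not serving HTTP yet.
            status = None
        if status in wanted:
            break
        sleep(poll_seconds)
    return status, clock() - start


if __name__ == "__main__":
    status, elapsed = wait_for_status(wanted=("yellow", "green"))
    print("status=%s after %.0f s" % (status, elapsed))
```

Start it right after launching the node; the elapsed time it prints is a rough measure of how long recovery took from the client's point of view.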
That fs location is also on the local drive?
Yes
Have you deleted the work dir in elasticsearch before restarting?
No
What does available mean for you?
I mean that I could not access