Corruption when indexing large number of documents (4 billion+)

warkolm · January 9, 2015, 9:59am

Honestly, with this sort of scale you should be thinking about support
(disclaimer: I work for Elasticsearch support).

However, let's see what we can do:
What versions of ES and Java are you running?
What are you using to monitor your cluster?
How many GB is that index?
Is it in one massive index?
How many GB is your data in total?
Why do you have 2 replicas?
Are you searching while indexing, or just indexing the data? If it's the
latter, you might want to try disabling replicas and setting the index
refresh interval (index.refresh_interval) to -1, then inserting your data,
and finally turning refresh and replicas back on so the data can be
refreshed and replicated. That's best practice for large amounts of
indexing.
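
As a rough illustration of that sequence, here is a minimal Python sketch that
only builds the requests for the update index settings API; it does not send
them. The host URL and the helper names are assumptions made up for this
example, the index name agora_v1 is taken from the error output in the quoted
message, and the restore values (2 replicas, 1s refresh) assume you want to go
back to the setup described there and to the default refresh interval.

```python
import json


def bulk_load_settings():
    """Settings to apply before a one-off bulk load with no searching."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def restore_settings(replicas=2, refresh_interval="1s"):
    """Settings to apply once the load has finished (values are assumptions)."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


def settings_request(index, body, host="http://localhost:9200"):
    """Return (method, url, payload) for PUT /<index>/_settings.

    The host is a placeholder; point it at one of your own nodes.
    """
    url = "{}/{}/_settings".format(host, index)
    return ("PUT", url, json.dumps(body))


if __name__ == "__main__":
    for step in (bulk_load_settings(), restore_settings()):
        method, url, payload = settings_request("agora_v1", step)
        print(method, url, payload)
```

Send the first request before the load starts and the second one after the
last bulk request has returned; the cluster then refreshes the index and
builds the replicas in one go instead of doing both for every batch.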

Also, consider dropping your bulk size down to 5K documents per request;
that's generally considered the upper limit for bulk API batches.
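
To make the batch size concrete, here is a small Python sketch (not NEST) that
splits a stream of documents into chunks of at most 5,000 and renders each
chunk in the newline-delimited format the bulk API expects. The function names
and the "doc" type name are hypothetical, and the documents are dummies; only
the 5K limit comes from the advice above.

```python
import json
from itertools import islice


def batches(docs, size=5000):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(chunk, index, doc_type):
    """Render one chunk as a bulk API body: an action line, then the source.

    Every line, including the last one, must end with a newline.
    """
    lines = []
    for doc in chunk:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ({"n": i} for i in range(12001))  # dummy documents
    for chunk in batches(docs):
        body = bulk_body(chunk, "agora_v1", "doc")  # "doc" is a made-up type
        print(len(chunk), "documents,", len(body), "bytes")
```

Because `batches` works on any iterable, it never holds more than one chunk in
memory, which matters when each loader is pushing an hour's worth of data.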

On 9 January 2015 at 14:44, Darshat Shah darshat@gmail.com wrote:

Hi,
We have a 98-node ES cluster, with 32GB of RAM on each node. 16GB is
reserved for ES via the config file. The index has 98 shards with 2 replicas.

On this cluster we are loading a large number of documents (when done it
would be about 10 billion). About 40 million documents are generated per
hour, and we are pre-loading several days' worth of documents to prototype
how ES will scale and how its queries will perform.

Right now we are facing problems getting data pre-loaded. Indexing is
turned off. We use the NEST client, with a batch size of 10k. To speed up
the data load, we distribute the hourly data to each of the 98 nodes to
insert in parallel. This worked OK for a few hours, until we got 4.5B
documents in the cluster.

After that the cluster state went to red. The outstanding tasks CAT API
shows errors like the ones below. CPU, disk, and memory seem to be fine on
the nodes.

Why are we getting these errors, and is there a best practice? Any help is
greatly appreciated, since this blocks prototyping ES for our use case.

thanks
Darshat

Sample errors:

source : shard-failed ([agora_v1][24], node[00ihc1ToRiqMDJ1lou1Sig], [R], s[INITIALIZING]),
reason [Failed to start shard, message [RecoveryFailedException[[agora_v1][24]: Recovery failed from [Shingen Harada][RDAwqX9yRgud9f7YtZAJPg][CH1SCH060051438][inet[/10.46.153.84:9300]] into [Elfqueen][00ihc1ToRiqMDJ1lou1Sig][CH1SCH050053435][inet[/10.46.182.106:9300]]];
nested: RemoteTransportException[[Shingen Harada][inet[/10.46.153.84:9300]][internal:index/shard/recovery/start_recovery]];
nested: RecoveryEngineException[[agora_v1][24] Phase[1] Execution failed];
nested: RecoverFilesRecoveryException[[agora_v1][24] Failed to transfer [0] files with total size of [0b]];
nested: NoSuchFileException[D:\app\ES.Elasticsearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6r]; ]]

AND

source : shard-failed ([agora_v1][95], node[PUsHFCStRaecPA6MuvJV9g], [P], s[INITIALIZING]),
reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[agora_v1][95] failed to fetch index version after copying it over];
nested: CorruptIndexException[[agora_v1][95] Preexisting corrupted index [corrupted_1wegvS7BSKSbOYQkX9zJSw] caused by: CorruptIndexException[Read past EOF while reading segment infos]
EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.Elasticsearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\95\index\segments_11j")]
org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
    at org.elasticsearch.index.store.Store.access$400(Store.java:80)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
---snip more stack trace-----

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/0f24b939-2cba-41a9-8de8-49565f77e567%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

