Hey,
After a long time of testing and banging my head against the wall, I decided to ask here in the forum.
In our product we want to use Elasticsearch. Basically, it has been implemented for a long time already (we started with version 0.90), but the product is not ready yet.
It's not a "new" product but a re-engineered one, so we need to migrate.
Some facts before I get into the details of where my problem begins:
- Elasticsearch version 1.7
- Client language: PHP
- 1 index (currently)
- 4 mappings
- 10 million documents per mapping (currently), so 40 million in total
- Every document has a size of ~12 KB
- Routing by a custom field (over 5000 different routing values in the end)
- 2 mappings have parent/child relationships
- Index grows daily
- Multiple updates per day on documents
- Manual refresh after bulk indexing is disabled
Our mappings contain dynamic_templates, custom analyzers and raw values.
The problem begins at the initial indexing phase.
We are migrating data from a MySQL database using a parallel worker mechanism. Basically, we build chunks of 10-15 MB and send them to Elasticsearch.
We have already tried 5, 10, 20 and 30 workers in parallel.
We have also tested chunk sizes of 200, 500, 750, 1000 and 2000.
Replicas are set to 0 while indexing.
The server has an SSD cache (next week we get "real" SSDs). We scaled the heap size from 16 to 32 GB and tested with a cluster (up to 4 nodes) and without one; we have tested it all.
Note that, in the current state, every bulk job sends to only ONE routing value.
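To make the chunking setup concrete, here is a minimal Python sketch of how such bulk bodies could be built (the real workers are in PHP, so this is only an illustration, not our actual code). Assumptions: each record is a hypothetical `(doc_id, routing_value, source)` tuple, the input is already sorted by routing value (e.g. `ORDER BY` the routing column in MySQL), and parent/child `_parent` metadata is left out. Each body targets a single routing value and is cut before it exceeds either a byte limit or a document-count limit.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (doc_id, routing_value, source_document)
Doc = Tuple[str, str, dict]


def bulk_chunks(docs: Iterable[Doc], index: str, doc_type: str,
                max_bytes: int = 10 * 1024 * 1024,
                max_docs: int = 1000) -> Iterator[Tuple[str, str]]:
    """Yield (routing_value, ndjson_body) pairs for the _bulk endpoint.

    Assumes `docs` is sorted by routing value, so groupby sees each value once.
    Every body covers exactly one routing value and is flushed before it would
    exceed max_bytes or max_docs (a single oversized doc still gets its own body).
    """
    for routing, group in groupby(docs, key=lambda d: d[1]):
        lines: List[str] = []
        size = 0
        count = 0
        for doc_id, _, source in group:
            # ES 1.x bulk action metadata line, followed by the source line
            action = json.dumps({"index": {"_index": index, "_type": doc_type,
                                           "_id": doc_id, "_routing": routing}})
            pair = action + "\n" + json.dumps(source) + "\n"
            pair_size = len(pair.encode("utf-8"))
            # Flush the current body before adding this doc would break a limit
            if count and (size + pair_size > max_bytes or count >= max_docs):
                yield routing, "".join(lines)
                lines, size, count = [], 0, 0
            lines.append(pair)
            size += pair_size
            count += 1
        if lines:
            yield routing, "".join(lines)
```

Each body would then be POSTed to the `_bulk` endpoint by one of the parallel workers; the two limits correspond to the chunk size (documents) and the 10-15 MB cap mentioned above.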
What the problems are? Well, I don't know. We have Marvel installed and can see the CPU going up and down, the heap going up and down and so on, but we don't know why. The symptoms:
- **Indexing TEN million documents takes over 2 days**
- ES rejects requests with "queue is full (50)"
- ES starts throttling very quickly (I'd guess about 30 minutes after the first bulks are sent)
- Garbage collection kicks in at some point, and after that ES doesn't do anything else.
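For scale, a quick back-of-the-envelope calculation of what "10 million documents in 2 days" means as throughput. It takes the duration as exactly 2 days and the average document size as the ~12 KB from above, so the real rate is somewhat lower than this.

```python
DOCS = 10_000_000
SECONDS = 2 * 24 * 60 * 60  # "over 2 days", taken as exactly 2 days (upper bound on the rate)
DOC_KB = 12                 # average document size from the facts list

docs_per_second = DOCS / SECONDS
kb_per_second = docs_per_second * DOC_KB
print(f"{docs_per_second:.1f} docs/s, {kb_per_second / 1024:.2f} MB/s")
# prints "57.9 docs/s, 0.68 MB/s"
```

So at best we get roughly 58 documents (well under 1 MB) per second into the index.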
I think many of these problems can be fixed with settings (I hope so).
Do you have any tips for me?
Best regards,
Dominik



I'm "only" indexing 10M documents now, but this was a pain in the ass before, so I'm optimistic that this will bring some good results.