After a long time of testing and banging my head against the wall, I decided to ask here in the forum.
We want to use Elasticsearch in our product. Basically, it has been implemented for a long time already (we started with version 0.90), but the product is not ready yet.
It's not a "new" product but a re-engineered one, so we need to migrate.
Some facts before I go into detail about where my problem begins:
- Elasticsearch Version 1.7
- Client language PHP
- 1 Index (currently)
- 4 Mappings
- 10 Million Documents per Mapping (currently) (so 40 Million in total)
- Every document is ~12 KB in size
- Routing by a custom field (over 5000 different routing values in the end)
- 2 mappings use parent/child
- Index grows daily
- Multiple daily updates to documents
- Disabled manual refresh after bulk indexing
Our mappings contain dynamic_templates, custom analyzers, and raw values.
The problem begins in the initial indexing phase.
We are migrating data from a MySQL database using a mechanism that works in parallel. Basically, we build chunks of 10-15 MB and send them to Elasticsearch.
We already tried 5, 10, 20, and 30 in parallel.
We also tested the chunk size: 200, 500, 750, 1000, and 2000.
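To make the chunking step concrete, here is a minimal sketch in Python (our real client is PHP, so this is only an illustration). The function name `build_bulk_chunks`, its parameters, and the 10 MB default are hypothetical; the body format is the newline-delimited action/source pairs that the bulk API expects, with one routing value per chunk.

```python
import json


def build_bulk_chunks(docs, index, doc_type, routing, max_bytes=10 * 1024 * 1024):
    """Yield bulk request bodies (newline-delimited JSON) of about max_bytes each.

    docs is an iterable of (doc_id, source_dict) pairs. All names here are
    hypothetical; this is not our production code.
    """
    lines, size = [], 0
    for doc_id, source in docs:
        # Action line: index this document under a single routing value.
        action = json.dumps({"index": {"_index": index, "_type": doc_type,
                                       "_id": doc_id, "_routing": routing}})
        payload = json.dumps(source)
        # +2 accounts for the newline after each of the two lines.
        entry_size = len(action.encode("utf-8")) + len(payload.encode("utf-8")) + 2
        # Close the current chunk before it would exceed the size cap.
        if lines and size + entry_size > max_bytes:
            yield "\n".join(lines) + "\n"
            lines, size = [], 0
        lines.extend([action, payload])
        size += entry_size
    if lines:
        # The bulk body must end with a newline.
        yield "\n".join(lines) + "\n"
```

A single document larger than `max_bytes` still gets a chunk of its own, so the cap is a target rather than a hard limit.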
Replicas are set to 0 while indexing
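For reference, the replica change amounts to an index settings update like the sketch below. The helper names are hypothetical, and the restore value of 1 replica is only an assumption for illustration; the returned dict would be sent as the JSON body of a settings update on the index.

```python
import json


def replica_settings(count):
    """Build the body for an index settings update that sets the replica count."""
    return {"index": {"number_of_replicas": count}}


# Before the initial load: no replicas, so each document is indexed only once.
before_load = json.dumps(replica_settings(0))

# After the load: restore replicas (1 is an assumed value, not our real config).
after_load = json.dumps(replica_settings(1))
```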
The server has an SSD cache (a "real" SSD comes next week). Heap size was scaled from 16 to 32 GB. We tested with a cluster (up to 4) and without one.
That said, every "bulk job" currently sends to only ONE routing value.
The problems are, well, I don't know. We have Marvel installed; we see the CPU go up and down, we see the heap go up and down, and so on, but we don't know why.
- **Indexing TEN million documents takes over 2 days**
- ES is rejecting with "queue is full (50)"
- ES starts throttling very quickly (I guess 30 minutes after starting the first bulks)
- Garbage collection becomes active at some point, and then ES doesn't do anything else.
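For scale, 10 million documents in over 2 days works out to fewer than about 58 documents per second. One client-side way to cope with the "queue is full" rejections would be to resend only the rejected items with an increasing wait, as in the Python sketch below. It assumes that each rejected item in the bulk response carries status 429, which is worth verifying against a real response; `send_bulk` is a hypothetical callable that sends a list of actions and returns the parsed response.

```python
import time


def rejected_positions(bulk_response):
    """Return the positions of items whose status is 429 (assumed to mean rejected)."""
    positions = []
    for pos, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict such as {"index": {"status": 201, ...}}.
        result = next(iter(item.values()))
        if result.get("status") == 429:
            positions.append(pos)
    return positions


def send_with_backoff(send_bulk, actions, max_retries=5, base_delay=1.0,
                      sleep=time.sleep):
    """Send actions, resending only rejected ones with exponential backoff.

    Returns the actions that were still rejected after the last attempt.
    """
    pending = list(actions)
    for attempt in range(max_retries + 1):
        if not pending:
            break
        response = send_bulk(pending)
        # Keep only the actions that the server rejected this round.
        pending = [pending[pos] for pos in rejected_positions(response)]
        if pending and attempt < max_retries:
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * (2 ** attempt))
    return pending
```

This does not remove the cause of the rejections; it only keeps the parallel jobs from dropping documents while the bulk queue is saturated.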
I think many of these problems can be fixed with settings (I hope so).
Do you have any tips for me?