How can we speed up indexing our documents?

We currently index our documents in bulk using the PHP client. On average, our indexing times (including network transfer) are:

1,000 documents in 4 seconds
10,000 documents in 80 seconds

If we index in parallel, the indexing times are much longer, so that does not seem to be a solution.


Index: 1
Shard: 1
Node: 1
Memory: 1 GB
Storage: 16GB
Documents: 2,000,000

Our server is hosted by TransIP (Netherlands), which has a 100 Gb/s network connection. The Elasticsearch cluster is cloud-based (Ireland).

How can we speed up indexing our documents?

1 GB of memory is pretty small; I would start there.


And/or reduce the bulk size.
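
The timings quoted in the question point the same way. As a rough sketch using only those figures, the arithmetic below shows throughput per document halving when the batch grows from 1,000 to 10,000 documents, and what that would mean for the full 2,000,000 documents. The `chunked` helper and the batch size of 1,000 are illustrative assumptions, not part of any client library; it only shows one way to split a document list into smaller bulk requests.

```python
from typing import Iterator, List, Sequence


def docs_per_second(doc_count: int, seconds: float) -> float:
    """Throughput for one measured bulk request."""
    return doc_count / seconds


def chunked(docs: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` documents (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


# Figures taken from the question.
small_batch_rate = docs_per_second(1_000, 4)     # 1,000 documents in 4 seconds
large_batch_rate = docs_per_second(10_000, 80)   # 10,000 documents in 80 seconds

# Extrapolation to the full data set, assuming the rates stay constant.
total_docs = 2_000_000
hours_at_small_rate = total_docs / small_batch_rate / 3600
hours_at_large_rate = total_docs / large_batch_rate / 3600

print(f"1,000-doc batches:  {small_batch_rate:.0f} docs/s, about {hours_at_small_rate:.1f} h in total")
print(f"10,000-doc batches: {large_batch_rate:.0f} docs/s, about {hours_at_large_rate:.1f} h in total")

# Splitting 10,000 placeholder documents into batches of 1,000.
placeholder_docs = [{"id": i} for i in range(10_000)]
batches = list(chunked(placeholder_docs, 1_000))
```

Each batch would then be sent as its own bulk request. Whether 1,000 is the best size for this cluster is something only a measurement can tell.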

This is a follow-up to a telephone conversation this morning. The key point is that a 1 GB cluster in the cloud also comes with very limited CPU power. Right now, Marvel shows the total CPU usage of the underlying machine in the cloud, but a 1 GB cluster only gets a small fraction of that capacity.

My recommendation is to increase the size of the cluster in steps and record the benchmark results at each step. That will give you an idea of how this solution behaves at different cluster sizes. Because you pay by the hour, costs stay very low during this PoC phase.
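
One way to keep those benchmark results comparable is to time the same document set with the same batch size at every cluster size and store the outcome under a label. The sketch below is a minimal harness under stated assumptions: `send_batch` is a hypothetical callable standing in for whatever actually sends a bulk request, and `fake_send` is a stub so the example runs without a cluster. Neither is part of the Elasticsearch client.

```python
import time
from typing import Callable, Dict, List, Sequence


def run_benchmark(
    send_batch: Callable[[Sequence[dict]], None],
    docs: Sequence[dict],
    batch_size: int,
    label: str,
) -> Dict[str, object]:
    """Send `docs` in batches and return timing figures for one cluster size."""
    started = time.perf_counter()
    sent = 0
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        send_batch(batch)  # hypothetical: replace with the real bulk call
        sent += len(batch)
    elapsed = time.perf_counter() - started
    return {
        "label": label,
        "batch_size": batch_size,
        "docs": sent,
        "seconds": elapsed,
        "docs_per_second": sent / elapsed if elapsed > 0 else float("inf"),
    }


def fake_send(batch: Sequence[dict]) -> None:
    """Stub that does nothing, so the harness can run without a cluster."""
    return None


sample_docs = [{"id": i} for i in range(5_000)]
results: List[Dict[str, object]] = []

# In a real run, resize the cluster between iterations and use the real sender.
for cluster_label in ("1 GB", "2 GB", "4 GB"):
    results.append(run_benchmark(fake_send, sample_docs, 1_000, cluster_label))

for row in results:
    print(row["label"], row["docs"], "docs in", f"{row['seconds']:.4f}", "s")
```

The labels "2 GB" and "4 GB" are placeholders for whichever sizes you step through.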