Is it useful to have a load balancer node for indexing?

freeman · October 19, 2015, 7:29pm

I understand the usefulness of a load balancer node (master and data both set to false) for search, as it can perform the aggregations outside the data nodes.

Is this kind of node of any use for indexing requests?

nik9000 · October 19, 2015, 8:04pm

It's certainly worth testing to make sure, but it's unlikely to do much for you.

The most useful things to do for indexing are to switch from single-document requests to the bulk API and to set the refresh interval high ("30s", -1, something like that). You can squeeze a bit more performance out of indexing by running the import with number_of_replicas: 0 and adding the replicas back once the bulk import is done.
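
As a rough sketch of the suggestions above, the snippet below only builds the request bodies, using Python's standard library. The helper names (`bulk_body`, `import_settings`, `restore_settings`) and the index name are made up for illustration. The bodies are meant for the `_bulk` endpoint and for `PUT <index>/_settings`; sending them is left to whatever client you use. Releases from around the time of this thread also expect a document type, hence the optional `doc_type` argument.

```python
import json


def bulk_body(index, docs, doc_type=None):
    """Build a newline-delimited bulk body: one action line, then one source line, per doc."""
    if not docs:
        return ""
    action = {"_index": index}
    if doc_type is not None:
        # Older releases want a type in the action line (or in the URL).
        action["_type"] = doc_type
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def import_settings(refresh="30s"):
    """Index settings to apply before a big import (use "-1" to turn refresh off)."""
    return {"index": {"refresh_interval": refresh, "number_of_replicas": 0}}


def restore_settings(replicas=1, refresh="1s"):
    """Index settings to apply after the import; the defaults here are assumptions."""
    return {"index": {"refresh_interval": refresh, "number_of_replicas": replicas}}


if __name__ == "__main__":
    print(json.dumps(import_settings()))
    print(bulk_body("logs", [{"msg": "a"}, {"msg": "b"}]), end="")
    print(json.dumps(restore_settings()))
```

The order matters more than the exact values: apply the import settings, send the bulk requests, then restore the refresh interval and replica count.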

For non-bulk use cases, the refresh interval is the biggest thing to change. If that isn't enough for you, see if you can transform your non-bulk use case into a bulk one.

There is lots of talk about optimal bulk sizes floating around. I'm not super knowledgeable about it, but I know people who are, and they tend to measure throughput, increasing the bulk size until a larger size no longer buys any extra performance. There is some hard limit in MB on the maximum sane size of bulk requests. I don't know what it is offhand, but if you use the measurement technique above you'll find it on your own.
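
The measurement technique described above could look something like this. It is a sketch under assumptions, not a tuned benchmark: `find_bulk_size`, the doubling schedule, and the 5% cut-off are all invented for illustration, and `send_bulk` stands in for whatever function actually posts a batch to your cluster.

```python
import time


def find_bulk_size(send_bulk, docs, start=100, max_size=100_000,
                   min_gain=0.05, timer=time.perf_counter):
    """Double the batch size until throughput stops improving by at least min_gain.

    send_bulk(batch) is a caller-supplied function that indexes one batch.
    Returns (best_size, docs_per_second), or (None, 0.0) if nothing was sent.
    """
    best_size, best_rate = None, 0.0
    size = start
    while size <= max_size and size <= len(docs):
        batch = docs[:size]
        t0 = timer()
        send_bulk(batch)
        elapsed = timer() - t0
        rate = size / elapsed if elapsed > 0 else float("inf")
        # Stop once the bigger batch no longer buys a meaningful increase.
        if best_size is not None and rate < best_rate * (1 + min_gain):
            break
        best_size, best_rate = size, rate
        size *= 2
    return best_size, best_rate


if __name__ == "__main__":
    def fake_send(batch):
        # Stand-in for a real bulk request: fixed overhead plus a per-document cost.
        time.sleep(0.01 + 0.00001 * len(batch))

    print(find_bulk_size(fake_send, list(range(20_000))))
```

A single timing per size is noisy on a real cluster, so in practice you would probably repeat each size a few times and compare averages before settling on a number.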

freeman · October 20, 2015, 7:03am

Thanks, great answer.