Very slow index speeds with dynamic mapping and large volume of documents with new fields

Hey all,
We're bumping up against a production problem I could use a hand with.
We're experiencing steadily decreasing indexing speeds. We have 12 c3.4xl
data nodes, and 1 c3.8xl master node (with 2 backups that are smaller).
We're indexing 45 million documents into a single index. Single shard
only, no replicas. As the number of documents grows, our indexing speed
slows to a crawl. We've applied all the standard mlockall, ulimit, and SSD
merge throttling tuning settings, so I feel our cluster is tuned pretty well.

When I inspected the data, I noticed our user is adding a new field on
every document. When I view the pending tasks on our master, the task
queue always holds 300+ tasks attempting to perform dynamic mapping. I've
also checked segment merging: we never have more than 1 merge going on, and
even then it lasts only a second or two.
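
For anyone who wants to confirm the same "new field on every document" pattern in their own data, here is a minimal sketch. It assumes the documents are available as plain JSON-like dicts before indexing; the helper names (field_paths, new_fields_per_doc) and the sample documents are made up for illustration and are not taken from our cluster.

```python
def field_paths(doc, prefix=""):
    """Yield a dotted path for every leaf field in a JSON-like dict.

    Nested dicts are walked recursively; lists are treated as leaves,
    which is a simplification of how a real mapping would see them.
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from field_paths(value, path)
        else:
            yield path


def new_fields_per_doc(docs):
    """For each document, count the field paths not seen in earlier documents."""
    seen = set()
    counts = []
    for doc in docs:
        paths = set(field_paths(doc))
        counts.append(len(paths - seen))
        seen |= paths
    return counts


# Hypothetical sample: every document introduces one previously unseen field.
sample = [
    {"user": "a", "attrs": {"color_1": "red"}},
    {"user": "b", "attrs": {"color_2": "blue"}},
    {"user": "c", "attrs": {"color_3": "green"}},
]
print(new_fields_per_doc(sample))  # [2, 1, 1]
```

If the counts never drop to zero as more documents go through, every document is likely to trigger a mapping update.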

This brings me to my question. When dynamic mapping is performed, is this
done on the master only? If so, that would introduce a bottleneck and
could explain our performance drop. Otherwise I'm at a loss to explain
this issue. Any advice would be appreciated.

Thanks,
Todd

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0611317c-d3c1-4894-8fac-8ac4b36cbf15%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mapping changes do need to go through the master, so check how it is
performing.
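
One way to check the master is to look at its pending task queue. The sketch below is only an illustration: it assumes the cluster pending tasks endpoint (GET /_cluster/pending_tasks) is reachable at a placeholder address, and that mapping-related tasks can be recognised by the substring "mapping" in each task's "source" field. The exact source strings vary between versions, so the example payload here is made up.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_pending_tasks(base_url="http://localhost:9200"):
    """Fetch the pending cluster tasks; base_url is a placeholder address."""
    with urlopen(f"{base_url}/_cluster/pending_tasks") as resp:
        return json.load(resp)


def summarize(pending, needle="mapping"):
    """Summarise a pending-tasks response.

    needle is an assumed substring used to spot mapping-related tasks.
    """
    tasks = pending.get("tasks", [])
    mapping_tasks = [t for t in tasks if needle in t.get("source", "")]
    return {
        "total": len(tasks),
        "mapping": len(mapping_tasks),
        "max_wait_ms": max(
            (t.get("time_in_queue_millis", 0) for t in tasks), default=0
        ),
        "by_priority": dict(Counter(t.get("priority", "UNKNOWN") for t in tasks)),
    }


# Made-up payload in the shape the endpoint returns (a "tasks" list).
example = {
    "tasks": [
        {"insert_order": 1, "priority": "HIGH",
         "source": "update-mapping [events][doc]", "time_in_queue_millis": 4200},
        {"insert_order": 2, "priority": "HIGH",
         "source": "update-mapping [events][doc]", "time_in_queue_millis": 3900},
        {"insert_order": 3, "priority": "URGENT",
         "source": "shard-started", "time_in_queue_millis": 15},
    ]
}
print(summarize(example))
```

Calling summarize(fetch_pending_tasks()) every few seconds should show whether the mapping backlog and the longest wait keep growing while indexing is running.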

On 11 March 2015 at 08:37, Todd Nine tnine@apigee.com wrote:


