Indexing large html fields / cluster instability

EikeD · December 13, 2016, 3:33pm

Hi Folks!

We're having some performance/stability problems in our cluster while indexing data. In particular, there are two fields with pretty large HTML content that use the custom analyzer below.

The contents of those fields are around 1 MB to 3 MB each.

What we're seeing is nodes frequently dropping out of the cluster while adding docs. The logs show longish garbage collections. The cluster is 5 nodes with 31 GB of heap each.

Any suggestions to make this easier on the cluster? I don't mind it being slow, but I want to avoid instability.

"html_standard": {
  "filter": [
    "lowercase"
  ],
  "char_filter": [
    "html_strip"
  ],
  "tokenizer": "standard"
}

polyfractal · December 13, 2016, 3:56pm

What version of Elasticsearch are you using? It's possible that the indexing is causing memory problems, but far more likely that it's your queries/aggregations.

How many documents are you sending per bulk? Do you constrain the bulk size so that it doesn't go over a set number of MB per bulk?
How many concurrent processes/threads are sending bulk requests?
What kind of queries/aggregations are you running?

EikeD · December 13, 2016, 4:19pm

Hi, thanks for the reply.

Elasticsearch version 5.1.1

I'm using the _reindex API right now; I'm not sure about its parallelism and bulk size, actually.

Hardly any queries today, actually. I quite consistently see these problems when indexing (or reindexing) or updating those documents.

polyfractal · December 13, 2016, 4:35pm

Ah, I see. I'd try lowering the batch size of Reindex; the default is 1000. If your docs are 1-3 MB each, you could be hitting your cluster with 1-3 GB bulk requests, which will definitely make the heap unhappy (it has to buffer up that entire request in newgen memory before parsing it and sending it to the various shards).

Try setting it to something like 50 to start, and work up from there:

POST _reindex
{
  "source": {
    "index": "source",
    "size": 50
  },
  "dest": {
    "index": "dest"
  }
}
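
For anyone who wants to reason about the number rather than guess, here is a rough sketch of the arithmetic above in Python. The function names and the 150 MB budget are made up for illustration (they are not Elasticsearch settings or a recommendation), and the estimate ignores JSON and bulk-format overhead, so treat the result as a starting point only.

```python
import json

MB = 1024 * 1024


def estimate_bulk_bytes(batch_size, doc_bytes):
    """Approximate payload of one batch: docs per batch times bytes per doc."""
    return batch_size * doc_bytes


def pick_batch_size(target_bulk_bytes, max_doc_bytes):
    """Largest batch size whose worst-case payload fits the budget (at least 1)."""
    return max(1, target_bulk_bytes // max_doc_bytes)


def reindex_body(source_index, dest_index, batch_size):
    """Build the _reindex request body with an explicit source batch size."""
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Worst case from the thread: default batch of 1000 docs at 3 MB each.
    worst_case = estimate_bulk_bytes(1000, 3 * MB)
    print("default batch, worst case (MB):", worst_case // MB)

    # Hypothetical budget of 150 MB per batch, assuming 3 MB docs.
    size = pick_batch_size(150 * MB, 3 * MB)
    print(json.dumps(reindex_body("source", "dest", size), indent=2))
```

With 3 MB documents and that assumed budget, this lands on the same batch size of 50 as the request above; the printed body can be sent as-is to POST _reindex.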

EikeD · December 15, 2016, 9:24am

Ah, yes of course. Thanks.

