Bulk Inserts taking a long time

I am having an issue with bulk-updating records. As I send requests, each bulk update takes longer and longer to complete. I am using the scroll API to fetch batches of 100 documents and then bulk-updating them. Some of these operations take 25-45 seconds to complete. When I lower the batch size to 10 per bulk update, the times are better but still very slow, around 5-15 seconds. While this is running, I see only light load on the nodes.

We are self-hosted and on version 7.4.2. We have a 9-node cluster (3 master, 6 data).

Below is an example of the body that I'm trying to update.

    [body] => Array
        (
            [0] => Array
                (
                    [update] => Array
                        (
                            [_index] => recordings
                            [_id] => 023abe67-101f-412e-9b24-b6d5c5302179
                        )
                )
            [1] => Array
                (
                    [doc] => Array
                        (
                            [billed_duration] => 180
                            [billed_amount] => 0.0325
                        )
                )
        )
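For reference, the same request body can be built programmatically. This is a sketch in Python (the `build_bulk_update` helper is hypothetical; the index name and field names are taken from the dump above) showing the action-line/doc-line pairing the Bulk API expects for partial updates:

```python
def build_bulk_update(index, updates):
    """Build a bulk body for partial updates: for each (_id, fields) pair,
    emit an action line followed by a {"doc": ...} line."""
    body = []
    for doc_id, fields in updates:
        body.append({"update": {"_index": index, "_id": doc_id}})
        body.append({"doc": fields})
    return body

actions = build_bulk_update("recordings", [
    ("023abe67-101f-412e-9b24-b6d5c5302179",
     {"billed_duration": 180, "billed_amount": 0.0325}),
])
# Each update contributes two entries, mirroring the [update]/[doc]
# pairs in the array dump above.
```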

Here are my index settings for reference (some array values were truncated in the paste):

  "index.blocks.read_only_allow_delete": "false",
  "index.priority": "1",
  "index.query.default_field": [
  "index.write.wait_for_active_shards": "1",
  "index.refresh_interval": "1s",
  "index.max_result_window": "100000",
  "index.analysis.filter.filter_shingle.max_shingle_size": "5",
  "index.analysis.filter.filter_shingle.min_shingle_size": "2",
  "index.analysis.filter.filter_shingle.output_unigrams": "true",
  "index.analysis.filter.filter_shingle.type": "shingle",
  "index.analysis.analyzer.analyzer_shingle.filter": [
  "index.analysis.analyzer.analyzer_shingle.tokenizer": "standard",
  "index.number_of_replicas": "1",
  "index.version.upgraded": "7040299"

I'm starting to wonder if there is a setting or something that I am missing. I tried setting the refresh interval to -1, and that doesn't seem to help.
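For what it's worth, this is how I'm toggling the refresh interval via the index settings API (a sketch assuming the cluster is reachable on `localhost:9200`; adjust the host for your setup):

```shell
# Disable automatic refresh while the bulk job runs
curl -X PUT "localhost:9200/recordings/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "-1"}}'

# ... run the scroll + bulk update job ...

# Restore the previous interval afterwards
curl -X PUT "localhost:9200/recordings/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "1s"}}'
```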

Any thoughts?

One thing that can severely impact update performance is frequently updating the same document(s). This causes a lot of small segments to be created, which is expensive. Does that match your use case?

Updates can also get slower as shards grow larger. Nested documents that grow in size can also be a problem, as they need to be reindexed in full on every update.

It will be a one-time pass that only updates each doc once.

The other piece to this is that I get through the first 30k documents with times around 0.11 seconds, and then it stalls and times jump to the 30-second range.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.