Seeing slower bulk indexing performance after upgrading from ES5.2 to ES7.1

Hi,

We're seeing much slower bulk indexing performance after upgrading from ES5.2 to ES7.1. I already went through the release notes for 6 and 7 but couldn't find anything obvious that could affect this. I'm wondering if there's anything I'm missing here.

These are my index settings in 7:

{
  "my_index": {
    "settings": {
      "index": {
        "refresh_interval": "30s",
        "number_of_shards": "30",
        "provided_name": "my_index",
        "merge": {
          "scheduler": {
            "max_thread_count": "2"
          }
        },
        "max_result_window": "30000",
        "analysis": {
          "analyzer": {
            "whitespace_lowercase": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "skip_url_email": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "uax_url_email"
            }
          }
        },
        "number_of_replicas": "3",
        "warmer": {
          "enabled": "false"
        }
      }
    }
  }
}
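
For a sense of the write fan-out these settings imply, here is a minimal Python sketch that does the arithmetic on `number_of_shards` and `number_of_replicas`. The `shard_copies` helper is hypothetical, not an Elasticsearch API, and it says nothing about how the copies are spread across nodes.

```python
def shard_copies(index_settings):
    """Return (total shard copies, writes per document) for one index."""
    primaries = int(index_settings["number_of_shards"])
    replicas = int(index_settings["number_of_replicas"])
    # Each document is written to its primary shard plus every replica of it.
    writes_per_doc = 1 + replicas
    return primaries * writes_per_doc, writes_per_doc


# Values copied from the settings above (Elasticsearch returns them as strings).
settings = {"number_of_shards": "30", "number_of_replicas": "3"}
total, per_doc = shard_copies(settings)
print(total, per_doc)  # prints: 120 4
```

With 3 replicas, every bulk item is indexed four times across the cluster, so replica count is one of the first things worth comparing between the two clusters.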

And this is my mapping in 7:

{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "field_1" : {
          "type" : "boolean"
        },
        "field_2" : {
          "type" : "float"
        },
        "field_3" : {
          "type" : "keyword"
        },
        "field_4" : {
          "type" : "text",
          "analyzer" : "skip_url_email"
        },
        "field_5" : {
          "type" : "long"
        },
        "field_6" : {
          "type" : "keyword"
        },
        "field_7" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "text",
              "analyzer" : "whitespace_lowercase"
            }
          },
          "analyzer" : "skip_url_email"
        },
        "field_8" : {
          "type" : "keyword"
        },
        "field_9" : {
          "type" : "long"
        },
        "field_10" : {
          "type" : "keyword"
        },
        "field_11" : {
          "type" : "long"
        },
        "field_12" : {
          "type" : "boolean"
        },
        "field_13" : {
          "type" : "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ss'Z'||yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "field_14" : {
          "type" : "boolean"
        },
        "field_15" : {
          "type" : "float"
        },
        "field_16" : {
          "type" : "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "field_17" : {
          "type" : "boolean"
        },
        "field_18" : {
          "type" : "boolean"
        },
        "field_19" : {
          "type" : "boolean"
        },
        "field_20" : {
          "type" : "boolean"
        },
        "field_21" : {
          "type" : "boolean"
        },
        "field_22" : {
          "type" : "date"
        },
        "field_23" : {
          "type" : "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "field_24" : {
          "type" : "keyword"
        },
        "field_25" : {
          "type" : "long"
        },
        "field_26" : {
          "type" : "float"
        },
        "field_27" : {
          "type" : "boolean"
        },
        "field_28" : {
          "type" : "long"
        },
        "field_29" : {
          "type" : "keyword"
        },
        "field_30" : {
          "type" : "long"
        },
        "field_31" : {
          "type" : "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "field_32" : {
          "type" : "float"
        },
        "field_33" : {
          "type" : "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "field_34" : {
          "type" : "keyword"
        },
        "field_35" : {
          "type" : "keyword"
        },
        "field_36" : {
          "properties" : {
            "00afc0ad-dc73-4c7e-91af-c7a80f1ad2c8" : {
              "properties" : {
                "decision" : {
                  "type" : "boolean"
                },
                "score" : {
                  "type" : "float"
                }
              }
            },
            ...  # there are about 100 uuid entries here ...
            "fffa57ba-e9ff-4d4b-8f82-6e8960ba8804" : {
              "properties" : {
                "decision" : {
                  "type" : "boolean"
                },
                "score" : {
                  "type" : "float"
                }
              }
            }
          }
        },
        "field_37" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "text",
              "analyzer" : "whitespace_lowercase"
            }
          },
          "analyzer" : "skip_url_email"
        },
        "field_38" : {
          "type" : "float"
        },
        "field_39" : {
          "type" : "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "field_40" : {
          "type" : "keyword",
          "fields" : {
            "analyzed" : {
              "type" : "text"
            }
          }
        },
        "field_41" : {
          "type" : "long"
        },
        "field_42" : {
          "type" : "boolean"
        },
        "field_43" : {
          "type" : "keyword"
        },
        "field_44" : {
          "type" : "keyword"
        },
        "field_45" : {
          "type" : "keyword",
          "fields" : {
            "analyzed" : {
              "type" : "text"
            }
          }
        }
      }
    }
  }
}
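
To get a rough idea of how many fields each document can touch in a mapping like this, the sketch below counts leaf fields, including multi-fields such as `raw` and the nested UUID objects. `count_fields` is a hypothetical helper, and the sample mapping is a cut-down stand-in with placeholder UUID keys, not the full mapping above.

```python
def count_fields(mapping):
    """Count leaf fields in a mapping, including multi-fields and sub-objects."""
    total = 0
    for field in mapping.get("properties", {}).values():
        if "type" in field:
            total += 1  # a concrete field such as keyword, text, long
        # Multi-fields (e.g. "raw", "analyzed") are indexed as extra fields.
        total += len(field.get("fields", {}))
        # Object fields carry nested "properties"; recurse into them.
        total += count_fields(field)
    return total


# Cut-down sample: one text field with a multi-field, one object with two UUIDs.
sample = {
    "properties": {
        "field_7": {"type": "text", "fields": {"raw": {"type": "text"}}},
        "field_36": {
            "properties": {
                "uuid-a": {"properties": {"decision": {"type": "boolean"},
                                          "score": {"type": "float"}}},
                "uuid-b": {"properties": {"decision": {"type": "boolean"},
                                          "score": {"type": "float"}}},
            }
        },
    }
}
print(count_fields(sample))  # prints: 6
```

Against a real cluster, the input would be the `mappings` object from the response shown above, that is `response["my_index"]["mappings"]`.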

Do you have the same settings and mappings in Elasticsearch 5.2? Are you indexing new documents or also updating existing ones? Is the list of UUID properties bounded?

I do have the same settings in ES5.2, except that in ES5.2 I'm also using the standard filter, which was removed in 7. According to the docs, the standard filter is just a placeholder and does nothing.

The UUID list is bounded.

I'm updating existing documents in ES5.2, but in ES7.1 I'm loading new ones. New in the sense that the documents have not been created in ES7.1 yet, so it's an index operation rather than an update operation. The data is the same between ES5.2 and ES7.1, though...
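
To make the index-versus-update distinction concrete, here is a minimal sketch of how the two actions differ in a `_bulk` request body. `bulk_body` is a hypothetical helper and the index name is hard-coded for illustration. The action lines follow the 7.x typeless form, and an ES5.2 request would also need a `_type`, which is omitted here.

```python
import json


def bulk_body(actions):
    """Serialize (action, doc_id, payload) tuples into a _bulk NDJSON body."""
    lines = []
    for action, doc_id, payload in actions:
        # Action line: names the operation and the target document.
        lines.append(json.dumps({action: {"_index": "my_index", "_id": doc_id}}))
        # "update" wraps the partial document in "doc"; "index" sends it as-is.
        lines.append(json.dumps({"doc": payload} if action == "update" else payload))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body([
    ("index", "1", {"field_1": True}),   # what the ES7.1 load does
    ("update", "1", {"field_1": False}), # what the ES5.2 load does
])
print(body, end="")
```

An update has to fetch the existing document before reindexing it, so the two workloads are not strictly like-for-like even when the data is the same.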

Do both clusters have the same size and type of hardware?

Yeah, they do. I actually found out that the ES7.1 cluster was using 100% CPU on the master node, so I bumped the master node size, and now the ES7.1 master node is even bigger than the ES5.2 one. This helped reduce some of the rejected requests (due to a full write queue), but performance still doesn't match the old cluster.
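
For anyone tracking these rejections from the client side, a full write queue shows up as items with status 429 in the bulk response. A minimal sketch that picks them out follows. `rejected_ids` is a hypothetical helper, and the sample response is hand-written in the shape the `_bulk` API returns, not real output from this cluster.

```python
def rejected_ids(bulk_response):
    """Return the _id of every bulk item rejected with HTTP status 429."""
    rejected = []
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action (index, create, update, delete).
        for result in item.values():
            if result.get("status") == 429:
                rejected.append(result.get("_id"))
    return rejected


# Hand-written sample response: one success, one queue rejection.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
    ],
}
print(rejected_ids(sample_response))  # prints: ['2']
```

Only the rejected items need to be resent, ideally after a short backoff, rather than the whole bulk request.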

Are you using dynamic mappings, so that the newly indexed documents result in a larger number of mapping changes that have to go through the master nodes compared to the other cluster?

I don't use dynamic mappings. The mapping is set before loading data into the index. We did set up dedicated master nodes...
