Saving data to ES 5.4 blocked when ES is creating mapping and updating mapping

My workflow is:

  1. Reading data from kafka
  2. Saving data into Elastic Search 5.4

I checked the time spent in every method of my application. The following picture shows the normal workflow: most of the time is spent reading data from Kafka.


The following picture shows the abnormal workflow: all the time is spent saving data into ES 5.4, which apparently means ES failed to respond.
image
I checked the ES log and found that ES was creating and updating mappings at the time the application failed to save data to ES.
image

Updating a mapping requires the cluster state to be updated and then propagated, which will affect throughput. While this goes on, you cannot index into that index. As the cluster state grows, this may get slower and slower. You seem to have a very large number of dynamically mapped fields, which could become problematic over time.

Thank you for answering my question. So how can I fix this problem? My current default template is as follows:
{
  "defaulttemplate" : {
    "order" : 0,
    "template" : "*",
    "settings" : {
      "index" : {
        "translog" : {
          "flush_threshold_size" : "1024mb",
          "sync_interval" : "10s",
          "durability" : "async"
        },
        "number_of_replicas" : "0",
        "refresh_interval" : "10s"
      }
    },
    "mappings" : {
      "default" : {
        "_source" : {
          "enabled" : "false"
        },
        "_all" : {
          "enabled" : "false"
        }
      }
    },
    "aliases" : { }
  }
}

One option would be to provide the mappings you expect through an index template ahead of time. Updating mappings one by one as new fields come along is inefficient.
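As a rough sketch of that approach (the field names and types below are placeholders, not your actual schema — substitute the fields you expect), an ES 5.x index template with pre-declared mappings looks something like this; the `_default_` mapping makes the fields apply to every type in matching indices:

```json
PUT _template/defaulttemplate
{
  "order" : 0,
  "template" : "*",
  "mappings" : {
    "_default_" : {
      "properties" : {
        "timestamp" : { "type" : "long" },
        "channel"   : { "type" : "keyword" },
        "value"     : { "type" : "float" }
      }
    }
  }
}
```

With the fields declared up front, indexing a document no longer triggers a cluster-state update for each newly seen field.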

How many fields do you have in your mappings?

Hi Christian,

Let me briefly describe our problem.

We are upgrading from ES 1.x to an ES 5.4 cluster. It is true that we have a lot of dynamic mappings, but we didn't run into this problem on ES 1.x, which was a single node rather than a cluster. However, as far as I know, the speed of creating and updating mappings is mostly affected by the number of shards, and we set the number of shards to 5 in both cases. Will the number of nodes slow things down significantly?

By the way, we have about 20 fields.

The mappings are kept in the cluster state, so I don't think the number of shards has an impact. The updated mappings do, however, need to be propagated to the nodes in the cluster, so an increased number of nodes will make this take longer. From Elasticsearch 2.x onward, deltas of the cluster state are sent, which is more efficient than the method used in 1.x.

Based on the screenshot it looks like you have more than 20 fields. Can you retrieve the mapping for that index and check?

{
  "2018_04_21" : {
    "mappings" : {
      "1_40179" : {
        "_all" : {
          "enabled" : false
        },
        "_source" : {
          "enabled" : false
        },
        "properties" : {
          "channel" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "event_id" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "is_new" : {
            "type" : "long"
          },
          "is_new_player" : {
            "type" : "long"
          },
          "os_type" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "param_type" : {
            "type" : "long"
          },
          "room_name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "server" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "timestamp" : {
            "type" : "long"
          },
          "user_id" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "value" : {
            "type" : "float"
          }
        }
      }
    }
  }
}

Above is one of our mappings. I know the text type slows things down, so I am considering using the template below to make some improvement.

"dynamic_templates" : [
  {
    "string_to_keyword" : {
      "match_mapping_type" : "string",
      "mapping" : {
        "type" : "keyword"
      }
    }
  }
]
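For context, this snippet is a fragment that would sit inside the mappings section of the index template — roughly like the sketch below (using `_default_` here, an assumption on my part, so it applies to every type in matching indices):

```json
"mappings" : {
  "_default_" : {
    "dynamic_templates" : [
      {
        "string_to_keyword" : {
          "match_mapping_type" : "string",
          "mapping" : { "type" : "keyword" }
        }
      }
    ]
  }
}
```

This maps every dynamically detected string field as a single `keyword` field instead of a `text` field with a `keyword` sub-field, which halves the number of fields created per string.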

Can you give us any suggestions about the new template?
Thanks in advance!

What do the mappings for the 2018_04_26 index (seen in the screenshot) look like? Do you by any chance have a lot of types in your indices?

Yep, we do have a lot of types in one index and all of them share the same mapping structure.

Then I guess that is the problem, as each new type also requires the mappings to be updated. You should move away from using multiple types, as types are being removed: in version 6.x it is already not possible to create new indices with more than one type, and the goal is to remove the concept of types completely over time. You can instead create a new field called type and store the type name there, which will allow you to filter on it.
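A minimal sketch of that approach (the single mapping type `doc` and the sample values are hypothetical): each document carries its former type as a regular field, mapped as `keyword` so it can be filtered with an exact-match term query.

```json
PUT 2018_04_21/doc/1
{
  "type" : "1_40179",
  "timestamp" : 1524268800,
  "value" : 1.5
}

GET 2018_04_21/doc/_search
{
  "query" : {
    "term" : { "type" : "1_40179" }
  }
}
```

Because all documents now share one mapping type, adding a new logical "type" is just a new field value and no longer triggers a mapping update.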

Once you have done this, you can add your fields with the correct mapping to an index template, and that should remove the issues you have been seeing.

Thanks for your patience! We will try it later.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.