Can nested fields prevent mapping explosion?

(Thomas Widhalm) #1


We have a situation where we put lots of JSON data into Elasticsearch via Logstash. The JSON differs a lot, so we get many distinct field names and hit the limit of 1000 fields per index.

As far as I understand, the limit was introduced to prevent mapping explosion (lots of resources used because of large metadata), so I don't want to change the limit. If I did, I'm pretty sure I'd have to raise it every 3 months or so, because the software that writes into Logstash tends to introduce even more field names.

I already asked for hints about a solution in Using "key/value with nested fields" with logstash. But now I wonder whether splitting long field names into nested fields would be of any benefit at all. The structure of the field names would allow for very few parent fields and lots of sub-fields.

Thanks in advance!

(Zachary Tong) #2

Unfortunately, it won't help. Nested documents all share the same "mapping space" of the index, which means you'll have the same limitations. Under the covers, nested documents are really just regular documents that adhere to some special, hidden conventions which allow them to be used with relational Nested queries.

So they contribute to the field-count for the index they reside in, just like fields on "regular" documents. Sorry :confused:
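To make the counting concrete, here is a rough sketch that walks a mapping's "properties" and counts every entry, including the object and nested parents themselves. The function name and the two sample mappings are made up for illustration, and this only approximates what Elasticsearch counts against the limit (it ignores multi-fields and aliases, for instance).

```python
def count_mapped_fields(properties):
    """Count every entry under "properties", descending into sub-fields.

    Hypothetical helper: only an approximation of how Elasticsearch
    counts fields against the per-index limit.
    """
    total = 0
    for definition in properties.values():
        total += 1  # the field itself; object and nested parents count too
        total += count_mapped_fields(definition.get("properties", {}))
    return total


# Two long field names mapped directly.
flat_mapping = {
    "a_x": {"type": "long"},
    "a_y": {"type": "long"},
}

# The same two leaves moved under one nested parent.
nested_mapping = {
    "a": {
        "type": "nested",
        "properties": {
            "x": {"type": "long"},
            "y": {"type": "long"},
        },
    },
}

flat_count = count_mapped_fields(flat_mapping)
nested_count = count_mapped_fields(nested_mapping)
```

Splitting the names into a parent with sub-fields does not lower the count here; the parent adds an entry of its own.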

What most people do is a key/value situation like the linked article in your other thread. This makes use of nested fields, but only adds a few fields to the total mapping:

  "values": [
    {"key": 1, "value": 1},
    {"key": 2, "value": 4},
    {"key": 3, "value": 9},
    {"key": 4, "value": 16},
    {"key": 5, "value": 25},
    {"key": 6, "value": 36}
  ]

In the above example, only three fields are mapped ("values", "key", "value"), and the nested structure keeps each key associated with its value. It's a lot harder to work with than direct mapping, but it's pretty much the only option if you expect to have thousands of distinct fields.
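As a minimal sketch of how an arbitrary document could be reshaped into that key/value form before indexing: the helper name `to_key_value`, the dotted key paths, and the sample event are all assumptions for illustration, not something from the linked article. The mapping assumes every value is numeric, like in the example above; mixed value types would need separate value fields or a string conversion.

```python
# Assumed mapping for the key/value layout: three mapped fields in total,
# no matter how many distinct keys show up in the data.
VALUES_MAPPING = {
    "properties": {
        "values": {
            "type": "nested",
            "properties": {
                "key": {"type": "keyword"},
                "value": {"type": "long"},  # assumes numeric values only
            },
        }
    }
}


def to_key_value(doc, prefix=""):
    """Flatten a (possibly nested) dict into a list of key/value pairs.

    Hypothetical helper: nested dict keys are joined with dots,
    so {"user": {"id": 7}} becomes {"key": "user.id", "value": 7}.
    """
    pairs = []
    for name, value in doc.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            pairs.extend(to_key_value(value, path))
        else:
            pairs.append({"key": path, "value": value})
    return pairs


# Made-up event with dynamic field names.
event = {"user": {"id": 7, "age": 42}, "status": 200}
reshaped = {"values": to_key_value(event)}
```

New field names in the source data then end up as new values of "key" rather than as new entries in the mapping.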

(Thomas Widhalm) #3

Thank you for the reply!

This is sad news. I read about this key/value approach, but since I want to ingest the data with Logstash, I don't see a way to achieve that, especially when the list of fields can change dynamically.

I guess I'll have to skip Logstash for these kinds of logs and ask the developers to create a connector from their application directly to Elasticsearch. Or do you see any other way?

Thanks again!
