Elasticsearch performance: number of documents vs. number of properties in a document

I want to store time-series data in Elasticsearch, but I am not sure how I should model the documents.

Let's say a document looks like this:

{
  "timestamp": "iso 8601 date time",
  "common_property1": "value",
  "common_property2": "value",
  "common_property3": "value",
  "common_property4": "value",
  "common_property5": "value",
  "individual_properties": {
    "property1": "value",
    "property2": {
      "nesting_element": "value"
    },
    "property3": {
      "nesting_element1": "value",
      "nesting_element2": "value",
      "nesting_element3": "value",
      "nesting_element4": "value",
      "nesting_element5": "value",
      ... this can go up to 70-80 nesting elements
    },
    "property4": {
      "nesting_element": "value"
    },
    "property5": {
      "nesting_element": "value"
    },
    "property6": {
      "nesting_element": "value"
    },
    "property7": "value",
    "property8": "value",
    "property9": "value",
    "property10": "value",
    ... this can go up to 80-90 properties
  }
}

What we are doing now is storing individual_properties (with their nesting elements) as a JSON string in the main document, and we are also creating a separate index for the individual properties that we want to query by nesting_element.
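In other words, a main document today looks roughly like this (the embedded string is abbreviated, and the exact serialization is just an illustration):

{
  "timestamp": "iso 8601 date time",
  "common_property1": "value",
  "individual_properties": "{\"property1\": \"value\", \"property3\": {\"nesting_element1\": \"value\"}, ...}"
}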

Now I am thinking of remodeling the document as follows:

{
  "timestamp": "iso 8601 date time",
  "common_property1": "value",
  "common_property2": "value",
  "common_property3": "value",
  "common_property4": "value",
  "common_property5": "value",
  "individual_properties": [
    {
      "name": "value",
      "value": "value"
    },
    {
      "name": "value",
      "value": "value"
    },
    {
      "name": "value",
      "value": "value",
      "nesting_element": "value"
    },
    {
      "name": "value",
      "value": "value"
    },
    {
      "name": "value",
      "value": "value",
      "nesting_element": "value"
    },
    {
      "name": "value",
      "value": "value"
    }
    ... and probably 100-150 more properties
  ]
}

and I am thinking of using the nested field type with a nested field mapping so that we can query the individual properties by name and nesting element.
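A minimal sketch of what that mapping could look like (the index name metrics and the keyword types are assumptions, not our actual setup):

PUT metrics
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "individual_properties": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "value": { "type": "keyword" },
          "nesting_element": { "type": "keyword" }
        }
      }
    }
  }
}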

But I am aware that the latter model will require Lucene to index a hundred or so more documents per event (the number of nested documents + 1, to be exact).
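For completeness, querying by name and nesting element would then go through a nested query, something like this sketch (the field values are placeholders):

GET metrics/_search
{
  "query": {
    "nested": {
      "path": "individual_properties",
      "query": {
        "bool": {
          "must": [
            { "term": { "individual_properties.name": "property3" } },
            { "term": { "individual_properties.nesting_element": "value" } }
          ]
        }
      }
    }
  }
}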

In this scenario, which data model seems more performant/efficient? How will they impact indexing and search performance? Is there an alternative way to make this more efficient?

Why this? I don't understand. Do you have a concrete example?

If the total number of fields is reasonable (under 1000), I'd probably go for the former model.

I'd use the latter only if the number of fields is likely to grow, or if it's a super flexible schema (like field names depending on user input).
I could also change my mind depending on the first question I asked (and its answer) :wink:
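For reference, that 1000 figure is the default index.mapping.total_fields.limit. It's a dynamic index setting, so it can be raised if you ever need some headroom (the index name here is just an example):

PUT metrics/_settings
{
  "index.mapping.total_fields.limit": 2000
}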

We are storing device metrics collected over SNMP. We have a fixed set of metric names, but for each vendor and model the OIDs used to collect these metrics are different, and each type of device has a different set of metrics that are actually meaningful. So we started collecting all the metrics we thought were important, such as how many clients are connected to a router, but that pushed the number of properties in a document past 1000, because some OIDs return values with an index. For example, ifInOctets returns a value for every available interface (now imagine collecting interface stats for a switch).

To solve this problem, our senior dev suggested that we store these kinds of metrics as a JSON string so that the document has no child documents, and that, on a priority basis, we create a separate index for each metric group. For example, a separate index for interface stats, with documents like this:

{
  "device": "192.168.68.1",
  "@timestamp": "2021-07-30T11:02:08.722449",
  "collector_id": "170c15c2-a664-49da-a5d5-4a3fd7ba82ef",
  "key": "20",
  "adminstatus": 1,
  "desc": "",
  "type": 135,
  "operstatus": 1,
  "list": 20,
  "speed": 0
}

and so on

Users can also add custom properties using an OID. Not all OIDs return a scalar value; some return data with an additional index. Say you are polling the OID 1.2.3.4.5.6 and it returns data like:

1.2.3.4.5.6.0 = 6746
1.2.3.4.5.6.1 = 45874
1.2.3.4.5.6.2 = 47467
1.2.3.4.5.6.3 = 3435
1.2.3.4.5.6.4 = 79874

and say the user has named OID 1.2.3.4.5.6 dummy. In Elasticsearch we are currently storing it as:

{
  .....
  "dummy_0": 6746,
  "dummy_1": 45874,
  .....
}
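For comparison, under the nested model I proposed above, the same poll result might be stored as entries like this (just a sketch):

{
  .....
  "individual_properties": [
    { "name": "dummy", "nesting_element": "0", "value": 6746 },
    { "name": "dummy", "nesting_element": "1", "value": 45874 },
    .....
  ]
}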

Now, this flattened dummy_N approach does not look scalable to me, and we are going to hit that 1000-field limit sooner or later now that users can add custom metrics (which means property names now depend on user input). So I am looking for a solution that scales.

Sir, I hope I could clear up some of your confusion. Please let me know if you need more information.
