Best way to insert denormalized data into ES

Hi ElasticTeam,

I want to get your expert opinion on the most optimal way to solve the problem described below.
Scenario: the app receives data, does some processing, and writes it to a Kafka topic. The records are picked up by a Kafka connector and sent to Elasticsearch via the sink connector, and the data is then displayed to the user in Kibana.
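For context, the Elasticsearch sink connector is registered through the Kafka Connect REST API roughly like this (the connector name, topic, and URL below are placeholders for illustration, not my exact config):

POST /connectors
{
    "name": "stats-es-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": "stats-topic",
        "connection.url": "http://localhost:9200",
        "key.ignore": "true",
        "schema.ignore": "true"
    }
}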

The processed data contains arrays of objects, for example:

{
    "Host": <string>,
    "switch": [{
        "SwitchID": <integer>,
        "portStats": [{
            "PortID": <integer>,
            "TxPackets":  <integer>,
            "TxError": <integer>,
           ....
        }]
    }]
}

Initially I started with a nested type mapping for the above data in Elasticsearch (I also played with the flattened type), but while experimenting in Kibana I found that Kibana doesn't support the nested datatype well (and the flattened/object types don't work that well for incoming arrays of objects either). So now I am thinking of storing the data in denormalized form in Elasticsearch so I can support different visualizations of that data in Kibana.

So the new, denormalized mapping template in ES would look like this:

{
    "template": "stats-*",
    "order": 0,
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 2,
            "default_pipeline" : "addtimestamp"
        },
    },
    "mappings": {
        "properties": {
            "IngestTimestamp": {"type": "date"},
            "host": {"type": "keyword"},
            "SwitchID": {"type": "integer"},
            "PortID": {"type": "integer"},
            "TxPackets": {"type": "long"},
            "TxErrors": {"type": "long"},
        }
    }
}
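For completeness, I load this with the legacy index template API and then verify it, roughly like this (the template name stats-denorm is just a placeholder):

PUT _template/stats-denorm
{ ...the template body shown above... }

GET _template/stats-denorm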

So the app's output will look like this:

[
{"Host": "Foo", "SwitchID":1, "TxPackets":65, "TxErrors":10},
{"Host": "Foo", "SwitchID":0, "TxPackets":165, "TxErrors":30}
]

If the app were writing directly to ES (Elasticsearch), it could have used the _bulk API, but this data goes from the app to Kafka and is then picked up by the Kafka-ES sink connector and sent to ES. Based on the mapping above, the write will fail because the ES template mapping is not for an array of objects. Obviously I could make the app write one object at a time to Kafka, but that would not be ideal because of the number of network calls (each input to the app would turn into 20x calls, since there are at least 20 objects in the array).
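For illustration, if the app were writing directly to ES, the whole denormalized array could go in a single _bulk request, roughly like this (the index name stats-demo is just a placeholder):

POST _bulk
{"index": {"_index": "stats-demo"}}
{"Host": "Foo", "SwitchID": 1, "TxPackets": 65, "TxErrors": 10}
{"index": {"_index": "stats-demo"}}
{"Host": "Foo", "SwitchID": 0, "TxPackets": 165, "TxErrors": 30}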

I want to do this in the most optimal way. Is it possible for ES to receive the data in nested form but index it into docs in denormalized form, maybe via some transformation as part of an ingest pipeline (I am using a simple pipeline to add a timestamp to each record)? If yes, can someone give an example?
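For context, the simple pipeline I mention is roughly the following (a minimal sketch; the real processor config may differ slightly):

PUT _ingest/pipeline/addtimestamp
{
    "description": "Add an ingest timestamp to each record",
    "processors": [
        {
            "set": {
                "field": "IngestTimestamp",
                "value": "{{_ingest.timestamp}}"
            }
        }
    ]
}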

FYI, here is the mapping for the nested datatype I was using:

"mappings": {
        "properties": {
            "IngestTimestamp": {"type": "date"},
            "host": {"type": "keyword"},
            "switch": {
                "type": "nested",
                "properties": {
                     "SwitchID": {"type": "integer"},
                     "portStats": {
                         "type": "nested",
                         "properties": {
                              "PortID": <integer>,
                              "TxPackets":  <integer>,
                              "TxError": <integer>,
                         }
                    }
                }
            }
        }
    }

I appreciate your help!
