How to split a given JSON into multiple events dynamically? Tried suggestions from various forum questions

OK, so if you have a piece of JSON that contains 80,000 records, each of which has 15 columns, then it might be 10 MB. And you are going to create 80,000 copies of that, which involves allocating 800 GB of memory (or possibly 1.6 TB if you have a copy of the message in addition to the parsed data). Oh, and this is Java, so I guess every char is two bytes, so perhaps 3.2 TB of memory to be allocated and GC'd. I'm not surprised it takes a long time.
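Back-of-the-envelope (the 10 MB is only a rough guess, so scale accordingly):

80,000 records x 15 columns ≈ 10 MB of JSON
80,000 cloned events x 10 MB = 800 GB allocated
x 2 if the original message field is kept alongside the parsed data = 1.6 TB
x 2 again for Java's two-byte chars = 3.2 TB to allocate and garbage-collect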

Replacing the json codec with

json {
    source => "message"
    target => "data"
    remove_field => [ "message" ]
}

may avoid a second copy of the 10 MB on each event (if it is present -- I'm not sure what you get from an http_poller with a codec).
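Putting that together, a sketch of the pipeline might look like the following -- the URL, schedule, and the [data] target are placeholders for whatever your poller and records actually use, and I have not run this against your data:

input {
    http_poller {
        urls => { my_endpoint => "http://example.com/data" }   # placeholder URL
        schedule => { every => "60s" }
        codec => plain                                          # leave JSON parsing to the filter below
    }
}
filter {
    json {
        source => "message"
        target => "data"
        remove_field => [ "message" ]
    }
    split {
        field => "data"   # assumes the array of records ends up under [data]
    }
}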

Inside the split filter, the code clones the entire event for each element, which creates a new 10 or 20 MB object, and the very next line replaces that big field with one record that only takes a couple of hundred bytes. That's a really expensive way of doing it. We need something more like the UNIX vfork system call, if you are familiar with that.
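Paraphrasing what the filter does for each element (simplified, not the actual plugin source):

splits.each do |value|
    event_split = event.clone          # copies the whole 10-20 MB event, big array and all
    event_split.set(@target, value)    # then immediately overwrites the big field with one small record
    yield event_split
end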

Instead of cloning the event, just set event_split to an empty new event and copy over the fields you need like timestamp, host, version, etc. (still doing event_split.set(@target, value) etc.). Then yield that.
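Very roughly, and untested -- the list of fields to copy is just a guess at what you would want to keep:

splits.each do |value|
    event_split = LogStash::Event.new                  # start from an empty event instead of a clone
    ["@timestamp", "host", "@version"].each do |f|     # copy only the small per-event fields
        event_split.set(f, event.get(f)) if event.include?(f)
    end
    event_split.set(@target, value)                    # then attach just this one record
    yield event_split
end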

However, the details of replacing the clone with an empty new event are beyond me. You need someone who understands a little more about events than I do.
