How to Upsert Nested Obj using grok and Logstash

abidul_wahab_ramadan · March 19, 2017, 8:05am

I'm currently analysing data from my MySQL subtitles DB and putting it into Elasticsearch 5.2. My Logstash pipeline has the following filter:

filter {
  grok {
    match => ["subtitles", "%{TIME:[_subtitles][start]} --> %{TIME:[_subtitles][end]}%{GREEDYDATA:[_subtitles][sentence]}"]
  }
}

which produces the following:

"_subtitles": {
  "sentence": [
    "im drinking latte",
    "im drinking coffee",
    "while eating a missisipi cake"
  ],
  "start": [
    "00:00:00.934",
    "00:00:01.934",
    "00:00:04.902"
  ],
  "end": [
    "00:00:02.902",
    "00:00:03.902",
    "00:00:05.839"
  ]
}

But what I want is this:

"_subtitles": [
  {
    "sentence": "im drinking latte",
    "start": "00:00:00.934",
    "end": "00:00:02.902"
  },
  {... same structure as above},
  {... same structure as above},
]

Keep in mind that _subtitles will be mapped as nested via a predefined mapping if needed.

And the original data is as follows:

00:00:00.934 --> 00:00:02.902
im drinking latte

00:00:01.934 --> 00:00:03.902
im drinking coffee

00:00:04.902 --> 00:00:05.839
while eating a missisipi cake

How can I achieve that (nested objects) using grok's match pattern and placeholders?

magnusbaeck · March 21, 2017, 7:21am

I'm pretty sure you can't. You'll have to postprocess the field with a ruby filter.
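
One possible shape for that post-processing, shown as a plain Ruby sketch outside Logstash so it can be tried on its own: instead of regrouping grok's parallel arrays, scan the raw cue text directly and build one hash per cue. The constant `CUE` and the method `parse_cues` are hypothetical names for illustration, and the regex assumes every cue is a timing line followed by a single line of text, as in the sample data above.

```ruby
# Hypothetical pattern: timing line, newline, then one line of text.
CUE = /(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\s*\n([^\n]*)/

# Hypothetical helper: turn raw cue text into an array of hashes.
def parse_cues(raw)
  raw.scan(CUE).map do |start, finish, sentence|
    { 'start' => start, 'end' => finish, 'sentence' => sentence.strip }
  end
end

raw = <<~TEXT
  00:00:00.934 --> 00:00:02.902
  im drinking latte

  00:00:01.934 --> 00:00:03.902
  im drinking coffee

  00:00:04.902 --> 00:00:05.839
  while eating a missisipi cake
TEXT

cues = parse_cues(raw)
puts cues.inspect
```

Inside a ruby filter the same logic would read the raw field with event.get and write the result back with event.set. Cues whose text spans several lines would need a wider pattern than this one.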

abidul_wahab_ramadan · March 22, 2017, 1:52pm

Yeah, I actually ended up resolving the issue that way.
But now the issue is that it's over-processing. Are there any ways I can enhance the solution below?

abidul_wahab_ramadan · March 22, 2017, 1:58pm

So after a lot of research and reading I found THE ANSWER.

I found the best way to do it is either:

1. Leave Logstash and write my own script for migrating from MySQL to Elasticsearch, but then I'd have to do all the pattern recognition and replacement myself, which can get somewhat complicated.
2. Post-process the fields with a Ruby script/filter.

The solution was as follows:

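ruby {
  code => "
    # Assumes grok wrote parallel arrays to top-level start/end/sentence;
    # with the filter above they would live under [_subtitles][start] etc.
    # Array() also covers a missing field or a single (non-array) match.
    starts = Array(event.get('start'))
    ends = Array(event.get('end'))
    sentences = Array(event.get('sentence'))
    # Build one hash per cue by pairing the values at the same position
    subtitles = starts.each_with_index.map do |v, i|
      { 'index' => i, 'start' => v, 'end' => ends[i], 'sentence' => sentences[i] }
    end
    event.set('subtitles', subtitles)
  "
}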
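
The pairing of parallel arrays done in that filter can also be tried outside Logstash. Here is a minimal standalone Ruby sketch using the sample values from the first post; `zip_cues` is a hypothetical helper name, and it assumes the three arrays always have the same length.

```ruby
# Hypothetical helper: combine three parallel arrays into an array of hashes.
# Array() lets a single string or nil be handled like an array.
def zip_cues(starts, ends, sentences)
  Array(starts).zip(Array(ends), Array(sentences)).each_with_index.map do |(s, e, text), i|
    { 'index' => i, 'start' => s, 'end' => e, 'sentence' => text }
  end
end

starts    = ['00:00:00.934', '00:00:01.934', '00:00:04.902']
ends      = ['00:00:02.902', '00:00:03.902', '00:00:05.839']
sentences = ['im drinking latte', 'im drinking coffee', 'while eating a missisipi cake']

subtitles = zip_cues(starts, ends, sentences)
puts subtitles.inspect
```

Using zip keeps the three values of each cue together in one step, so there is no manual counter to keep in sync.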

Hope that helps.
But now I'm trying to improve this, because my Elasticsearch container fails with something like "cannot handle requests" or goes offline for a while, just because of the indexing (currently around 20k rows from MySQL, with around 40 nested objects each).
Is there anything I can do to make it faster?
Maybe a way to flag docs as processed, so that the ones handled the previous day are skipped?

Thanks,
Regards.
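
On the "skip what was already processed" idea, one option that may fit is incremental import. The fragment below is a rough sketch, assuming the rows are read with the Logstash jdbc input (the thread does not say which input is used) and that the table has a modification timestamp. The table name subtitles and the columns id and updated_at are hypothetical, and the connection settings are left out.

```
input {
  jdbc {
    # Connection settings (jdbc_connection_string, jdbc_user, ...) omitted.
    # Hypothetical table and column names: subtitles, id, updated_at.
    # :sql_last_value holds the highest updated_at seen in the previous run,
    # so only rows changed since then are fetched.
    statement => "SELECT * FROM subtitles WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    # Run once a day at 02:00.
    schedule => "0 2 * * *"
  }
}

output {
  elasticsearch {
    # Reusing the MySQL primary key as the document id makes a re-imported
    # row overwrite its earlier document instead of creating a duplicate.
    document_id => "%{id}"
    action => "update"
    doc_as_upsert => true
  }
}
```

With something along these lines, each scheduled run would only send rows changed since the last run, which should cut the indexing load considerably compared with re-sending all 20k rows.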

system · April 19, 2017, 1:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
