Logstash - Extracting substring from CSV column


(Ashish) #1

Hi,

I have a CSV file with a column called "threadName", whose value varies from record to record. For example:

CME_MOC 15-1
CME_MOC 15-2
CME_MOC 15-3
PME_MOC 15-1
KME_MOC 15-2

I am sending the CSV records to Elasticsearch using the following Logstash config:

csv {
  separator => ","
  columns => ["time", "elapsed", "threadName", "success", "IdleTime", "Connect"]
}

But I want to extract only the first part of the "threadName" column and send these substrings:

CME_MOC
CME_MOC
CME_MOC
PME_MOC
KME_MOC

Do I need to add a new field and use grok? How can I achieve this?

Many thanks in advance,
Ashish


#2

You could use grok or dissect.

dissect {
  mapping => { "threadName" => "%{part1} %{part2}" }
}
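If the goal is for threadName itself to hold only the prefix, one possible sketch (placed after the csv filter, and assuming each value contains a single space) is to skip the suffix with dissect's `%{?name}` skip syntax and then copy the prefix back with mutate. The names part1 and rest are just illustrative field names, not anything required by the plugins.

```
filter {
  # Split "CME_MOC 15-1" on the space; %{?rest} matches "15-1" but does not store it
  dissect {
    mapping => { "threadName" => "%{part1} %{?rest}" }
  }
  # Only rewrite threadName if dissect actually produced part1
  # (on a value with no space, dissect tags the event with _dissectfailure instead)
  if [part1] {
    mutate {
      replace      => { "threadName" => "%{part1}" }
      remove_field => ["part1"]
    }
  }
}
```

The conditional matters because, without it, a failed dissect would leave threadName set to the literal text `%{part1}`.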

(Ashish) #3

Thanks Badger. Quick question: where should I place the dissect filter?

filter {
  if ([message] =~ "responseCode") {
    drop { }
  } else {
    dissect {
      mapping => { "threadName" => "%{part1} %{part2}" }
    }
    csv {
      separator => ","
      columns => ["time", "elapsed", "label", "responseCode", "responseMessage", "threadName",
                  "success", "bytes", "sentBytes", "grpThreads", "allThreads", "Latency",
                  "SampleCount", "ErrorCount", "Hostname", "IdleTime", "Connect"]
    }
  }
}


#4

The dissect{} has to come after the csv{}, otherwise the threadName field does not exist. Filters are executed in the order listed in the configuration.
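Applied to the config from #3, that means simply swapping the two filters. A sketch of the reordered filter block (the column list is unchanged from the post above):

```
filter {
  if ([message] =~ "responseCode") {
    # Drop lines containing "responseCode" (presumably the CSV header row)
    drop { }
  } else {
    # 1. Parse the CSV line first, so that the threadName field exists
    csv {
      separator => ","
      columns => ["time", "elapsed", "label", "responseCode", "responseMessage", "threadName",
                  "success", "bytes", "sentBytes", "grpThreads", "allThreads", "Latency",
                  "SampleCount", "ErrorCount", "Hostname", "IdleTime", "Connect"]
    }
    # 2. Then split it, e.g. "CME_MOC 15-1" -> part1 = "CME_MOC", part2 = "15-1"
    dissect {
      mapping => { "threadName" => "%{part1} %{part2}" }
    }
  }
}
```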


(Ashish) #5

Hi Badger, thanks again. For now I am using the grok below:

grok {
  match => ["threadName", "%{USERNAME}"]
}

I will explore dissect further, but could you please take a quick look and check whether the line below does the same thing as the grok?

dissect {
mapping => { "threadName" => "%{part1}" }
}

Thanks for your help today.


(Ashish) #6

I guess dissect will create a new field, whereas grok keeps the same field and puts the extracted value in it.

For example:
grok {
  match => ["threadName", "%{USERNAME}"]
}

Here the threadName field will get the new value, i.e. CME_MOC.

dissect {
mapping => { "threadName" => "%{part1}" }
}

But here CME_MOC will be stored in a new field named part1.

Am I right?


#7

Don't guess, test it :wink: Run Logstash with a config like this and then type something like "CME_MOC 15-3" into stdin.

input { stdin {} }
output { stdout { codec => rubydebug } }

filter {
  # So we can inject stuff like "PME_MOC 15-1" on stdin instead of needing a csv
  mutate { add_field => { "threadName" => "%{message}" } }

  # Split into 2 fields with space as separator
  dissect { mapping => { "threadName" => "%{part1} %{part2}" } }

  # No separator, so it grabs the whole thing
  dissect { mapping => { "threadName" => "%{part3}" } }

  # Match the first [a-zA-Z0-9._-]+ in the field and throw it away
  grok { match => ["threadName", "%{USERNAME}"] }

  # Match the first [a-zA-Z0-9._-]+ in the field and put it in the username field
  grok { match => ["threadName", "%{USERNAME:username}"] }

  # Match the first [a-zA-Z0-9._-]+ in the field, anchored to optimize performance
  grok { match => ["threadName", "^%{USERNAME:username2}"] }
}

If you save that as /tmp/test.conf, you can probably run Logstash using:

/usr/share/logstash/bin/logstash -f /tmp/test.conf --path.settings=/etc/logstash --path.data=/tmp
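For the original goal (threadName ending up as just CME_MOC, with no extra field), grok can capture straight into the same field if you also set its `overwrite` option. As far as I know, without `overwrite` a capture into an existing field is added next to the old value (turning the field into an array) rather than replacing it. A minimal sketch, to be placed after the csv filter, which you can check with the stdin test config above:

```
filter {
  # Capture the leading [a-zA-Z0-9._-]+ run (e.g. "CME_MOC") back into threadName
  grok {
    match     => { "threadName" => "^%{USERNAME:threadName}" }
    # Replace the original "CME_MOC 15-1" instead of appending to it
    overwrite => ["threadName"]
  }
}
```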


(Ashish) #8

:slight_smile: Sure, Badger. You were very helpful; I really appreciate your time and the information you shared.

