Logstash - Extracting substring from CSV column

Hi,

I have a CSV file with a column called "threadName", whose value varies with each record. For example:

CME_MOC 15-1
CME_MOC 15-2
CME_MOC 15-3
PME_MOC 15-1
KME_MOC 15-2

I am sending the CSV records to Elasticsearch using the Logstash conf below:

csv {
  separator => ","
  columns => ["time", "elapsed", "threadName", "success", "IdleTime", "Connect"]
}

But I want to extract only the prefix of the "threadName" column and send the substring below instead:

CME_MOC
CME_MOC
CME_MOC
PME_MOC
KME_MOC

Do I need to add a new field and use grok? How can I achieve this?

Many thanks in advance,
Ashish

You could grok or dissect.

dissect {
      mapping => { "threadName" => "%{part1} %{part}" }
}
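
If you go the grok route instead, something along these lines should also work (threadPrefix is just an example name for the captured field, and ^ anchors the match to the start of the value):

grok {
  match => ["threadName", "^%{USERNAME:threadPrefix}"]
}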

Thanks Badger, quick question - where should I place the dissect filter?

filter {
  if ([message] =~ "responseCode") {
    drop { }
  } else {
    dissect {
      mapping => { "threadName" => "%{part1} %{part}" }
    }
    csv {
      separator => ","
      columns => ["time", "elapsed", "label", "responseCode", "responseMessage", "threadName",
                  "success", "bytes", "sentBytes", "grpThreads", "allThreads", "Latency",
                  "SampleCount", "ErrorCount", "Hostname", "IdleTime", "Connect"]
    }
  }
}

The dissect{} has to come after the csv{}, otherwise the threadName field does not exist. Filters are executed in the order listed in the configuration.
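
For example, reordering the config you posted would look something like this (same columns and conditional, just with dissect moved after csv):

filter {
  if ([message] =~ "responseCode") {
    drop { }
  } else {
    csv {
      separator => ","
      columns => ["time", "elapsed", "label", "responseCode", "responseMessage", "threadName",
                  "success", "bytes", "sentBytes", "grpThreads", "allThreads", "Latency",
                  "SampleCount", "ErrorCount", "Hostname", "IdleTime", "Connect"]
    }
    dissect {
      mapping => { "threadName" => "%{part1} %{part}" }
    }
  }
}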

Hi Badger, thanks again. For now I am using the grok below:

grok {
  match => ["threadName", "%{USERNAME}"]
}

I will explore dissect more, but could you please have a quick glance and see if the line below does the same thing as the grok?

dissect {
  mapping => { "threadName" => "%{part1}" }
}

Thanks for your help today

I guess dissect will create a new field, whereas grok keeps the same field with the newly extracted value.

For example:

grok {
  match => ["threadName", "%{USERNAME}"]
}

Here the threadName field will have the new value, i.e. CME_MOC.

dissect {
  mapping => { "threadName" => "%{part1}" }
}

But here CME_MOC will be stored in a new field named part1.

Am I right here?

Don't guess, test it :wink: Run Logstash with a config like this and then type something like "CME_MOC 15-3" into stdin.

input { stdin {} }
output { stdout { codec => rubydebug } }

filter {
 # So we can inject stuff like "PME_MOC 15-1" on stdin instead of needing a csv
 mutate { "add_field" => { "threadName" => "%{message}" } }

 # Split into 2 fields with space as separator
 dissect { mapping => { "threadName" => "%{part1} %{part2}" } }

 # No separator, so it grabs the whole thing
 dissect { mapping => { "threadName" => "%{part3}" } }

 # Match the first [a-zA-Z0-9._-]+ in the field and throw it away
 grok { match => ["threadName", "%{USERNAME}"] }

 # Match the first [a-zA-Z0-9._-]+ in the field and put it in the username field
 grok { match => ["threadName", "%{USERNAME:username}"] }

 # Match the first [a-zA-Z0-9._-]+ in the field, anchored to optimize performance 
 grok { match => ["threadName", "^%{USERNAME:username2}"] }
}

If you save that as /tmp/test.conf then you can probably run logstash using

/usr/share/logstash/bin/logstash -f /tmp/test.conf --path.settings=/etc/logstash --path.data=/tmp
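
For an input line like "CME_MOC 15-1", the rubydebug output should contain roughly the fields below (I have left out @timestamp, @version, and host; note the unnamed %{USERNAME} grok matches but adds no field):

{
       "message" => "CME_MOC 15-1",
    "threadName" => "CME_MOC 15-1",
         "part1" => "CME_MOC",
         "part2" => "15-1",
         "part3" => "CME_MOC 15-1",
      "username" => "CME_MOC",
     "username2" => "CME_MOC"
}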

:slight_smile: Sure Badger, you were very helpful. Really appreciate your time and sharing the needed info.
