Grok or CSV?


(Mario van Gemert) #1

I have some data which looks like this:

John Doe;Debrah ;19,95;13-07-2016
M.C. Hammer;Susan;19;14-07-2015

The first field is the customer name, which sometimes consists of one name, sometimes two, and sometimes contains periods (.).
The second is the name of the salesperson, which sometimes has a trailing space, as with Debrah.
The third is a sales price, which sometimes has two digits after the comma and sometimes no decimals at all.
The fourth is a date in the format dd-mm-yyyy.

I tried the csv filter, but it does not give me the two decimal digits (95); instead I get 00, so 19,00:

csv {
  columns => ["customer","sales","price","date"]
  separator => ";"
}
mutate {
  convert => { "price" => "float" }
}
date {
  match => [ "date", "dd-MM-yyyy", "dd-MM-yyyy HH:mm:ss", "ISO8601" ]
  target => "@timestamp"
  add_field => { "debug" => "timestampMatched" }
}
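
For reference, mutate's float conversion reads the value only up to the comma, which is why 19,95 comes out as 19.0. A minimal sketch of one common workaround, using the price field from the columns above: rewrite the decimal comma to a period with gsub in a separate mutate block, since mutate applies its operations in a fixed internal order (convert runs before gsub within a single block):

mutate {
  # Normalize the Dutch decimal comma to a period first
  gsub => [ "price", ",", "." ]
}
mutate {
  # "19.95" now converts cleanly to the float 19.95
  convert => { "price" => "float" }
}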

I tried Grok, and this works better, but then I do not know how to match my date with @timestamp.

(?<customer>\b[\w. ]+);(?<sales>\b[\w ]+);(?<price>\b(?:[1-9][0-9]*(?:,[0-9]+)?)\b);%{DATE_EU:date}
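
For reference, a date filter placed after grok will parse the captured date field into @timestamp; a minimal sketch, assuming the date field from the pattern above (note the lowercase yyyy, since uppercase YYYY is Joda-Time's week-based year and can misparse dates around the new year):

date {
  match => [ "date", "dd-MM-yyyy" ]
  # target defaults to @timestamp; shown here for clarity
  target => "@timestamp"
}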

So my questions are:

  1. Which one is best to be used in this scenario?
  2. If using CSV, how can I get my correct decimals? So, 19,95 instead of 19,00
  3. If using Grok, how can I match my date with the @timestamp?

(Mark Walkom) #2

This worked for me:

$ sudo /usr/share/logstash/bin/logstash -e 'input{stdin{}} filter{csv {
columns => ["customer","sales","price","date"]
separator => ";"
}}
output{stdout{codec=>rubydebug}}'
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs to console
The stdin plugin is now waiting for input:
02:34:24.229 [[main]-pipeline-manager] INFO  logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
02:34:24.259 [[main]-pipeline-manager] INFO  logstash.pipeline - Pipeline main started
02:34:24.461 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
John Doe;Debrah ;19,95;13-07-2016
{
          "date" => "13-07-2016",
    "@timestamp" => 2016-12-13T02:34:43.191Z,
         "price" => "19,95",
      "@version" => "1",
          "host" => "elastic5",
       "message" => "John Doe;Debrah ;19,95;13-07-2016",
         "sales" => "Debrah ",
          "tags" => [],
      "customer" => "John Doe"
}
^C02:34:46.963 [SIGINT handler] WARN  logstash.runner - SIGINT received. Shutting down the agent.
02:34:46.981 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}

(Mario van Gemert) #3

Hi Mark,

Thanks for your reply!
This works fine, but in Kibana my prices show as 19,00 instead of 19,95. I think this is because we in the Netherlands use a comma as the decimal separator instead of a period.
So, I guess I have to tell Kibana somehow that my locale is nl.

In my CSV file I replaced the commas (,) with periods (.), and this solves the problem, but I do not want to alter my source file just to work around it.
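
For reference, the replacement can also be done inside the pipeline so the source file stays untouched, e.g. with a gsub as sketched in the first post. Newer releases of the mutate filter also offer a float_eu convert type that treats the comma as the decimal separator (an assumption: check that your logstash-filter-mutate version includes it):

mutate {
  # float_eu parses "19,95" as 19.95, keeping the fix in the pipeline
  convert => { "price" => "float_eu" }
}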

Do you know how this works?

Best regards,

Mario

