Grok or CSV?

I have some data which looks like this:

John Doe;Debrah ;19,95;13-07-2016
M.C. Hammer;Susan;19;14-07-2015

The first field is the customer name, which sometimes consists of one name, sometimes two, and sometimes contains periods (.).
The second is the name of the salesperson, which sometimes has a trailing space after the name, see Debrah.
The third is a sales price, which sometimes has two digits after the comma and sometimes no decimals at all.
The fourth is a date in the format dd-mm-yyyy.

I tried the CSV filter, but it does not give me the two decimals (95); I get 00 instead, so 19,00:

csv {
  columns => ["customer", "sales", "price", "date"]
  separator => ";"
}
mutate {
  convert => { "price" => "float" }
}
date {
  match => [ "date", "dd-MM-yyyy", "dd-MM-yyyy HH:mm:ss", "ISO8601" ]
  target => "@timestamp"
  add_field => { "debug" => "timestampMatched" }
}
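
My suspicion is that the float conversion simply cannot handle the comma as a decimal separator. Perhaps the comma has to be replaced before converting? The following is only a guess on my part, untested; I use two separate mutate blocks because, as far as I know, a single mutate applies convert before gsub:

mutate {
  # hypothetical fix: turn the decimal comma into a period first
  gsub => ["price", ",", "."]
}
mutate {
  # only then convert the cleaned-up value to a float
  convert => { "price" => "float" }
}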

I tried to use Grok, and this works better, but then I do not know how to match my date with @timestamp.

(?<customer>\b[\w. ]+);(?<sales>\b[\w ]+);(?<price>\b[1-9][0-9]*(?:,[0-9]+)?\b);%{DATE_EU:date}
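
For the @timestamp part, my best guess is that the date field captured by Grok can be fed into the same date filter I used above, roughly like this, but I am not sure it is right (also untested):

grok {
  match => { "message" => "(?<customer>\b[\w. ]+);(?<sales>\b[\w ]+);(?<price>[1-9][0-9]*(?:,[0-9]+)?);%{DATE_EU:date}" }
}
date {
  match => [ "date", "dd-MM-yyyy" ]
  target => "@timestamp"
}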

So my questions are:

  1. Which one is best to be used in this scenario?
  2. If using CSV, how can I get the correct decimals, i.e. 19,95 instead of 19,00?
  3. If using Grok, how can I match my date with the @timestamp?

This worked for me:

$ sudo /usr/share/logstash/bin/logstash -e 'input{stdin{}} filter{csv {
columns => ["customer","sales","price","date"]
separator => ";"
}}
output{stdout{codec=>rubydebug}}'
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs to console
The stdin plugin is now waiting for input:
02:34:24.229 [[main]-pipeline-manager] INFO  logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
02:34:24.259 [[main]-pipeline-manager] INFO  logstash.pipeline - Pipeline main started
02:34:24.461 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
John Doe;Debrah ;19,95;13-07-2016
{
          "date" => "13-07-2016",
    "@timestamp" => 2016-12-13T02:34:43.191Z,
         "price" => "19,95",
      "@version" => "1",
          "host" => "elastic5",
       "message" => "John Doe;Debrah ;19,95;13-07-2016",
         "sales" => "Debrah ",
          "tags" => [],
      "customer" => "John Doe"
}
^C02:34:46.963 [SIGINT handler] WARN  logstash.runner - SIGINT received. Shutting down the agent.
02:34:46.981 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}

Hi Mark,

Thanks for your reply!
This works fine. But when using Kibana, it shows my prices as 19,00 instead of 19,95. This probably has to do with the fact that here in the Netherlands we use a comma as the decimal separator rather than a period.
So I guess I have to tell Kibana somehow that my locale is nl.

In my CSV file I replaced the commas (,) with periods (.) and that solves the problem, but I do not want to alter my source file just to work around it.

Do you know how this works?

Best regards,

Mario
