Field split

Hi Guys,

I am ingesting some IoT sensor data into Elasticsearch via Logstash. The payload of the message is a hex string which currently looks something like this:

129, 91, 0, 0, 28, 28, 29, 0, 4, 0, 0, 0, 11, 38, 0, 238

This is currently all in one field when it arrives in Elasticsearch. I would like to split this field on the commas into separate fields. Does anyone out there have experience of doing this?

Thanks in advance
Nick

You can parse this using the csv filter:

csv {
    columns => ["field1", "field2", "field3", ...]
}

You can use whatever meaningful field names you like.

Hi sjabiulla,

I have tried your suggestion of using the csv filter, but I get the error below:

Error parsing csv {:field=>"hex", :source=>[129, 75, 10, 0, 24, 24, 24, 0, 4, 0, 0, 0, 60, 54, 0, 214], :exception=>#<NoMethodError: private method `gets' called for [129, 75, 10, 0, 24, 24, 24, 0, 4, 0, 0, 0, 60, 54, 0, 214]:Array>}

Any ideas?

Can you please provide your config and a sample input?

My current config is:

filter {
  csv {
    source => "hex"
    columns => ["field1", "field2"]
  }
}

The input would be:

129, 75, 10, 0, 24, 24, 24, 0, 4, 0, 0, 0, 60, 54, 0, 214

Remove source from the csv filter.

filter {
    csv {
        columns => ["field1", "field2"]
    }
}

I have tested the config below using the sample input you provided, and all the fields were split properly by the csv filter.

input {
    stdin {
    }
}

filter {
    csv {
        columns => ["field1", "field2"]
    }
}

output {
    stdout {
        codec => rubydebug {}
    }
}

Output:

{
        "field1" => "129",
      "column12" => " 0",
          "host" => "01hw1344293",
      "@version" => "1",
       "column8" => " 0",
      "column16" => " 214",
       "column5" => " 24",
      "column11" => " 0",
      "column14" => " 54",
    "@timestamp" => 2019-07-10T13:06:16.496Z,
       "column4" => " 0",
      "column10" => " 0",
        "field2" => " 75",
       "column3" => " 10",
      "column13" => " 60",
      "column15" => " 0",
       "column9" => " 4",
       "message" => "129, 75, 10, 0, 24, 24, 24, 0, 4, 0, 0, 0, 60, 54, 0, 214\r",
       "column6" => " 24",
       "column7" => " 24"
}

I really need the CSV data to come from the hex field, as that is the field it arrives in Elasticsearch in. Is it not possible to do it that way?

That's the reason I asked you for the config and a sample input. If you provide me the wrong config and the wrong sample input, then of course my answers won't help you.

Here is the full pipeline:

input {
  http {
    host => "0.0.0.0"
    port => "8080"
  }
}

filter {
  json {
    source => "message"
    target => "json_log"
  }
  if [message] =~ /^\s*$/ {
    drop { }
  }
}

filter {
  ruby {
    init => "require 'base64'"
    code => "event.set( '[payload]', Base64.decode64(event.get('[json_log][data]')).unpack('C*'))"
  }
}

filter {
  mutate {
    copy => { "payload" => "hex" }
  }
}

filter {
  csv {
    source => "hex"
    columns => ["field1", "field2"]
  }
}
output

The hex values come in encrypted; that's why I have to decrypt them and end up with the hex values.
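For what it's worth, the ruby filter above leaves payload as a Ruby array of integers rather than a string, which seems to be why the csv filter complained about gets earlier. The same decode step again, annotated, with a made-up Base64 value standing in for the real (encrypted) data:

filter {
  ruby {
    init => "require 'base64'"
    # Made-up sample: "gUsKAA==" decodes to the bytes 0x81 0x4B 0x0A 0x00,
    # and unpack('C*') turns those bytes into the Ruby array [129, 75, 10, 0].
    # An array is not a string, so csv's source option cannot read it directly.
    code => "event.set( '[payload]', Base64.decode64(event.get('[json_log][data]')).unpack('C*'))"
  }
}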

You can convert the array of integers to a string that csv will parse using

    ruby { code => 'event.set("payload", event.get("payload").to_s[1..-2])' }

The [1..-2] is to strip off the [ and ] that to_s will add.
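If it helps, here is a rough sketch of how that might sit alongside the csv filter, assuming the array has already been copied into hex by the mutate filter in your pipeline above:

filter {
  ruby {
    # Assumes "hex" holds the integer array produced by unpack('C*').
    # to_s renders it as "[129, 75, ...]" and [1..-2] drops the brackets.
    code => 'event.set("hex", event.get("hex").to_s[1..-2])'
  }
  csv {
    # "hex" is now a plain comma-separated string, so source works here.
    source => "hex"
    columns => ["field1", "field2"]
  }
}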

Thank you badger, I shall try it this morning.

I added the ruby code to the pipeline and got the following error:
Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"############-2019.07.11", :_type=>"_doc", :routing=>nil}, #LogStash::Event:0x1f4d7ec7], :response=>{"index"=>{"_index"=>"##########-2019.07.11", "_type"=>"_doc", "_id"=>"qbIq4GsB_o88aajI8mVC", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [payload] of type [long] in document with id 'qbIq4GsB_o88aajI8mVC'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"For input string: "129, 99, 15, 0, 21, 20, 21, 0, 4, 0, 0, 0, 12, 59, 0, 202""}}}}}

Not sure where to go from here.

I think that is telling you that payload is now a string, but Elasticsearch has a mapping that tells it to expect a long in that field.

Renaming the field with mutate might fix it.
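Something along these lines might do it; payload_csv is just a made-up name here:

filter {
  # "payload_csv" is only a placeholder name; the aim is to keep the string
  # version out of the existing "payload" field, which is mapped as long.
  mutate {
    rename => { "payload" => "payload_csv" }
  }
}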
