Split a CSV column into several fields


#1

I use Logstash to import from CSV into Elasticsearch. The problem is that one of the fields contains a string formatted as "int|string|string", and I need to parse it and put it into a nested field. What are possible solutions for this?
Also, this field is an array.


(Magnus Bäck) #2

Sounds like the mutate filter's split option would be a good fit.


#3

The first problem is that it is an array of those strings separated by ", ". The second is that the array should be added as nested objects in Elasticsearch, with the parts of each string becoming fields of the nested object. I have no idea how I can use split for this.


(Magnus Bäck) #4

Sorry, I don't understand what you're talking about. Describing the problem with examples is usually better than using words. Show us what event you have and what you'd like to achieve.


#5

This column in the CSV contains the following: 1|http://blahblahblah.com|blahblahblah,23|http://anotheraddres.com/|anothertext


(Magnus Bäck) #6

Yes? And what's the expected result?


#7

In the ES document there is a nested field where I want to insert these values, like this:
links
properties:[
{ "link.id":"1",
"link.url":"http://blahblahblah.com",
"link.name":"blahblahblah"
},
{ "link.id":"23",
"link.url":"http://anotheraddress.com/",
"link.name":"anothertext"
}]


(Magnus Bäck) #8

So you want to update a document that already exists and add new JSON objects to an array? That's... not so easy.


#9

No, I want to create a new document.


(Magnus Bäck) #10

Right, but you want each line of the CSV file to build upon each other and in the end result in a single document containing all URL/name pairs found in the CSV file?

Because an input file isn't necessarily read in one pass and could potentially be very large, you'd typically implement this by having it create an initial document and update it for each line of the input file.


#11

One line => one doc.


(Magnus Bäck) #12

Okay, now it's all clear. I think you'll have to involve a ruby filter. Untested but should be fairly close:

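ruby {
  code => "
    # 'message' is assumed to hold the column value; use your CSV column's field name if it differs.
    # On Logstash 5.0 and later, use event.get('message') and event.set('links', ...) instead.
    event['links'] = event['message'].split(',').collect { |t|
      # strip handles entries separated by ', ' as well as ','
      c = t.strip.split('|')
      {
        'link_id' => c[0],
        'link_url' => c[1],
        'link_name' => c[2]
      }
    }
  "
}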

Note that Elasticsearch no longer allows periods in field names (this applies to 2.x; 5.0 and later accept them again and treat them as object paths), so I replaced them with underscores. If you want each element of the array to contain a single object with three keys (id, url, name), that's of course doable too.
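
If you want to try the splitting logic outside Logstash first, here is a minimal plain-Ruby sketch of the same idea. The helper name parse_links and its keys parameter are made up for illustration and are not part of Logstash; the sample string is the one from post #5. It assumes no URL or name contains a comma, since the entries are split on commas.

```ruby
# Standalone sketch of the splitting logic; no Logstash required.
# parse_links is a hypothetical helper name, not a Logstash API.
def parse_links(raw, keys = %w[link_id link_url link_name])
  raw.split(',').map do |entry|
    # Limit the split to keys.length parts so any extra '|' stays in the last field.
    parts = entry.strip.split('|', keys.length)
    # Pair each key with its part; missing parts become nil.
    keys.zip(parts).to_h
  end
end

sample = '1|http://blahblahblah.com|blahblahblah,23|http://anotheraddres.com/|anothertext'
links = parse_links(sample)
links.each { |link| p link }
```

For the alternative shape with the three keys id, url and name, the same sketch can be called as parse_links(sample, %w[id url name]).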


#13

Thank you very much, I will try it later.


#14

But will it create multiple links for a single document?


(Magnus Bäck) #15

But will it create multiple links for a single document?

Yes. I realized that I missed one crucial part earlier. I've updated my previous post so that the list of JSON objects is stored in the links field.
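
To illustrate the "one line, one document, many links" result, here is a rough plain-Ruby sketch that builds the document for a single CSV line and prints it as JSON. The field name links_raw is a made-up stand-in for the CSV column, and the sample value comes from post #5. Whether Elasticsearch treats links as a nested field depends on the index mapping, which this sketch does not cover.

```ruby
require 'json'

# Hypothetical stand-in for one parsed CSV line; 'links_raw' is a made-up field name.
row = { 'links_raw' => '1|http://blahblahblah.com|blahblahblah,23|http://anotheraddres.com/|anothertext' }

# One entry per comma-separated item, each turned into its own object.
links = row['links_raw'].split(',').map do |entry|
  id, url, name = entry.strip.split('|', 3)
  { 'link_id' => id, 'link_url' => url, 'link_name' => name }
end

# A single document carrying the whole array in its links field.
doc = { 'links' => links }
puts JSON.pretty_generate(doc)
```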


#16

Thanks a lot. It works!

