Hello,
I'm having trouble figuring out how to populate a nested field with values from multiple columns of a CSV file. For example:
I have a CSV data source that has the following columns:
company_id
company_name
company_alias1
company_alias_description1
company_alias2
company_alias_description2
company_alias3
company_alias_description3
company_alias4
company_alias_description4
company_alias5
company_alias_description5
latitude
longitude
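A hypothetical pipe-delimited row (the values are made up; only two of the five alias pairs are populated) might look like this:

```
C0042|Acme Inc.|Acme Co|Former name|Acme Corp.|Legal name|||||||40.7128|-74.0060
```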
I created an index mapping that looks like this:
curl -XPUT 'localhost:9200/companies_v1/_mapping/company?pretty' -d '
{
  "company": {
    "properties": {
      "company_id": { "type": "string" },
      "company_name": { "type": "string" },
      "company_alias1": { "type": "string" },
      "company_alias2": { "type": "string" },
      "company_alias3": { "type": "string" },
      "company_alias4": { "type": "string" },
      "company_alias5": { "type": "string" },
      "geo_coordinates": { "type": "geo_point" },
      "company_aliases": {
        "type": "nested",
        "properties": {
          "name": { "type": "string" },
          "description": { "type": "string" }
        }
      }
    }
  }
}
'
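To illustrate what I'm after, the document I'm hoping to end up with in Elasticsearch would look something like this (made-up values):

```json
{
  "company_id": "C0042",
  "company_name": "Acme Inc.",
  "company_alias1": "Acme Co",
  "company_alias2": "Acme Corp.",
  "geo_coordinates": { "lat": 40.7128, "lon": -74.0060 },
  "company_aliases": [
    { "name": "Acme Co",    "description": "Former name" },
    { "name": "Acme Corp.", "description": "Legal name" }
  ]
}
```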
The "company_alias*" fields are optional, but when they do have values they will always appear in sequence. In other words, if company_alias3 has a value, I can assume that company_alias2 and company_alias1 also have values. I would like each enumerated "company_alias#" field to be paired with its matching "company_alias_description#" field, which is why I'd like to use the nested data type. (I hope I explained that correctly.)
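Here is the pairing logic I have in mind, sketched in plain Ruby (the field names follow my CSV columns; build_aliases is just a name I made up for illustration):

```ruby
# Sketch of the pairing logic: walk the numbered alias columns in order and
# stop at the first empty one, since the columns are always populated in sequence.
def build_aliases(row)
  aliases = []
  (1..5).each do |i|
    name = row["company_alias#{i}"]
    break if name.nil? || name.empty?
    aliases << {
      "name"        => name,
      "description" => row["company_alias_description#{i}"]
    }
  end
  aliases
end

row = {
  "company_alias1"             => "Acme Co",
  "company_alias_description1" => "Former name",
  "company_alias2"             => "Acme Corp.",
  "company_alias_description2" => "Legal name"
}
p build_aliases(row)
# prints [{"name"=>"Acme Co", "description"=>"Former name"}, {"name"=>"Acme Corp.", "description"=>"Legal name"}]
```

What I don't know is how to express this inside my Logstash filter block.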
Here is the code I have thus far:
input {
  file {
    path => "/data/companies.txt"
    type => "company"
    start_position => "beginning"
    ignore_older => 0
  }
}
filter {
  csv {
    columns => [
      "company_id",
      "company_name",
      "company_alias1",
      "company_alias_description1",
      "company_alias2",
      "company_alias_description2",
      "company_alias3",
      "company_alias_description3",
      "company_alias4",
      "company_alias_description4",
      "company_alias5",
      "company_alias_description5",
      "latitude",
      "longitude"
    ]
    separator => "|"
    skip_empty_columns => true
  }
  if [company_alias1] {
    # <<<<<<<<<<<<<<<<<<<<<<<< HELP!
  }
  if [longitude] and [latitude] {
    mutate {
      convert => {
        "latitude" => "float"
        "longitude" => "float"
      }
      add_field => {
        "[geo_coordinates][lat]" => "%{latitude}"
        "[geo_coordinates][lon]" => "%{longitude}"
      }
    }
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "companies_v1"
    workers => 1
    document_id => "%{company_id}"
  }
  #stdout {
  #  codec => rubydebug
  #}
}
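For the section marked HELP!, I was wondering whether a ruby filter along these lines might work. This is untested guesswork on my part, and event.get/event.set assume the newer (5.x) Logstash event API:

```
if [company_alias1] {
  ruby {
    code => '
      aliases = []
      (1..5).each do |i|
        name = event.get("company_alias#{i}")
        break if name.nil?
        aliases << {
          "name"        => name,
          "description" => event.get("company_alias_description#{i}")
        }
      end
      event.set("company_aliases", aliases)
    '
  }
}
```

Is something like this the right approach, or is there a better way to build the nested array?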
Any insights would be greatly appreciated. I'm new to Logstash and Elasticsearch.
Thank you!