Do Logstash and Elasticsearch not support array-based columns?


(Ritesh) #1

Hi All

I have a JSON file like this:

{
    "name":"John",
    "age":30,
    "cars": [
        { "name":"Ford", "models":[ "Fiesta", "Focus", "Mustang" ] },
        { "name":"BMW", "models":[ "320", "X3", "X5" ] },
        { "name":"Fiat", "models":[ "500", "Panda" ] }
    ]
 }

After using the Logstash ruby filter I get fields like the ones below. Each array element shows up under a numeric index. How can I avoid that while keeping the records intact?

message.name : John
message.age  : 30
message.cars.0.name : Ford
message.cars.1.name : BMW
message.cars.2.name : Fiat

How can I turn these three fields into three records, like below, so that they are captured as three records in a single field?

message.cars.name : Ford
message.cars.name : BMW
message.cars.name : Fiat

(Magnus Bäck) #2

Exactly where are you seeing the array indexes? Copy/paste from the source.


(Ritesh) #3

Hi

I get this in Elasticsearch:

{:name=>"John", :age=>30, :cars=>{"0"=>{:name=>"Ford", :models=>{"0"=>"Fiesta", "1"=>"Focus", "2"=>"Mustang"}}, "1"=>{:name=>"BMW", :models=>{"0"=>"320", "1"=>"X3", "2"=>"X5"}}, "2"=>{:name=>"Fiat", :models=>{"0"=>"500", "1"=>"Panda"}}}}

and in Kibana I get this:

message.name : John
message.age  : 30
message.cars.0.name : Ford
message.cars.1.name : BMW
message.cars.2.name : Fiat

What I expect to see in Kibana is just three records:

message.cars.name : Ford
message.cars.name : BMW
message.cars.name : Fiat

(Magnus Bäck) #4

{:name=>"John", :age=>30, :cars=>{"0"=>{:name=>"Ford", :models=>{"0"=>"Fiesta", "1"=>"Focus", "2"=>"Mustang"}}, "1"=>{:name=>"BMW", :models=>{"0"=>"320", "1"=>"X3", "2"=>"X5"}}, "2"=>{:name=>"Fiat", :models=>{"0"=>"500", "1"=>"Panda"}}}}

That looks like something from Ruby so I don't understand exactly where it's coming from.

What do your Logstash filters look like? What does a stdout { codec => rubydebug } output produce?


(Ritesh) #5

Hi, thanks for your reply.
Here is my complete conf file:

input {
  file {
    sincedb_path => "..../sincedb_path.txt"
    path => "..../logstash-5.5.2/bin/test1.json"
    type => "test"
    start_position => "beginning"
  }
}

filter {
  json {
    source => "message"
    target => "docs"
  }

  ruby {
    code => "
      h = event.get('docs')

      # Recursively replace every array with a hash keyed by the element index ('0', '1', ...).
      def arrays_to_hash(h)
        h.each do |k, v|
          # If v is nil, an array is being iterated and the value is k.
          # If v is not nil, a hash is being iterated and the value is v.
          value = v || k
          if value.is_a?(Array)
            value_hash = {}
            value.each_with_index do |v, i|
              value_hash[i.to_s] = v
            end
            h[k] = value_hash
          end

          if value.is_a?(Hash) || value.is_a?(Array)
            arrays_to_hash(value)
          end
        end
      end

      test = arrays_to_hash(h)
      event.set('itemnum', test)
    "
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
.......
  }
}

This is the output of stdout { codec => rubydebug }:

{
         
       "itemnum" => {
        "cars" => {
            "0" => {
                "models" => {
                    "0" => "Fiesta",
                    "1" => "Focus",
                    "2" => "Mustang"
                },
                  "name" => "Ford"
            },
            "1" => {
                "models" => {
                    "0" => "320",
                    "1" => "X3",
                    "2" => "X5"
                },
                  "name" => "BMW"
            },
            "2" => {
                "models" => {
                    "0" => "500",
                    "1" => "Panda"
                },
                  "name" => "Fiat"
            }
        },
        "name" => "John",
         "age" => 30
    },
    "@timestamp" => 2017-09-22T10:11:49.757Z,
          "docs" => {
        "cars" => [
            [0] {
                "models" => [
                    [0] "Fiesta",
                    [1] "Focus",
                    [2] "Mustang"
                ],
                  "name" => "Ford"
            },
            [1] {
                "models" => [
                    [0] "320",
                    [1] "X3",
                    [2] "X5"
                ],
                  "name" => "BMW"
            },
            [2] {
                "models" => [
                    [0] "500",
                    [1] "Panda"
                ],
                  "name" => "Fiat"
            }
        ],
        "name" => "John",
         "age" => 30
    },

How can I get three records

itemnum.cars.name : Ford
itemnum.cars.name : BMW
itemnum.cars.name : Fiat

and these fields should be repeated in each record, so that I can see them in Kibana Discover

itemnum.name : John
itemnum.age  : 30

Currently I am getting this:

itemnum.name : John
itemnum.age  : 30
itemnum.cars.0.name : Ford
itemnum.cars.1.name : BMW
itemnum.cars.2.name : Fiat

(Magnus Bäck) #6

Your Ruby code isn't producing arrays, it's producing objects with integer keys.

If you can show, as JSON,

  • what the input looks like (without ruby filter) and
  • the desired output

it'll be easier to help. Terms like "records" and "repeated fields" are ambiguous so I really want to see the wanted JSON representations.
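The difference between an array and an object with integer keys is easiest to see once both are serialized to JSON. The following is a minimal plain-Ruby sketch, run outside Logstash; the `cars` and `indexed` variables are illustrative names and the data is cut down from the example in this thread.

```ruby
require 'json'

# A real array of car objects, as produced by the json filter.
cars = [{ 'name' => 'Ford' }, { 'name' => 'BMW' }]

# What the ruby filter above effectively builds instead:
# a hash whose keys are the element indexes as strings.
indexed = cars.each_with_index.map { |car, i| [i.to_s, car] }.to_h

puts JSON.generate('cars' => cars)     # serialized with [ ... ]
puts JSON.generate('cars' => indexed)  # serialized with { "0": ..., "1": ... }
```

The second form is why fields named cars.0.name, cars.1.name and so on appear: every index becomes its own field name.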


(Ritesh) #7
This is the output without the ruby filter:

    "@timestamp" => 2017-09-22T11:22:11.470Z,
          "docs" => {
        "cars" => [
            [0] {
                "models" => [
                    [0] "Fiesta",
                    [1] "Focus",
                    [2] "Mustang"
                ],
                  "name" => "Ford"
            },
            [1] {
                "models" => [
                    [0] "320",
                    [1] "X3",
                    [2] "X5"
                ],
                  "name" => "BMW"
            },
            [2] {
                "models" => [
                    [0] "500",
                    [1] "Panda"
                ],
                  "name" => "Fiat"
            }
        ],
        "name" => "John",
         "age" => 30
    },
    

My expected output in Kibana Discover

[image: screenshot from Kibana Discover]


(Magnus Bäck) #8

I explicitly asked for JSON representations and yet you give me a screenshot from Kibana. In this case I think I get what you're after anyway but if you want people to help you you need to provide the information asked for.

Use a split filter to create a separate event for each item in the cars array.
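As a rough illustration of what that means for the data in this thread, here is a plain-Ruby sketch that emulates the effect on a single event: one copy of the event per array element, with the array replaced by that element. This is only a model of the behaviour under the assumption that the event is an ordinary Hash; it is not the plugin's implementation and does not run inside Logstash.

```ruby
# One event as it looks after the json filter (target => "docs").
event = {
  'docs' => {
    'name' => 'John',
    'age'  => 30,
    'cars' => [
      { 'name' => 'Ford', 'models' => %w[Fiesta Focus Mustang] },
      { 'name' => 'BMW',  'models' => %w[320 X3 X5] },
      { 'name' => 'Fiat', 'models' => %w[500 Panda] }
    ]
  }
}

# Emulate splitting on [docs][cars]: each new event keeps name and age,
# and 'cars' holds a single car object instead of the whole array.
per_car = event['docs']['cars'].map do |car|
  { 'docs' => event['docs'].merge('cars' => car) }
end

per_car.each do |e|
  puts "#{e['docs']['name']} (#{e['docs']['age']}): #{e['docs']['cars']['name']}"
end
```

Each resulting event repeats the name and age fields, which is the "repeated fields" behaviour asked for earlier in the thread.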


(Ritesh) #9

Hi, I think I have provided all the information I had:
the JSON source, the rubydebug output, the conf file and, from the first message, the ruby code.

I was not sure what you meant by JSON representation. I think you are looking for what I have pasted below. I am sorry for that.

Anyhow, I will try the split filter. I knew about the split filter but was not sure how to use it with arrays. I am also not sure whether the ruby filter is still valid in this case. I think I will have to drop the ruby filter and use only the split filter?

{
  "_id": "AV6pUuJHlCCNY9gNgHan",
  "_score": 1,
  "_source": {
    "path": "...../logstash-5.5.2/bin/test1.json",
    "@timestamp": "2017-09-22T11:22:11.470Z",
    "docs": {
      "cars": [
        {
          "models": [
            "Fiesta",
            "Focus",
            "Mustang"
          ],
          "name": "Ford"
        },
        {
          "models": [
            "320",
            "X3",
            "X5"
          ],
          "name": "BMW"
        },
        {
          "models": [
            "500",
            "Panda"
          ],
          "name": "Fiat"
        }
      ],
      "name": "John",
      "age": 30
    },
    "@version": "1",
    "message": "{\"name\":\"John\",\"age\":30,\"cars\":[{\"name\":\"Ford\",\"models\":[\"Fiesta\",\"Focus\",\"Mustang\"]},{\"name\":\"BMW\",\"models\":[\"320\",\"X3\",\"X5\"]},{\"name\":\"Fiat\",\"models\":[\"500\",\"Panda\"]}]}\r",
    "type": "test"
  }
}

(Christian Dahlqvist) #10

How do you want to search this nested structure? Are you planning on using Kibana?


(Ritesh) #11

Yes, I am planning to use Kibana.
Currently the example has only 2 nested arrays.

I have one more source with 6 nested arrays, but for now I am struggling with just the 2 nested arrays stated above.

I feel the split filter may not be a good solution, as it could hurt performance when I have 10k records to load. That is why I went with the ruby filter.

I am still trying the split filter as suggested above and hope to crack it. I will update once I get some results.


(Christian Dahlqvist) #12

Kibana will probably work better with a flattened, denormalized structure as it does not support nested documents well.
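To make "flattened" concrete, here is a small plain-Ruby sketch that turns a nested document into one with only top-level fields. The helper name `flatten_hash` and the underscore-joined field names are assumptions chosen for illustration; the input is assumed to be a document in which each array has already been reduced to a single element.

```ruby
# Hypothetical helper: collapse nested hashes into top-level keys,
# joining the path segments with underscores.
def flatten_hash(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), flat|
    name = prefix ? "#{prefix}_#{key}" : key.to_s
    if value.is_a?(Hash)
      flat.merge!(flatten_hash(value, name))
    else
      flat[name] = value
    end
  end
end

# A document holding one car and one model (no arrays left).
doc = {
  'docs' => {
    'name' => 'John',
    'age'  => 30,
    'cars' => { 'name' => 'Ford', 'models' => 'Fiesta' }
  }
}

flat = flatten_hash(doc)
flat.each { |field, value| puts "#{field}: #{value}" }
```

Every value ends up in its own top-level field, so each document is a self-contained row.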


(Ritesh) #13

Hi @magnusbaeck

I tried the split filter and it is working. I hope it will work for large datasets as well.

For others: I used these split filters and they work

split {field => "[docs][cars]"}

split {field => "[docs][cars][models]"}
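To get a feel for how many events two chained splits produce, here is a plain-Ruby emulation applied to the sample document from this thread. The `split_on` helper is hypothetical and only models the behaviour on ordinary hashes; it is not Logstash code.

```ruby
# Hypothetical helper: emulate a split on the array found at `path`,
# returning one deep-copied event per array element.
def split_on(event, path)
  *parents, leaf = path
  items = event.dig(*path)
  return [event] unless items.is_a?(Array)

  items.each_index.map do |i|
    copy = Marshal.load(Marshal.dump(event)) # deep copy keeps events independent
    container = parents.empty? ? copy : copy.dig(*parents)
    container[leaf] = container[leaf][i]     # replace the array with element i
    copy
  end
end

event = {
  'docs' => {
    'name' => 'John',
    'age'  => 30,
    'cars' => [
      { 'name' => 'Ford', 'models' => %w[Fiesta Focus Mustang] },
      { 'name' => 'BMW',  'models' => %w[320 X3 X5] },
      { 'name' => 'Fiat', 'models' => %w[500 Panda] }
    ]
  }
}

# First split on [docs][cars], then on [docs][cars][models].
events = split_on(event, %w[docs cars])
         .flat_map { |e| split_on(e, %w[docs cars models]) }

events.each do |e|
  docs = e['docs']
  puts "#{docs['name']}, #{docs['cars']['name']}, #{docs['cars']['models']}"
end
puts "#{events.length} events"
```

The event count is the sum of the model counts per car (3 + 3 + 2 here), which is worth keeping in mind when estimating index size for deeper nesting.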

But I am not able to rename this array-based field. How can I rename "docs.cars.name" to "car"? I tried mutate but it is not working:

mutate {	
rename => { "docs.cars.name" => "car" }

}

(Magnus Bäck) #14

Just like the split filter, the mutate filter (and all other plugins) uses the [field][subfield] notation for addressing subfields (not field.subfield).
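The difference between the two notations can be modelled in plain Ruby. In the sketch below, `lookup` is a hypothetical helper that resolves a reference against an ordinary Hash; it assumes that a reference without brackets is taken as the literal name of a single top-level field, which is why the dotted form finds nothing here.

```ruby
# Hypothetical helper: resolve a "[a][b][c]" style reference against a Hash.
# A reference without brackets is treated as one top-level field name.
def lookup(event, reference)
  keys = reference.scan(/\[([^\]]+)\]/).flatten
  keys = [reference] if keys.empty?
  event.dig(*keys)
end

# An event after both splits: cars is a single object, not an array.
event = {
  'docs' => {
    'name' => 'John',
    'cars' => { 'name' => 'Ford', 'models' => 'Fiesta' }
  }
}

p lookup(event, '[docs][cars][name]') # walks docs -> cars -> name
p lookup(event, 'docs.cars.name')     # looks for a top-level field with that exact name
```

Applied to the rename in the previous post, the source field would be written as [docs][cars][name] rather than docs.cars.name.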


(Ritesh) #15

This worked. Thanks

