Parse the JSON-stringified value of a field on each object in an array of objects

Hi all,

I've been migrating an order / order-items one-to-many relation from MySQL into ES using Logstash.

With the help of the aggregate filter over my joined SQL rows, the items become a nested property under each order, and they look like this:

"order_metadata": {
      "some_data_foo": "123QUX",
      "some_data_bar": "4a8d77550086253a0ac3f1",
      "some_other_data_baz": "Some Words",
      "some_other_data_foo": "9876"
},
"some_order_field": "Foo Bar Baz Qux 123",
"items": [
    {
        "category": "bar",
        "some_json_structure": "{\"size\":\"XL\",\"color\":\"red\",\"depth\":0.2,\"height\":0.6,\"unit\":\"M\"}",
        "code": "XYZ002-T004",
        "metadata": "{\"location\":\"FooBar\"}",
        "quantity": 1,
        "price": 5.0,
        "name": "Sample Item",
        "id": 23966914,
        "weight": 0.2
    },
    {
        "category": "foo",
        "some_json_structure": "[]",
        "code": "XYZ005-T012",
        "metadata": "{\"location\":\"FooBar Baz\"}",
        "quantity": 2,
        "price": 10.0,
        "name": "Sample Item 2",
        "id": 25966999,
        "weight": 0.5
    }
],

Originally, the order_metadata field (which is never a nested structure, just a flat list of key/value pairs) comes in as stringified JSON, and with json { source => "order_metadata" } in filter {} I can easily parse it into an object field.

However, I cannot seem to parse the stringified JSON fields found within the array of objects using json { source => 'items.property_name' }, e.g. items.some_json_structure and items.metadata in the structure above.

To note, the incoming items.some_json_structure is either [] or an object (as in the first item), while the incoming items.metadata is either an empty string or an object.

  1. How can I parse the stringified JSON values of the fields of an object in the array of objects?
  2. I noticed that must match queries that look up an item with name = Sample AND code = XYZ take a lot longer than queries on top-level object properties. Is there any advantage to be gained by somehow making each item a top-level property of the order and mapping them as objects?

Thanks for your time!

You would pass [items][0][some_json_structure] to a json filter. If you have a variable number of them you would have to iterate over them in a ruby filter and parse them there.
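For a known, fixed position that could look something like the following sketch (writing the parsed result back over the same field; whether target accepts this field reference exactly as shown is an assumption, so treat it as illustrative):

```
filter {
  json {
    source => "[items][0][some_json_structure]"
    target => "[items][0][some_json_structure]"
  }
}
```

This only works when you know the index in advance, which is why a variable-length array needs the ruby filter instead.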

Thanks for pointing me to the ruby filter, @Badger, that was very useful! I've done this now using:

  ruby {
    init => "require 'json'"
    code => "
      items = event.get('items')

      items_clone = items.clone

      items_clone.each_with_index do |item, index|

        item_clone = item.clone

        item_clone.each do |k, val|

            if k == 'variation_list' || k == 'metadata'
                parsed = val.empty? ? '' : JSON.parse(val)
                item_clone[k] = parsed
            end

        end

        items_clone[index] = item_clone

      end
      event.set('items', items_clone)
    "
  }

I can see that they are turned into proper json structures on the output.
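Outside Logstash, the core of that loop can be exercised as plain Ruby (a sketch with made-up sample items; the helper name parse_item_fields is mine, not part of the filter):

```ruby
require 'json'

# Parse stringified-JSON fields on each item, leaving empty strings alone.
def parse_item_fields(items, fields)
  items.map do |item|
    item = item.clone
    fields.each do |f|
      val = item[f]
      item[f] = val.empty? ? '' : JSON.parse(val) if val.is_a?(String)
    end
    item
  end
end

items = [
  { 'variation_list' => '{"size":"XL","color":"red"}', 'metadata' => '' },
  { 'variation_list' => '[]', 'metadata' => '{"location":"FooBar"}' }
]

parsed = parse_item_fields(items, ['variation_list', 'metadata'])
# parsed[0]['variation_list'] is now a Hash; parsed[0]['metadata'] stays ''
```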

Buuut now I get a lot of warnings saying failed to parse field [items.metadata] of type [text] in document, and metadata and variation_list are still showing up as text.

My mapping looks like this:

  1. in the index template I supply in the elasticsearch {} bit of the Logstash config:
      "items": {
        "type": "nested",
        "dynamic": false,
        "properties": {
          "id": {
            "type": "long"
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "price": {
            "type": "double"
          },
          "quantity": {
            "type": "integer"
          },
          "weight": {
            "type": "double"
          },
          "code": {
            "type": "text"
          },
          "variation_list": {
            "type": "nested",
            "properties": {}
          },
          "metadata": {
            "type": "object",
            "properties": {}
          },
          "delivery_type": {
            "type": "text"
          },
          "category_code": {
            "type": "text"
          },
          "fulfillment_method": {
            "type": "text"
          }
        }
      }

and the relevant aggregate{} bit looks like this:

aggregate {
   task_id => "%{id}"
   code => "
     # some other stuff
     map['items'] ||= []

     map['items'] << {
       'id' => event.get('item_id'),
       'name' => event.get('item_name'),
       'price' => event.get('item_price'),
       'quantity' => event.get('item_quantity'),
       'weight' => event.get('item_weight'),
       'code' => event.get('item_code'),
       'variation_list' => event.get('item_variation_list'),
       'metadata' => event.get('item_metadata'),
       'delivery_type' => event.get('item_delivery_type'),
       'category_code' => event.get('item_category_code'),
       'fulfillment_method' => event.get('item_fulfillment_method')
     }
     event.cancel()
   "
   push_previous_map_as_event => true
   timeout => 5
   timeout_tags => ['aggregated']
}

Is it because, by the time the aggregate map is pushed, the fields were already mapped as text, since they were stringified JSON prior to the ruby filter?

I'm really not sure why it is still trying to parse it as text. Could you help me solve this problem?

That ruby filter results in

           "items" => [
    [0] {
                            [...]
                         "id" => 23966914,
                   "metadata" => {
            "location" => "FooBar"
        },

[items][0][metadata] is an object, which is what the template suggests it should be. I am guessing the full error message is a mapping exception, but I am not sure if that is saying that [items][0][metadata] is text, or that it is failing to parse it as text. The former would suggest some events do not have the structure you think they do, and the latter would suggest that you have previously created documents in which that field is text. A field cannot be text on some documents and an object on others. Rolling over to a new index might fix the latter.


You're right! So I deleted the existing documents in the current index, as they had already been mapped as text from my initial submissions. I also changed the parsing bit of the ruby code to:

parsed = val.empty? ? JSON['{}'] : JSON.parse(val)

so that it creates an empty object here instead of an empty string. Not sure if this is 'the good way' to fix it, but it definitely gets mapped as objects now and I get no errors!
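As a quick sanity check of that fallback in plain Ruby (not from the thread itself, just illustrating the two branches):

```ruby
require 'json'

# Empty input becomes an empty Hash instead of staying a String, so the
# field is an object on every document and the mapping stays consistent.
val = ''
parsed = val.empty? ? JSON['{}'] : JSON.parse(val)

# Non-empty input is parsed into an object as before.
val2 = '{"location":"FooBar"}'
parsed2 = val2.empty? ? JSON['{}'] : JSON.parse(val2)
```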

Thanks for all the fish @Badger, you're awesome :slight_smile: