OK, my current data structure is the following:
[2023-07-04T21:11:56,855][INFO ][logstash.inputs.jdbc ][main][bbba3150f9c946cd7407c40277f99ddb5c77b481b669b359f5f6fd68c1b732b5] (0.001225s) SELECT id, endpoint, method, response, response_code, created_at, processed_at FROM request_queue WHERE id > 0 ORDER BY id ASC LIMIT 1
{
         "total" => 944195,
         "limit" => 2,
      "@version" => "1",
          "page" => 1,
         "pages" => 472098,
          "docs" => [
        [0] {
               "id" => 5309714,
            "names" => [
                [0] {
                        "name" => "Untitled the Pope's Exorcist sequel",
                    "language" => nil
                }
            ]
        },
        [1] {
               "id" => 5307599,
            "names" => [
                [0] {
                        "name" => "Masters of the Universe: Revolution",
                    "language" => nil
                }
            ]
        }
    ],
    "@timestamp" => 2023-07-04T21:11:56.858622200Z,
      "sql_data" => {
             "endpoint" => "v1.3/movie",
        "response_code" => 200,
               "method" => "GET",
         "processed_at" => 2023-07-04T21:10:00.000Z,
           "created_at" => 2023-07-04T21:10:00.000Z,
                   "id" => 1,
             "response" => #<Sequel::SQL::Blob:0x15f8 bytes=235 start="{\"docs\":[{" end="s\":472098}">
    }
}
Content of request_queue.response:
{
  "docs": [
    {
      "id": 5309714,
      "names": [
        {
          "name": "Untitled the Pope's Exorcist sequel",
          "language": null
        }
      ]
    },
    {
      "id": 5307599,
      "names": [
        {
          "name": "Masters of the Universe: Revolution",
          "language": null
        }
      ]
    }
  ],
  "total": 944195,
  "limit": 2,
  "page": 1,
  "pages": 472098
}
I want to save each object from docs as a separate index document.
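For illustration, in the same rubydebug style as the dump above, each resulting event should end up shaped like this (my sketch of the intended result, not actual pipeline output):

{
       "id" => 5309714,
    "names" => [
        [0] {
                "name" => "Untitled the Pope's Exorcist sequel",
            "language" => nil
        }
    ]
}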
My current config:
input {
  jdbc {
    jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
    jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
    jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
    jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
    jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
    schedule => "*/1 * * * * *"
    statement => "SELECT id, endpoint, method, response, response_code, created_at, processed_at FROM request_queue WHERE id > 0 ORDER BY id ASC LIMIT 1"
    target => "sql_data"
  }
}

filter {
  json {
    source => "[sql_data][response]"
  }
  split {
    field => "docs"
    add_field => {
      "id" => "%{[docs][id]}"
      "names" => "%{[docs][names]}"
    }
    remove_field => [
      "sql_data",
      "total",
      "limit",
      "page",
      "pages"
    ]
  }
}
After

add_field => {
  "id" => "%{[docs][id]}"
  "names" => "%{[docs][names]}"
}

runs, the names field is converted to a JSON string instead of staying an array of objects.
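As far as I can tell, add_field values go through Logstash's sprintf (%{...}) interpolation, which always produces strings, so an array like names gets serialized to JSON along the way. Below is a sketch of the filter section using mutate's copy option instead, which should copy the fields with their nested structure intact (untested against this pipeline, so treat it as an assumption):

filter {
  json {
    source => "[sql_data][response]"
  }
  split {
    # one event per element of the docs array
    field => "docs"
  }
  mutate {
    # copy preserves nested values, unlike sprintf-based add_field
    copy => {
      "[docs][id]" => "id"
      "[docs][names]" => "names"
    }
    # remove_field is applied after copy runs, so docs can be dropped here too
    remove_field => ["docs", "sql_data", "total", "limit", "page", "pages"]
  }
}

If copy doesn't behave as expected, a ruby filter doing event.set('names', event.get('[docs][names]')) should achieve the same thing, since event.get returns the structured value rather than a string rendering.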