Split filter and add_field encoding object to a JSON string

Hi, I have the following set of filters

filter {
    json {
        source => "[sql_data][response]"
    }
    split {
        field => "docs"
        add_field => {
            "id" => "%{[docs][id]}"
            "names" => "%{[docs][names]}"
        }
        remove_field => [
            "sql_data",
            "total",
            "limit",
            "page",
            "pages"
        ]
    }
}

And it gives me the following result

...
{
	"docs" => {
		"id" => 5271175,
		"names" => [
			[0] {
				"name" => "Element A"
				"type" => "F"
			},
			[1] {
				"name" => "Element B"
				"type" => "F"
			}
		]
	},
	"@timestamp" => 2023-07-04T20:36:17.268796400Z,
	"names" => "{name=Element A, type=F},{name=Element B, type=F}",
	"@version" => "1",
	"id" => "5271175"
}
...

Can someone help me understand, please? Is there any option to move [docs][names] to [names] without converting/encoding it to a JSON string, or should I move it like that and somehow restore it back to JSON afterwards?

Thanks!

Hello,

It is not clear what the issue is, what is not working, and what result you expect.

Also, you need to share a sample of your message before the split filter so it will be possible to replicate your pipeline.

Ok, my current data structure is the following

2023-07-05 00:11:56 [2023-07-04T21:11:56,855][INFO ][logstash.inputs.jdbc     ][main][bbba3150f9c946cd7407c40277f99ddb5c77b481b669b359f5f6fd68c1b732b5] (0.001225s) SELECT id, endpoint, method, response, response_code, created_at, processed_at FROM request_queue WHERE id > 0 ORDER BY id ASC LIMIT 1
{
         "total" => 944195,
         "limit" => 2,
      "@version" => "1",
          "page" => 1,
         "pages" => 472098,
          "docs" => [
        [0] {
               "id" => 5309714,
            "names" => [
                [0] {
                        "name" => "Untitled the Pope's Exorcist sequel",
                    "language" => nil
                }
            ]
        },
        [1] {
               "id" => 5307599,
            "names" => [
                [0] {
                        "name" => "Masters of the Universe: Revolution",
                    "language" => nil
                }
            ]
        }
    ],
    "@timestamp" => 2023-07-04T21:11:56.858622200Z,
      "sql_data" => {
             "endpoint" => "v1.3/movie",
        "response_code" => 200,
               "method" => "GET",
         "processed_at" => 2023-07-04T21:10:00.000Z,
           "created_at" => 2023-07-04T21:10:00.000Z,
                   "id" => 1,
             "response" => #<Sequel::SQL::Blob:0x15f8 bytes=235 start="{\"docs\":[{" end="s\":472098}">
    }
}

Content of request_queue.response

{
    "docs": [
        {
            "id": 5309714,
            "names": [
                {
                    "name": "Untitled the Pope's Exorcist sequel",
                    "language": null
                }
            ]
        },
        {
            "id": 5307599,
            "names": [
                {
                    "name": "Masters of the Universe: Revolution",
                    "language": null
                }
            ]
        }
    ],
    "total": 944195,
    "limit": 2,
    "page": 1,
    "pages": 472098
}

I want to index each object from docs as a separate document.

My current conf

input {
    jdbc {
        jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
        jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
        jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
        jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
        jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
        schedule => "*/1 * * * * *"
        statement => "SELECT id, endpoint, method, response, response_code, created_at, processed_at FROM request_queue WHERE id > 0 ORDER BY id ASC LIMIT 1"
        target => "sql_data"
    }
}

filter {
    json {
        source => "[sql_data][response]"
    }
    split {
        field => "docs"
        add_field => {
            "id" => "%{[docs][id]}"
            "names" => "%{[docs][names]}"
        }
        remove_field => [
            "sql_data",
            "total",
            "limit",
            "page",
            "pages"
        ]
    }
}

After

        add_field => {
            "id" => "%{[docs][id]}"
            "names" => "%{[docs][names]}"
        }

the names field gets converted to a string instead of staying a JSON object.

And what does your docs field look like? You didn't share how it looks before the split filter.

Remove the split filter from your pipeline and share a sample of your document; without it, it is pretty hard to replicate your issue.

EDIT: Nevermind, the content of the docs is on the request_queue.response right?

I will try to simulate your pipeline.

EDIT: Nevermind, the content of the docs is on the request_queue.response right?

Yes

@leandrojmp Hi!
Sorry to bother you, any news? :pleading_face:

Hello,

Just replicated it here and there is no issue with the filters, but to get your desired output you will need to make a small change.

Your split filter looks like this:

    split {
        field => "docs"
        add_field => {
            "id" => "%{[docs][id]}"
            "names" => "%{[docs][names]}"
        }
    }

It will create documents like this:

{
          "docs" => {
           "id" => 5309714,
        "names" => [
            [0] {
                    "name" => "Untitled the Pope's Exorcist sequel",
                "language" => nil
            }
        ]
    },
         "total" => 944195,
         "limit" => 2,
         "pages" => 472098,
    "@timestamp" => 2023-07-20T13:22:01.219353969Z,
          "host" => "lab",
         "names" => "{name=Untitled the Pope's Exorcist sequel, language=null}",
      "@version" => "1",
          "page" => 1,
            "id" => "5309714"
}

There are two issues here. First, your docs.names field is also an array, so you may want to split on this field as well, but that depends on your use case.
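If you do want one event per name, a second split on the nested field could look like this (a sketch; whether this fits depends on your use case and mapping):

```
    split {
        field => "[docs][names]"
    }
```

This would run after the first split, when each event already holds a single docs object, so [docs][names] is the array being split.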

Second, when you use add_field and the source field is a json object, the target field will not be a json object; it will receive the json rendered as a string, which is the output you are getting.

"names" => "{name=Untitled the Pope's Exorcist sequel, language=null}"

If you want the names field to be a json object, you need to use another mutate filter with a rename or a copy after the split filter.

    mutate {
        copy => {
            "[docs][names]" => "names"
        }
    }
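Putting it together, one possible filter section would be the following sketch (it drops the add_field entry for names, since the mutate copy now handles that field):

```
filter {
    json {
        source => "[sql_data][response]"
    }
    split {
        field => "docs"
        add_field => {
            "id" => "%{[docs][id]}"
        }
        remove_field => [ "sql_data", "total", "limit", "page", "pages" ]
    }
    mutate {
        copy => {
            "[docs][names]" => "names"
        }
    }
}
```

With copy, the docs field keeps its names key; if you prefer to drop it from docs entirely, rename would do that instead.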

Thank you so much, Sir.
I'm on a trip atm. Will check it at the weekend.
Thank you so much for your help and time!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.