Logstash: converting flattened data from a SQL query into a JSON inner object

I'm looking to index flattened data into Elasticsearch without creating duplicate data. For example, if a person has multiple cars, then a flattened table will look something like this:

| Name | Car |
|---|---|
| Test Johny | Toyota Camry |
| Test Johny | Honda Civic |

In reality, there's only one person with multiple cars, but if I feed this into Elasticsearch it won't recognize that and will just create two copies of the same person. Instead I want it to index only one person, with an inner JSON object containing an array of the cars belonging to that one person.
It should look something like this:

{
  "name": "Test Johny",
  "cars": [
    { "car": "Toyota Camry" },
    { "car": "Honda Civic" }
  ]
}

Is there a way to configure Logstash to take that flattened data and convert it into this structure before passing it on to Elasticsearch?

It is possible; it's called the nested type. I've never done it myself, so I'll be learning along with whoever answers here.

Take a look at the aggregate filter.
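A rough sketch of what that could look like for your case (assuming the jdbc input returns `name` and `car` columns sorted by name, that the table and column names below are placeholders for your own schema, and that the pipeline runs with a single worker, e.g. `-w 1`, so the filter sees all rows for a person consecutively):

```
input {
  jdbc {
    # jdbc_connection_string, jdbc_user, jdbc_driver_class, etc. go here
    # "people_cars" is a placeholder for your own table
    statement => "SELECT name, car FROM people_cars ORDER BY name"
  }
}

filter {
  aggregate {
    # group consecutive rows that share the same name into one map
    task_id => "%{name}"
    code => "
      map['name'] = event.get('name')
      map['cars'] ||= []
      map['cars'] << { 'car' => event.get('car') }
      event.cancel()
    "
    # emit the accumulated map as a new event when the name changes
    push_previous_map_as_event => true
    # flush the last person after 3 seconds of inactivity
    timeout => 3
  }
}

output {
  elasticsearch {
    # hosts, index, etc. go here; using the name as the document_id also
    # guards against duplicates if the pipeline is re-run
    document_id => "%{name}"
  }
}
```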

Elasticsearch does have nested types, but that only applies when the incoming data is already in an inner-object form, like the example in my original post. By default Elasticsearch flattens that data unless you tell it that it's a nested type. But I'm trying to deal with the step before that: how to get the data into that inner-object form in the first place. If I feed in the original flattened table, Elasticsearch will simply create two separate documents for the same person, because that's how the data is received. It doesn't know to nest them.
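(For reference, telling Elasticsearch about the nested type is just a mapping change, roughly like the sketch below, where `people` is a placeholder index name and I'm assuming a recent Elasticsearch version without mapping types. But that still doesn't solve the step before.)

```
PUT people
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "cars": {
        "type": "nested",
        "properties": {
          "car": { "type": "keyword" }
        }
      }
    }
  }
}
```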

The way I have been using Elasticsearch until now is by querying a SQL database with a join clause and putting the flattened data into an Excel spreadsheet. Then I used a simple Java program to parse it to JSON and send it to Elasticsearch. If I had sent it as-is in its flattened format, Elasticsearch would have indexed multiple documents for the same person, because the original data was flattened. So before sending the data to Elasticsearch, I checked whether the person had already been created. If they hadn't, I created them and inserted their car as an inner object. If they had, I just added their next car to the existing inner object. That way, when I sent all the data to Elasticsearch, it only created one document per person, because that's how I sent it.

What I'm wondering is whether Logstash has a way to do what I did with my Java program: take a flattened data set and combine the rows so that duplicate people are removed and their extra cars are placed into an inner object instead. That way, when it's sent to Elasticsearch, it won't create multiple documents for the same person.

I'll look into it, thanks. I'm new to Logstash, so I'm just learning about all the plugins it has.
