Logstash: converting flattened data from a SQL query into a JSON inner object

I'm looking to index flattened data into Elasticsearch without creating duplicate data. For example, if a person has multiple cars, then a flattened table will look something like this:

| Name | Car |
|---|---|
| Test Johny | Toyota Camry |
| Test Johny | Honda Civic |

In reality, there's only one person with multiple cars, but if I feed this into Elasticsearch it won't recognize that and will just create two copies of the same person. Instead I want it to index only one person, with an inner JSON object containing an array of the cars belonging to that one person.
It should look something like this:

{
  "name": "Test Johny",
  "cars": [
    { "car": "Toyota Camry" },
    { "car": "Honda Civic" }
  ]
}

Is there a way to configure Logstash to take that flattened data and convert it into this structure before passing it on to Elasticsearch?

It is possible; it's called the nested type. I've never done it myself, so I'll be learning along with whoever answers here.

Take a look at the aggregate filter.
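A rough sketch of what that could look like for your case (assuming the jdbc input returns `name` and `car` columns sorted by name, that the table and column names below are placeholders for your own schema, and that the pipeline runs with a single worker, e.g. `-w 1`, so the filter sees all rows for a person consecutively):

```
input {
  jdbc {
    # jdbc_connection_string, jdbc_user, jdbc_driver_class, etc. go here
    # "people_cars" is a placeholder for your own table
    statement => "SELECT name, car FROM people_cars ORDER BY name"
  }
}

filter {
  aggregate {
    # group consecutive rows that share the same name into one map
    task_id => "%{name}"
    code => "
      map['name'] = event.get('name')
      map['cars'] ||= []
      map['cars'] << { 'car' => event.get('car') }
      event.cancel()
    "
    # emit the accumulated map as a new event when the name changes
    push_previous_map_as_event => true
    # flush the last person after 3 seconds of inactivity
    timeout => 3
  }
}

output {
  elasticsearch {
    # hosts, index, etc. go here; using the name as the document_id also
    # guards against duplicates if the pipeline is re-run
    document_id => "%{name}"
  }
}
```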

Elasticsearch does have nested types, but that only applies when the incoming data is already in an inner-object form, like the example in my original post. By default Elasticsearch flattens that data unless you tell it that it's a nested type. But I'm trying to deal with the step before that: how to get the data into that inner-object form in the first place. If I feed in the original flattened table, Elasticsearch will simply create two separate documents for the same person, because that's how the data is received. It doesn't know to nest them.
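(For reference, telling Elasticsearch about the nested type is just a mapping change, roughly like the sketch below, where `people` is a placeholder index name and I'm assuming a recent Elasticsearch version without mapping types. But that still doesn't solve the step before.)

```
PUT people
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "cars": {
        "type": "nested",
        "properties": {
          "car": { "type": "keyword" }
        }
      }
    }
  }
}
```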

The way I have been using Elasticsearch until now is by querying a SQL database with a join clause and putting the flattened data into an Excel spreadsheet. Then I used a simple Java program to parse it to JSON and send it to Elasticsearch. If I had sent it as-is in its flattened format, Elasticsearch would have indexed multiple documents for the same person, because the original data was flattened. So before sending the data to Elasticsearch, I checked whether the person had already been created. If they hadn't, I created them and inserted their car as an inner object. If they had, I just added their next car to the existing inner object. That way, when I sent all the data to Elasticsearch, it only created one document per person, because that's how I sent it.

What I'm wondering is whether Logstash has a way to do what I did with my Java program: take a flattened data set and combine the rows so that duplicate people are removed and their extra cars are placed into an inner object instead. That way, when it's sent to Elasticsearch, it won't create multiple documents for the same person.

I'll look into it, thanks. I'm new to Logstash, so I'm just learning about all the plugins it has.
