Combining multiple documents based on ID

Hello Logstash!

I'm new to the Elastic Stack.

We are transferring data from SQL Server to Elasticsearch using Logstash's jdbc plugin, and so far it's working fine. Normally the jdbc plugin creates a new document for each SQL Server row, but I want to create a single new document by combining these multiple documents based on an ID column. How can I do that?

Our data looks like this:

id			column1			column2
1			A				xyz
2			B				xxx
3			C				yyy
4			D				xyz
5			E				xyz

I want to combine records 1, 4, and 5 into a new document based on column2 = xyz. The records must be added to the new document in date order from SQL Server. I can handle the ordering with ORDER BY, but I need help creating the new document.

We can either create the new document from records already in Elasticsearch, or create it based on column2 as the records pass through Logstash, so there won't be extra records in Elasticsearch.

Thanks for your interest.

So you want an array with all the id or column1 values?

Hello Mark, thank you for your answer!

Logstash's jdbc plugin creates a JSON object for each row, which is fine:

{
id: 1
column1: A
column2: xyz
}

What I need is a single JSON object that groups all the rows according to column2:

{
id: 1
column1: A
column2: xyz
id: 4
column1: D
column2: xyz
id: 5
column1: E
column2: xyz
}

Can Logstash group objects like this (for column2 = xyz in the example) before sending them to Elasticsearch? How?

That second one won't be valid JSON, as you have multiple occurrences of the same field name.

You should be able to do a merge so you end up with:

{
id: [1, 4, 5]
column1: [A, D, E]
column2: xyz
}
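As a rough sketch, the aggregate filter could produce that shape. Field names are taken from your example; the timeout value is a guess, and you'd need to verify this against your own pipeline:

```
filter {
  aggregate {
    # Group all rows that share the same column2 value
    task_id => "%{column2}"
    code => "
      map['id']      ||= []
      map['id']      << event.get('id')
      map['column1'] ||= []
      map['column1'] << event.get('column1')
      event.cancel()  # drop the per-row event
    "
    # Emit one combined event per column2 value once no new
    # rows arrive for that group within the timeout
    push_map_as_event_on_timeout => true
    timeout => 30
    timeout_task_id_field => "column2"
  }
}
```

Note that the aggregate filter only works correctly with a single pipeline worker (`-w 1`), otherwise events for the same task_id may be processed out of order.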

I think the id column is confusing things. It does not matter what the id is; it can be anything unique across records 1, 4, and 5.

Does the fact that column2 is a repeating column mean the records cannot be combined into a single document?

Isn't it possible with Logstash to get the following document?

{
_id: unique_value (I can hold this new document with Elasticsearch's _id column)
column1: A
column2: xyz
id: 4
column1: D
column2: xyz
id: 5
column1: E
column2: xyz
}

Could the elasticsearch filter plugin help, or the aggregate plugin? I've read about them, but I need some help with how to use them.

Thanks!

The thing is that it will still collapse those multiple id, column1 and column2 fields into arrays, so you will never get that exact format.

Hello Mark,

What about this, adding the documents in succession? Is that still not possible with Logstash?

{
{
id: 1
column1: A
column2: xyz
},
{
id: 4
column1: D
column2: xyz
},
{
id: 5
column1: E
column2: xyz
}
}

That is possible, yes.
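For example, a sketch with the aggregate filter that collects each row into an array of objects. The `steps` field name is just illustrative, and the timeout is a guess for your data:

```
filter {
  aggregate {
    # One aggregation task per distinct column2 value
    task_id => "%{column2}"
    code => "
      map['column2'] ||= event.get('column2')
      map['steps']   ||= []
      map['steps']   << { 'id' => event.get('id'), 'column1' => event.get('column1') }
      event.cancel()  # suppress the original per-row document
    "
    # Flush the accumulated map as a single event per group
    push_map_as_event_on_timeout => true
    timeout => 30
    timeout_task_id_field => "column2"
  }
}
```

That would index one document per column2 value, with the rows nested as an array of objects under `steps`.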

OK :slight_smile:
So, how can I achieve this? A plugin? A script?
Can you please guide me on this matter?

Could this be done with a Logstash plugin? I'm trying some examples, but I have not succeeded yet.

Can someone please give me a start?

Thank you ~

Your latest example is still not valid JSON. How are you going to use/query this data?

Hello Christian,

I'm not sure what the JSON format should be, but I'm expecting to query by the grouped column (column2: xyz in the example). So my problem is exactly this: getting all the documents into one big new document according to the column2 value. How can I do that with correct JSON format, Christian?

Any ideas?

Are you going to query the data using one of the language clients or use Kibana? What does this data really represent?

Christian,

I'm going to use Kibana; I'm trying to create a data table or chart in Kibana from relational database data.

The database table contains data about transactions, or rather, about the steps within each transaction. But the Logstash jdbc plugin adds each step of a transaction as a new document (one for every row in the database). If I can group the documents to be indexed in Elasticsearch by column2, collecting all the step information of a transaction in a single document, I can create the visualization I want; a transaction corresponds to column2 in my example.

I tried the Logstash aggregate plugin, but the result was not exactly what I need. What do you think? How can I get that one document which includes all the step info of a transaction, in a simple and effective way?

Thank you for your help.

The reason I asked is that Kibana currently does not support nested documents well and the examples you provided seemed to indicate a nested structure. As I do not know your data nor the different ways you need to query and analyse it, it is hard for me to recommend a structure.

Okay then.
Thank you very much for your help anyway!

For some types of data you may benefit from storing the raw data (to serve some types of analysis) and also creating an entity-centric index for other types of analysis.

You can use the jdbc input plugin and provide the SQL Server query with the row selection criteria in the input section of the Logstash configuration file to fetch the data.
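Something like this input section, for example. The connection string, credentials, table name and date column are all placeholders you'd replace for your environment:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://your-host:1433;databaseName=your_db"
    jdbc_user => "your_user"
    jdbc_password => "your_password"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # Order by column2 and date so rows for the same group
    # arrive together and in date order
    statement => "SELECT id, column1, column2 FROM your_table ORDER BY column2, created_date"
    # Run the query once a minute
    schedule => "* * * * *"
  }
}
```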

Thank you Christian_Dahlqvist, I'll check your video!

Deepti_Jain, I don't get it. Do you have a working example?