We are transferring data from SQL Server to Elasticsearch using Logstash's jdbc plugin. So far, so good. Normally the jdbc plugin creates a new document for each SQL Server row, but I want to create a single new document by combining these multiple documents based on an ID column. How can I do that?
Our data looks like this:
id column1 column2
1 A xyz
2 B xxx
3 C yyy
4 D xyz
5 E xyz
I want to combine records 1, 4, and 5 into a new document based on column2 = xyz. The records must be added to the new document in date order, as stored in SQL Server. I can handle the ordering with ORDER BY, but I need help creating the combined document.
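For the ordering part, the SQL would be something along these lines (my_table and created_at are just stand-ins for the real table and date column):

SELECT id, column1, column2
FROM my_table                    -- hypothetical table name
ORDER BY column2, created_at     -- created_at stands in for the actual date column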
We could either build the new document from records already indexed in Elasticsearch, or build it based on column2 while the records pass through Logstash, so that no extra per-row documents end up in Elasticsearch.
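For the second option, one idea might be to key the document on column2 and merge each row into it with a scripted upsert in the elasticsearch output, so no per-row documents are created. A rough sketch only; the hosts, index name, and Painless script are my assumptions:

output {
  elasticsearch {
    hosts => ["localhost:9200"]      # placeholder host
    index => "transactions"          # hypothetical index name
    document_id => "%{column2}"      # one document per column2 value
    action => "update"
    scripted_upsert => true          # let the script create the document on first sight
    script_type => "inline"
    script_lang => "painless"
    # Append this row to the shared document's 'records' array.
    script => "
      if (ctx._source.records == null) { ctx._source.records = []; }
      ctx._source.records.add(['id': params.event.id, 'column1': params.event.column1]);
      ctx._source.column2 = params.event.column2;
    "
  }
}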
I think the id column is causing confusion. It does not matter what the id is; it can be any value that is unique across records 1, 4, and 5.
Does the fact that column2 is a repeating column mean the records cannot be combined into a single document?
Isn't it possible with Logstash to get the following document? (I can identify the new document with Elasticsearch's _id field. Since duplicate keys are not valid JSON, I assume the combined rows would go into an array; I've called it "records" here.)
{
  "_id": "unique_value",
  "column2": "xyz",
  "records": [
    { "id": 1, "column1": "A" },
    { "id": 4, "column1": "D" },
    { "id": 5, "column1": "E" }
  ]
}
Can the elasticsearch filter plugin help, or the aggregate filter plugin? I have read about them, but I need some help with how to do it.
I'm not sure what the JSON format should be, but I expect to query by the grouping column, column2: xyz in the example. So my problem, specifically, is getting all the documents into one big document according to the column2 value. How can I do that, Christian, with the correct JSON format?
I'm going to use Kibana; I'm trying to create a data table or chart in Kibana from relational database data.
The database table contains data about transactions, or rather about the steps within each transaction. But the Logstash jdbc plugin indexes each step of a transaction as a new document (one per database row). If I can group the documents indexed into Elasticsearch by column2, collecting all of a transaction's step information into a single document, I can create the visualization I want; the transaction corresponds to column2 in my example.
I tried the Logstash aggregate plugin, but the result was not exactly what I need. What do you think? How can I get that "one document that includes all the step info of a transaction" in a simple and effective way?
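For reference, this is roughly the kind of aggregate configuration I experimented with. It is only a sketch: the 10-second timeout is arbitrary, and as far as I understand the aggregate filter requires running Logstash with a single worker (-w 1):

filter {
  aggregate {
    # Group all rows that share the same column2 value into one task.
    task_id => "%{column2}"
    code => "
      map['records'] ||= []
      map['records'] << { 'id' => event.get('id'), 'column1' => event.get('column1') }
      event.cancel()   # drop the per-row event so only the combined document is indexed
    "
    # When a group has been idle for 'timeout' seconds, push the map as a new event.
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "column2"   # keep the grouping value on the combined event
    timeout => 10
  }
}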
The reason I asked is that Kibana currently does not support nested documents well, and the examples you provided seemed to indicate a nested structure. As I do not know your data or the different ways you need to query and analyse it, it is hard for me to recommend a structure.
For some types of data you may benefit from storing the raw data (to serve some types of analysis) and also creating an entity-centric index for other types of analysis.
You can use the jdbc input plugin and provide the SQL Server query with the row-selection criteria in the input section of the Logstash configuration file to fetch the data, for example:
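A minimal sketch of such an input; the connection string, credentials, driver path, table name, date column, and WHERE clause are all placeholders:

input {
  jdbc {
    jdbc_driver_library => "/path/to/mssql-jdbc.jar"   # placeholder path to the driver jar
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # Row-selection criteria plus the ordering discussed above.
    statement => "SELECT id, column1, column2 FROM my_table WHERE column2 IS NOT NULL ORDER BY column2, created_at"
  }
}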