We are transferring data from SQL Server to Elasticsearch using Logstash's jdbc plugin. So far, so good. Normally the jdbc plugin creates a new document for each SQL Server row, but I want to create a single new document by combining these multiple documents based on an ID column. How can I do that?
Our data looks like this:
id column1 column2
1 A xyz
2 B xxx
3 C yyy
4 D xyz
5 E xyz
I want to combine records 1, 4, and 5 into a new document based on column2 = xyz. The records must be added to the new document in date order, as stored in SQL Server. I can handle the ordering with ORDER BY, but I need help creating the combined document.
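For the ordering part, the SQL would be something along these lines (my_table and created_at are just stand-ins for the real table and date column):

SELECT id, column1, column2
FROM my_table                    -- hypothetical table name
ORDER BY column2, created_at     -- created_at stands in for the actual date column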
We could either build the new document from records already indexed in Elasticsearch, or build it based on column2 while the records pass through Logstash, so that no extra per-row documents end up in Elasticsearch.
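For the second option, one idea might be to key the document on column2 and merge each row into it with a scripted upsert in the elasticsearch output, so no per-row documents are created. A rough sketch only; the hosts, index name, and Painless script are my assumptions:

output {
  elasticsearch {
    hosts => ["localhost:9200"]      # placeholder host
    index => "transactions"          # hypothetical index name
    document_id => "%{column2}"      # one document per column2 value
    action => "update"
    scripted_upsert => true          # let the script create the document on first sight
    script_type => "inline"
    script_lang => "painless"
    # Append this row to the shared document's 'records' array.
    script => "
      if (ctx._source.records == null) { ctx._source.records = []; }
      ctx._source.records.add(['id': params.event.id, 'column1': params.event.column1]);
      ctx._source.column2 = params.event.column2;
    "
  }
}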
I think the id column is causing confusion. It does not matter what the id is; it can be any value that is unique across records 1, 4, and 5.
Does the fact that column2 is a repeating column mean the records cannot be combined into a single document?
Isn't it possible with Logstash to get the following document? (I can identify the new document with Elasticsearch's _id field. Since duplicate keys are not valid JSON, I assume the combined rows would go into an array; I've called it "records" here.)
{
  "_id": "unique_value",
  "column2": "xyz",
  "records": [
    { "id": 1, "column1": "A" },
    { "id": 4, "column1": "D" },
    { "id": 5, "column1": "E" }
  ]
}
Can the elasticsearch filter plugin help, or the aggregate filter plugin? I have read about them, but I need some help with how to do it.
I'm not sure what the JSON format should be, but I expect to query by the grouping column, column2: xyz in the example. So my problem, specifically, is getting all the documents into one big document according to the column2 value. How can I do that, Christian, with the correct JSON format?
I'm going to use Kibana; I'm trying to create a data table or chart in Kibana from relational database data.
The database table contains data about transactions, or rather about the steps within each transaction. But the Logstash jdbc plugin indexes each step of a transaction as a new document (one per database row). If I can group the documents indexed into Elasticsearch by column2, collecting all of a transaction's step information into a single document, I can create the visualization I want; the transaction corresponds to column2 in my example.
I tried the Logstash aggregate plugin, but the result was not exactly what I need. What do you think? How can I get that "one document that includes all the step info of a transaction" in a simple and effective way?
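For reference, this is roughly the kind of aggregate configuration I experimented with. It is only a sketch: the 10-second timeout is arbitrary, and as far as I understand the aggregate filter requires running Logstash with a single worker (-w 1):

filter {
  aggregate {
    # Group all rows that share the same column2 value into one task.
    task_id => "%{column2}"
    code => "
      map['records'] ||= []
      map['records'] << { 'id' => event.get('id'), 'column1' => event.get('column1') }
      event.cancel()   # drop the per-row event so only the combined document is indexed
    "
    # When a group has been idle for 'timeout' seconds, push the map as a new event.
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "column2"   # keep the grouping value on the combined event
    timeout => 10
  }
}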
The reason I asked is that Kibana currently does not support nested documents well, and the examples you provided seemed to indicate a nested structure. As I do not know your data or the different ways you need to query and analyse it, it is hard for me to recommend a structure.
For some types of data you may benefit from storing the raw data (to serve some types of analysis) and also creating an entity-centric index for other types of analysis.
You can use the jdbc input plugin and provide the SQL Server query with the row-selection criteria in the input section of the Logstash configuration file to fetch the data, for example:
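A minimal sketch of such an input; the connection string, credentials, driver path, table name, date column, and WHERE clause are all placeholders:

input {
  jdbc {
    jdbc_driver_library => "/path/to/mssql-jdbc.jar"   # placeholder path to the driver jar
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # Row-selection criteria plus the ordering discussed above.
    statement => "SELECT id, column1, column2 FROM my_table WHERE column2 IS NOT NULL ORDER BY column2, created_at"
  }
}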