I'm new to Logstash and using the latest version. I have a config file where I've defined my input and output; it successfully pulls data from a single SQL Server table and populates my index in Elasticsearch. Now I want to pull in some child records, and I'm a bit confused about how to set up the config file(s).
Can I pull in the child records in the same config file, or do I need to create an additional config file that pulls the child records and sets up the reference to the parent there? In other words, can this all be done in a single config, or do I need to load the parent records in one config file, then load all child records in another and map them to the parent field at that point?
As Mark says, it doesn't matter how you organize your configuration files, but I'm not sure how you're going to carry out the child record loading at all. It's not something Logstash supports generically, but depending on what data source(s) you have, it might be possible to do.
My goal is to load related records from my database server into an Elasticsearch index. In my database (SQL Server) I have tables with one-to-many relationships. I was hoping to use Logstash to populate my Elasticsearch index and somehow maintain those relationships, and I was under the impression this can be done in a Logstash config file. Is this not the case, @magnusbaeck?
It's not clear that parent/child documents are the best approach (unlike relational databases, Elasticsearch is usually used to store denormalized documents), but you might be able to string something together. The elasticsearch output supports a parent option that lets you select the parent document of a new document being indexed, so it should be possible to first index all the parents and then do a second pass for the children. Just make sure the SQL query that extracts the child records selects the parent id column.
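For illustration, the second pass for the children might look roughly like the sketch below. This is only a sketch under assumptions, not a tested configuration: the driver path, connection string, credentials, table and column names (child_table, id, parent_id, name), the index name and the document type are all hypothetical placeholders. It also assumes the parents have already been indexed with their database id as the document id, and that the index mapping is set up for parent/child documents in whatever way your Elasticsearch version requires.

```
input {
  jdbc {
    # Hypothetical driver location and connection details; adjust to your environment.
    jdbc_driver_library => "/path/to/sqlserver-jdbc-driver.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=mydb"
    jdbc_user => "logstash"
    jdbc_password => "changeme"
    # The query must select the parent id column so the output can reference it.
    statement => "SELECT id, parent_id, name FROM child_table"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"              # hypothetical index name
    document_type => "child"        # hypothetical type; may not apply on newer versions
    document_id => "%{id}"
    # Point each child document at its parent via the selected column.
    parent => "%{parent_id}"
  }
}
```

Whether this lives in the same file as the parent pipeline or in a separate one is a matter of organization; what matters is that the parents are indexed before the children that refer to them.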
Thanks for the reply. I agree that storing denormalized documents seems preferable, but there is some content on the Elastic site about parent-child relationships, so I thought it would be a viable approach and wanted to leverage Logstash, since that's what I'm using to pull data in. I will look into using the "parent" option as you mentioned.