How logstash uploads SQL data

Lian_Jiang · June 15, 2017, 6:40pm

I am trying to understand how Logstash is designed to ingest SQL data.

For example, Logstash can compare the key of a record to avoid ingesting duplicate records, and it can use a version to update an existing record.

Does this mean Logstash does the following?

step 1. pull all records from a table.
step 2. for each record, query the sink to see whether a record with the same key already exists and, if so, get the version of that record.
step 3. based on the sink query result, decide whether and how to ingest this record.

Or instead of querying existing records from the sink, does logstash maintain a local database and get the existing records info from the local database?

I have been looking into the Logstash source code but have not found the answer.

Thanks for any help.

magnusbaeck · June 15, 2017, 7:50pm

No, that's not how it works. Logstash never checks for document existence in the output phase. What you typically do is set the document id to a suitable fixed id (e.g. the primary key(s) from the database). That way each update to an existing row in the source database will be directed to the same document in ES. If the document Logstash is trying to index is identical to what's already there, I can only assume that ES does nothing and returns immediately.
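As a rough illustration of why a fixed document id avoids duplicates, here is a minimal Python sketch that models the index as an in-memory dict keyed by document id. The `FakeIndex` class, its `index` method, and the `ingest` helper are hypothetical stand-ins, not Elasticsearch or Logstash APIs. The only point is that the sender never queries the sink first, and writing the same id twice overwrites rather than appends.

```python
class FakeIndex:
    """Hypothetical in-memory stand-in for an index keyed by document id."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        # Writing to an id that already exists replaces the stored document,
        # so re-sending a row never creates a second copy.
        created = doc_id not in self.docs
        self.docs[doc_id] = dict(source)
        return "created" if created else "updated"


def ingest(index, rows, key="id"):
    # Assumption: each row is a dict and `key` names its primary key column.
    # No existence check is made before writing.
    return [index.index(str(row[key]), row) for row in rows]


index = FakeIndex()
first_run = ingest(index, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
second_run = ingest(index, [{"id": 1, "name": "a2"}, {"id": 2, "name": "b"}])
```

After both runs the sketch still holds two documents, and the document for id 1 carries the updated name.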

Lian_Jiang · June 15, 2017, 8:43pm

Thanks, Magnus.

This behavior assumes the output (e.g. ES) complies with some contract. For example, Logstash assumes the output checks whether a record exists by document id, or compares the new record with the existing record that has the same document id and decides whether to do nothing or to create a new record with version+1.

Since Logstash has many output plugins, is there a document describing the contract these output plugins need to comply with, so that a source that works with one output does not fail with another?

If I need to implement an output plugin, is there any documentation for the development process? Thanks.

magnusbaeck · June 15, 2017, 8:48pm

Since Logstash has many output plugins, is there a document describing the contract these output plugins need to comply with, so that a source that works with one output does not fail with another?

Logstash has no idea whether a particular output supports duplication avoidance. That's entirely up to the plugin. Therefore there is no contract and no documentation.
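To make that concrete, here is a small Python sketch with two hypothetical sinks fed the same events. `AppendOnlySink`, `KeyedSink`, and `run_pipeline` are invented for illustration and do not correspond to any real Logstash plugin or API. The pipeline function treats both sinks identically; whether duplicates collapse depends only on how each sink stores what it receives.

```python
class AppendOnlySink:
    """Hypothetical sink that just appends, like writing lines to a file."""

    def __init__(self):
        self.events = []

    def receive(self, event):
        self.events.append(event)


class KeyedSink:
    """Hypothetical sink that stores each event under a chosen key field."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.events = {}

    def receive(self, event):
        # A later event with the same key replaces the earlier one.
        self.events[event[self.key_field]] = event


def run_pipeline(events, sinks):
    # The pipeline hands every event to every sink and has no notion
    # of deduplication itself.
    for event in events:
        for sink in sinks:
            sink.receive(event)


events = [{"id": 1, "v": "old"}, {"id": 1, "v": "new"}]
append_sink, keyed_sink = AppendOnlySink(), KeyedSink("id")
run_pipeline(events, [append_sink, keyed_sink])
```

Here the append-only sink ends up with both events, while the keyed sink keeps only the latest one for id 1.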

If I need to implement an output plugin, is there any documentation for the development process? Thanks.

Yes, the Logstash documentation contains a section or two on plugin development.

