I am looking at an ingestion pipeline using Logstash into Elasticsearch.
The source data comes from a SQL database (MySQL).
I have used the JDBC input plugin for Logstash, which is fantastic.
I have a question:
I want to schedule Logstash to index the data on a recurring basis (i.e. update the existing index). I know the JDBC plugin already has this capability.
Is this a reindex, or a new index of the data?
My scheduler probably only has to run once a month.
I have used aliases before and found them useful for atomically switching from one index to the next. I understand Logstash does not support that; I would have to use Curator.
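For context, the atomic switch I mean is the _aliases API, something like this (the index and alias names here are just placeholders):

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "articles_v1", "alias": "articles" } },
    { "add":    { "index": "articles_v2", "alias": "articles" } }
  ]
}
```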
So...
Do I simply go with the scheduler, or look at using an alias?
The scheduler will pull any new data it finds, based on sql_last_value.
You can configure things to generate a custom document ID, though, so that changed rows update the existing documents in ES instead of creating duplicates.
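Here's a minimal sketch of what that pipeline can look like. The connection details, table, and column names are placeholders; it assumes the table has an auto-incrementing `id` primary key and an `updated_at` timestamp:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    # Cron syntax: run at midnight on the 1st of every month
    schedule => "0 0 1 * *"
    # Only pull rows that changed since the last run
    statement => "SELECT * FROM articles WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "articles"
    # Reuse the MySQL primary key so a re-ingested row overwrites
    # the existing document instead of creating a duplicate
    document_id => "%{id}"
  }
}
```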
I found some interesting topics by searching for sql_last_value.
The post below demonstrates using sql_last_value and how to actually update documents.
As you said, it might be best to use the MySQL primary key as the document ID to handle any updates.
The only issue I have with that is that I remember reading that if you let Elasticsearch auto-generate document IDs, bulk indexing is much faster. Is that right?
The only action left then is delete. If a record has been removed from MySQL, will Logstash handle that as well?
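(From what I've read, the JDBC input can't see hard deletes, since a deleted row simply stops appearing in the query results. The usual workaround seems to be a soft-delete flag on the table and routing the event to a delete action, roughly like the sketch below, assuming a tinyint `deleted` column:)

```
filter {
  # Route soft-deleted rows to a delete action, everything else to index
  if [deleted] == 1 {
    mutate { add_field => { "[@metadata][action]" => "delete" } }
  } else {
    mutate { add_field => { "[@metadata][action]" => "index" } }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "articles"
    document_id => "%{id}"
    # The action can be set per-event via sprintf
    action => "%{[@metadata][action]}"
  }
}
```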