Ingestion pipeline

Hi,

I am looking at building an ingestion pipeline that uses Logstash to load data into Elasticsearch.

The source data comes from a SQL database (MySQL).

I have used the JDBC input plugin for Logstash, which is fantastic.

I have a question:
I want to schedule Logstash to index the data on a recurring basis (i.e. update the existing index). I know that the JDBC plugin already has this capability.

Does the scheduled run reindex into the existing index, or create a new index of the data?
My scheduler probably only has to run once a month.

I have used aliases before and found them useful for atomically switching from one index to the next. I understand Logstash does not support that, so I would have to use Curator.

So...

Do I simply go with the scheduler, or look at using an alias?

The scheduler will pull any new data it finds, based on sql_last_value.
You can, however, configure it to generate a custom document ID so that it updates the existing documents in ES.
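A minimal sketch of the incremental behaviour described above, assuming a hypothetical `items` table with an `updated_at` column used as the tracking column. sqlite3 stands in for MySQL and a plain dict stands in for the Elasticsearch index, so the block runs on its own. In Logstash the corresponding pieces would be the jdbc input's `statement` referencing `:sql_last_value` (with `use_column_value` and `tracking_column`) and the elasticsearch output's `document_id`; this is an illustration of the idea, not how the plugin is implemented.

```python
import sqlite3

# Hypothetical source table; sqlite3 stands in for MySQL so this runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")


def incremental_sync(conn, index, last_value):
    """Fetch rows changed since last_value and upsert them into index.

    index is a dict standing in for an Elasticsearch index, keyed by document
    ID. Using the primary key as the ID means an updated row overwrites its
    existing document instead of creating a duplicate. Returns the new
    tracking value (the role sql_last_value plays in Logstash).
    """
    rows = conn.execute(
        "SELECT id, name, updated_at FROM items WHERE updated_at > ? ORDER BY updated_at",
        (last_value,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        index[str(row_id)] = {"name": name, "updated_at": updated_at}
        last_value = updated_at
    return last_value


# First scheduled run: both rows are new.
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 11)])
index = {}
last = incremental_sync(conn, index, 0)

# Before the next run, row 1 is updated and row 3 is added.
conn.execute("UPDATE items SET name = 'a2', updated_at = 12 WHERE id = 1")
conn.execute("INSERT INTO items VALUES (3, 'c', 13)")
last = incremental_sync(conn, index, last)

print(sorted(index), index["1"]["name"], last)  # ['1', '2', '3'] a2 13
```

Because the document ID is the primary key, the second run overwrites document "1" rather than adding a duplicate. This is the trade-off that comes up later in the thread: supplying your own IDs makes updates possible, at some cost to bulk indexing speed.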

Thanks for the reply, Mark.

  1. If there are new records in the database, are they bulk indexed into the existing index?
  2. If existing records in the database are updated, are the corresponding documents updated in the existing index?
  3. Same question for removed records.

Thanks,
Shane.

I found some interesting topics by searching for sql_last_value.

The post below demonstrates using sql_last_value and how to actually update documents.
As you said, it might be best to use the MySQL primary key as the document ID to handle any updates.

The only issue I have with that is that I remember reading that bulk indexing is much faster if you let Elasticsearch generate the document ID. Is that right?

The only action left, then, is delete. If a record has been removed, will Logstash handle that too?

Regards,
Shane.

Yes, but it's a trade-off.

No, only updates.

So what do you do in this circumstance?

Out of interest, is this a feature users have requested?

Well, LS isn't a state machine, so it doesn't track the state it would need to detect and apply deletes.

You could just reindex everything into a new index every time, then delete the old one.
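A small sketch of why deletes slip through and why a full rebuild avoids the problem, again with sqlite3 and dicts standing in for MySQL and Elasticsearch and a hypothetical `items` table. A row removed at the source can never be returned by a query, so an incremental run has nothing to act on and the old document stays behind; a fresh index built from the current table simply doesn't contain it.

```python
import sqlite3


def fetch_all(conn):
    """Build a fresh 'index' (dict keyed by primary key) from the whole table."""
    return {str(r[0]): {"name": r[1]} for r in conn.execute("SELECT id, name FROM items")}


# Hypothetical source table with three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# The existing index, as built by earlier runs.
old_index = fetch_all(conn)

# Row 2 is deleted at the source. No SELECT will ever return it again,
# so upsert-only syncing leaves document "2" in old_index.
conn.execute("DELETE FROM items WHERE id = 2")
stale_ids = set(old_index) - {str(r[0]) for r in conn.execute("SELECT id FROM items")}

# Full rebuild: index everything into a new index, then drop the old one.
new_index = fetch_all(conn)
del old_index

print(stale_ids, sorted(new_index))  # {'2'} ['1', '3']
```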

No worries.

Which brings me back to my initial question...

Should I use the Logstash scheduler to create a new index every time, switch the alias across, and delete the old index?

Sounds like a good idea given your use case!
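On the Elasticsearch side, the index-per-run plus alias approach might look something like this. It is a sketch that only builds the request body for the `POST /_aliases` API; the alias `products` and the monthly index names are hypothetical. Because all actions in a single `_aliases` request are applied atomically, searches against the alias switch from the old index to the new one in one step.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build a POST /_aliases body that moves `alias` to `new_index`.

    On the very first run there is no previous index, so pass old_index=None
    and only the "add" action is emitted.
    """
    actions = [{"add": {"index": new_index, "alias": alias}}]
    if old_index is not None:
        # Remove and add in the same request so the switch is atomic.
        actions.insert(0, {"remove": {"index": old_index, "alias": alias}})
    return {"actions": actions}


# Hypothetical names: last month's index is replaced by this month's.
body = alias_swap_actions("products", "products-2024.05", "products-2024.06")
print(json.dumps(body, indent=2))
```

Once the swap succeeds, the old index can be dropped with `DELETE /products-2024.05`. The same request body could be sent with curl or a client library, or the swap could be handled by Curator's alias action if you go that route.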
