Index needs to be re-indexed while data is being sent from StreamSets. How can I automate this?

Hi,

I'm using ES 6.4 with StreamSets. StreamSets exports data from the DB and loads it into ES on an hourly basis. My index in ES is set up with the nori tokenizer, which cannot be configured in StreamSets, so reindexing in ES seems to be mandatory (I guess...). Is there any way to automate this? The data is updated hourly, but right now I cannot get the updates reflected in the ES index.

Welcome!

I'm not sure I understood the question or the problem.

If you want to search for documents in Elasticsearch, they need to be indexed, whatever the source or tool you are using.

Could you clarify your question?

Hi,
Sorry for my insufficient explanation. Yes, the documents are indexed when they are sent from StreamSets, but the resulting settings and mappings are not sufficient for searching in ES because the documents are in Korean, so I would like to change the settings and mappings in ES afterwards using reindex.
https://streamsets.com/documentation/datacollector/3.7.1/help/datacollector/UserGuide/Destinations/Elasticsearch.html#task_uns_gtv_4r

So yes. Define the mapping then reindex the data.
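
For illustration, a minimal sketch of those two steps. It assumes the analysis-nori plugin is installed and uses hypothetical names throughout: a source index `index1` written by StreamSets, a destination index `index2`, an analyzer called `korean`, and a text field called `content`. It only builds the JSON request bodies; sending them with `PUT /index2` and `POST /_reindex` is left to whatever HTTP client you use.

```python
import json


def korean_index_body(mapping_type="_doc", field="content"):
    """Settings and mappings for the destination index.

    The mapping type and field name are placeholders; replace them with
    the ones your documents actually use.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # "korean" is a hypothetical analyzer name
                    "korean": {"type": "custom", "tokenizer": "nori_tokenizer"}
                }
            }
        },
        # In 6.x, mappings are keyed by a mapping type name
        "mappings": {
            mapping_type: {
                "properties": {field: {"type": "text", "analyzer": "korean"}}
            }
        },
    }


def reindex_body(source, dest):
    """Body for POST /_reindex: copy every document from source to dest."""
    return {"source": {"index": source}, "dest": {"index": dest}}


if __name__ == "__main__":
    print("PUT /index2")
    print(json.dumps(korean_index_body(), indent=2))
    print("POST /_reindex")
    print(json.dumps(reindex_body("index1", "index2"), indent=2))
```

The destination index has to exist with the Korean analysis settings before the reindex runs; otherwise it would be created with default mappings.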

Oh, and no way to automate this process? :fearful:

What do you mean by this?
What do you want to automate?

  1. StreamSets sends data to an index in ES (let me call it "index1")
  2. "index1" is reindexed to "index2"
  3. StreamSets sends new data to "index1" every hour
  4. Question: how can the new data in "index1" be reflected in "index2" automatically?

Why do you need to reindex? Can't you create an index template that ensures you have the correct mapping from the start?
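
A rough sketch of what such a template could look like on 6.x, again only building the request body. The template name `korean_docs`, the pattern `index1*`, the analyzer name `korean`, the mapping type `_doc`, and the field `content` are all assumptions for illustration; the mapping type in particular would need to match whatever type the StreamSets destination writes with.

```python
import json


def korean_template_body(pattern="index1*", mapping_type="_doc", field="content"):
    """Body for PUT /_template/<name> (6.x template format).

    Any index whose name matches `pattern` picks up these settings and
    mappings at the moment it is created.
    """
    return {
        "index_patterns": [pattern],
        "settings": {
            "analysis": {
                "analyzer": {
                    # "korean" is a hypothetical analyzer name
                    "korean": {"type": "custom", "tokenizer": "nori_tokenizer"}
                }
            }
        },
        "mappings": {
            mapping_type: {
                "properties": {field: {"type": "text", "analyzer": "korean"}}
            }
        },
    }


if __name__ == "__main__":
    # "korean_docs" is a hypothetical template name
    print("PUT /_template/korean_docs")
    print(json.dumps(korean_template_body(), indent=2))
```

Templates are applied only when an index is created, so an existing "index1" would have to be deleted and recreated (or a new index name used) before the mapping takes effect. After that, the hourly writes from StreamSets would be analyzed with nori directly and no separate reindex step should be needed.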
