Which way to collect data into Elasticsearch is better?

kuang1987 · December 3, 2015, 2:34pm

Hi Elasticsearch gurus:

As a newcomer to Elasticsearch, I'm trying to architect a data collection/analysis/visualization center for our e-commerce company.

I've done some research and want to seek some advice here.

I tried three ways of importing test data (10K documents):

1. use Logstash with redis input and elasticsearch output
2. use the Python API -- elasticsearch.index()
3. use the Python helpers -- helpers.bulk()

The results:
Ways 1 and 3 -- each takes a little under 8s. I guess they work the same way underneath.
Way 2 -- takes about 80s.
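
For reference, the gap between ways 2 and 3 is probably mostly round trips: way 2 sends one HTTP request per document, while helpers.bulk() batches many documents into each request. Below is a minimal sketch of way 3, not the exact script I ran. The index name "events" and the chunk size are assumptions, and older clusters may also need a "_type" key in each action.

```python
from typing import Any, Dict, Iterable, Iterator


def build_actions(docs: Iterable[Dict[str, Any]], index_name: str) -> Iterator[Dict[str, Any]]:
    """Wrap each document in the action format that helpers.bulk() accepts."""
    for doc in docs:
        # "_index" names the target index; "_source" carries the document body.
        yield {"_index": index_name, "_source": doc}


def bulk_load(es, docs, index_name="events", chunk_size=500):
    """Send docs in batches of chunk_size instead of one request per document.

    `es` is an already-constructed elasticsearch.Elasticsearch client.
    Returns the (success_count, errors) tuple from helpers.bulk().
    """
    # Imported lazily so build_actions() can be used without the client installed.
    from elasticsearch import helpers

    return helpers.bulk(es, build_actions(docs, index_name), chunk_size=chunk_size)
```

With a client at hand, something like bulk_load(es, my_docs) would replace the loop of es.index() calls used in way 2.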

Then I sped up way 2 with Python threading:
with 100 threads -- about 30s
with 200 threads -- an 'elasticsearch rejection' exception occurs in some threads.
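
My guess is that the rejections mean the cluster's indexing queue was full, so the sender should slow down and retry rather than give up. Below is a minimal sketch of retrying with exponential backoff. The Rejected exception and the send callable are hypothetical stand-ins, not names from the elasticsearch client; in real code the except clause would catch whatever error the client raises for a rejected request.

```python
import time


class Rejected(Exception):
    """Hypothetical stand-in for the client's 'rejected' error."""


def send_with_backoff(send, batch, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call send(batch), retrying with exponentially growing pauses on rejection.

    The pause doubles on each retry: base_delay, 2*base_delay, 4*base_delay, ...
    After max_retries failed retries the last Rejected error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except Rejected:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

Each worker thread would wrap its indexing call in send_with_backoff, so a busy cluster slows the senders down rather than failing them.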

In my scenario, I need to put some anchor code (hooks) into existing systems to send events/data to Elasticsearch. Based on my research, the bulk approach is better than the single-document API (actually, I guess all Elasticsearch client APIs are built on the HTTP REST interface), since the former has a higher TPS than the latter.

So I think the better solution is for the anchor code to push data into Redis or another supported message queue, have Logstash consume it, and finally insert it into Elasticsearch.
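
To make the idea concrete, here is a rough sketch of what the anchor code could look like on the producer side. The key name "events" and the event fields (type, timestamp, data) are my own assumptions, and redis_client is any object with an rpush(key, value) method, such as a redis-py client.

```python
import json
import time


def encode_event(event_type, payload, now=time.time):
    """Serialize one event as a JSON string (field names are illustrative)."""
    return json.dumps(
        {"type": event_type, "timestamp": now(), "data": payload},
        sort_keys=True,
    )


def publish_event(redis_client, event_type, payload, key="events"):
    """Append the encoded event to the tail of a Redis list.

    The existing system only pays for one cheap Redis write; the consumer
    reads from the head of the list and does the indexing later.
    """
    return redis_client.rpush(key, encode_event(event_type, payload))
```

On the consumer side, a Logstash redis input configured with data_type => "list" and the same key would read these events and hand them to the elasticsearch output.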

Can anyone advise whether this is the right direction? Or is there a better solution?

Cylindric · December 3, 2015, 3:31pm

Can you edit your post so it's not in a blockquote, so I don't have to scroll to read all the lines? Take the spaces out from in front of all the lines...

kuang1987 · December 4, 2015, 1:59am

Thanks for your kind reminder!
