How can I increase Elasticsearch's indexing speed? Bulk can't achieve it!


(LDA) #1

I use bulkIndex or BulkProcessor to index data, with 5 threads. My data set is one million items, about 700 MB, and indexing takes two hours.
I think that's too long. How can I improve it? How can I index faster?


(Christian Dahlqvist) #2

In order for anyone to be able to help you, you will need to provide more details. How are you performing your bulk indexing? What is the size of your bulk requests? What is the size and structure of your documents? What does your cluster look like? What version of Elasticsearch are you using? How many indices and shards are you indexing into?


(LDA) #3

Sorry!
My Elasticsearch version is 1.7.
My cluster has two nodes, and I query the data from Oracle and iterate over it.
This is my code:
public static void proceIndex(Client client, List<Perio> plist) { // Perio: the document POJO
    BulkProcessor bulkProcessor = BulkProcessor.builder(client,
            new BulkProcessor.Listener() {
                long time1 = System.currentTimeMillis();
                long time2 = 0;

                @Override
                public void beforeBulk(long executionId, BulkRequest request) {
                    System.out.println("before bulk-----");
                    time1 = System.currentTimeMillis();
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request,
                        BulkResponse response) {
                    System.out.println("afterBulk bulk-----");
                    time2 = System.currentTimeMillis();
                    System.out.println("Indexed " + request.numberOfActions()
                            + " documents in " + (time2 - time1) / 1000 + " s----");
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request,
                        Throwable failure) {
                    System.out.println("afterBulk failed: " + failure);
                }
            })
            .setBulkActions(10000)
            .setBulkSize(new ByteSizeValue(50, ByteSizeUnit.MB))
            .setFlushInterval(TimeValue.timeValueSeconds(5))
            .setConcurrentRequests(10)
            .build();

    for (int i = 0; i < plist.size(); i++) {
        IndexRequest index = new IndexRequest("dfinder_perio", "perio");
        bulkProcessor.add(index.source(JsonUtil.toJson(plist.get(i)))
                .id(plist.get(i).getArticleId()));
    }

    // Flush buffered requests before returning, otherwise the last batch may never be sent.
    bulkProcessor.flush();
}

(Christian Dahlqvist) #4

I would recommend lowering the size of the bulk requests. Large bulk requests do not necessarily result in improved performance. Try setting it to a few thousand documents and a maximum size of around 5MB.

When you are indexing, what does the cluster look like? Are you saturating disk IO or possibly CPU? Do you see a lot of garbage collection occurring in the Elasticsearch logs? One trick to improve indexing performance for a temporary bulk load is to set the number of replicas to 0 during indexing and then increase it again once indexing has completed. This results in reduced load on the cluster during indexing at the expense of durability, but this can be a good tradeoff.
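A tuned builder along these lines would look as follows. This is only a sketch: the values are illustrative starting points, not prescriptions, and `client` and `listener` are assumed to be set up as in the code posted above.

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public class TunedBulk {
    // Sketch: a BulkProcessor tuned per the advice above (smaller, fewer requests).
    public static BulkProcessor build(Client client, BulkProcessor.Listener listener) {
        return BulkProcessor.builder(client, listener)
                .setBulkActions(2000)                               // a few thousand docs per request
                .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ~5 MB per request
                .setConcurrentRequests(2)                           // fewer requests in flight at once
                .build();
    }
}
```

With 10 concurrent requests of 50 MB each, up to 500 MB can be in flight at once, which can pressure the heap on a small cluster; smaller batches keep that bounded.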


(LDA) #5

Thanks!
I tried to set replicas to 0, but it does not really seem to work.
My yml config looks like this:
#index.analysis.analyzer.default.type : "ik"
cluster.name: elasticsearch
node.name: "node2"
transport.tcp.port: 9302
"number_of_replicas": "0"
"index.refresh_interval": "-1"


(Christian Dahlqvist) #6

Do not change the default values in the Elasticsearch config file to achieve this. Instead update the index settings through the API. Change replicas to 0 for the index prior to starting the bulk job and then set it back to the default value once the job has finished.
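With the 1.x Java client that could look like the sketch below, assuming the index name `dfinder_perio` from the code above and a default of 1 replica (adjust to your actual default):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

// Before the bulk job: drop replicas to 0 for the target index.
client.admin().indices().prepareUpdateSettings("dfinder_perio")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.number_of_replicas", 0))
        .get();

// ... run the bulk indexing job ...

// After the bulk job: restore replicas (1 assumed as the default here).
client.admin().indices().prepareUpdateSettings("dfinder_perio")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.number_of_replicas", 1))
        .get();
```

This changes the live index settings only; the values in `elasticsearch.yml` stay at their defaults for any new indices.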


(LDA) #7

Thanks for your advice. I just updated the bulkActions, bulkSize, and number_of_replicas settings. Now while indexing, the size shown in the head page changes when I refresh the page, but the doc count stays at 0.
Is there a reason for this? Is it only shown once processing ends?

thanks!


(LDA) #8

And the logs in the console show that the job is indexing:
before bulk-----
current page: 163
afterBulk bulk-----
Total indexed: , elapsed: 0 s----
before bulk-----
afterBulk bulk-----
Total indexed: , elapsed: 0 s----


(LDA) #9

Thanks! I updated this, but it does not seem any better.


(Christian Dahlqvist) #10

How is the cluster looking during indexing? Is there anything in the logs? How does CPU and disk IO look?

If performance does not increase when setting replicas to 0 and there is no identifiable factor limiting performance, I would recommend performing a separate indexing benchmark with similar documents, e.g. using Logstash with a file input, to see what the limit of the cluster is and make sure it is not the source system that is limiting throughput.
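A benchmark config for that era of Logstash (1.5/2.x, matching Elasticsearch 1.7) might look like the sketch below. The file path, index name, and host are placeholders; it assumes one JSON document per line in the input file.

```
input {
  file {
    path => "/tmp/perio-sample.json"   # placeholder: a file of sample documents, one JSON object per line
    start_position => "beginning"
  }
}
filter {
  json { source => "message" }         # parse each line into fields
}
output {
  elasticsearch {
    host  => "localhost"               # placeholder: one of the cluster nodes
    index => "dfinder_perio_benchmark" # placeholder: a throwaway benchmark index
  }
}
```

If this pipeline indexes much faster than the Java job, the bottleneck is on the Oracle/extraction side rather than in the cluster.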


(LDA) #11

Maybe it is the for loop: it queries the data from the Oracle DB. I select it page by page with a page size of 5000 items, and my total data is
26,000,000 items. Selecting one page takes 100 seconds.
Do you have any ideas for dealing with this?


(Christian Dahlqvist) #12

If Oracle is the bottleneck I am afraid I will not be able to help.


(system) #13