Indexing freezes at random points while indexing 5 crore (50 million) records

Sweta_Singh · March 14, 2019, 4:57am

Hi,
I am indexing 5 crore records from SQL Server into an Elasticsearch server running on a VMware VM, using Logstash. I am creating 9 indices, each with 1 primary shard and 1 replica shard, and running multiple pipelines defined in the pipelines.yml file. Each pipeline has 3 workers and all other settings are default.

My problem is that indexing freezes at arbitrary points, whether 1 lakh or 75 lakh records have been indexed. I have tried a lot; I even used a single pipeline for a single index, but the same issue still occurred.

I tried increasing the Logstash heap from 1 GB (default) to 4 GB, but the issue remained.

I also changed the Elasticsearch heap from 1 GB (default) to 6 GB, but I still get the same issue.

Please suggest the best resolution as soon as possible.

My pipeline is:

- pipeline.id: abc
  pipeline.workers: 3
  path.config: xyz

Christian_Dahlqvist · March 14, 2019, 6:51am

So you are looking to index 50,000,000 documents? What size are these documents? What does your pipeline look like? What is the specification of the hardware your Elasticsearch cluster is running on? Are there any issues reported in the Elasticsearch logs?

Sweta_Singh · March 14, 2019, 8:31am

Yes. Each document has a different number of fields (columns in SQL Server), from a minimum of 50 to a maximum of 570, and about 28 lakh records hold 1 GB of data.
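As a rough sanity check on those figures, the arithmetic can be sketched in Python. This assumes 1 lakh = 100,000, 1 crore = 10,000,000, 1 GB = 1024^3 bytes, and that the ratio of 28 lakh records per 1 GB holds across the whole data set. None of this is measured, and the function names are purely illustrative.

```python
# Back-of-the-envelope sizing from the numbers quoted in the thread.
LAKH = 100_000
CRORE = 10_000_000
GIB = 1024 ** 3  # assumption: "1 GB" is read as 1 GiB


def average_doc_bytes(records: int, total_bytes: int) -> float:
    """Average size of one record in bytes."""
    return total_bytes / records


def projected_total_bytes(total_records: int, avg_bytes: float) -> float:
    """Projected raw size if every record had the average size."""
    return total_records * avg_bytes


avg = average_doc_bytes(28 * LAKH, 1 * GIB)
total = projected_total_bytes(5 * CRORE, avg)

print(f"average record size: {avg:.0f} bytes")
print(f"projected source data: {total / GIB:.1f} GiB")
```

The result is only the raw source volume under those assumptions; the size on disk in Elasticsearch depends on mappings and replicas and is not estimated here.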

My pipelines are defined in the pipelines.yml file, shown below:

- pipeline.id: id1
  pipeline.workers: 3
  path.config: "MULTIPIP/ABC/MO/abc.config"

#- pipeline.id: id2
#  pipeline.workers: 3
#  path.config: "MULTIPIP/ABC/MO/abc1.config"
............and so on

The Elasticsearch server (running on Windows 10) has a single node.
Hardware configuration:
RAM: 64 GB (dynamic)
HDD: 200 GB (Elasticsearch and Logstash run on the same system and drive)
Processor: Intel(R) Xeon(R) E5-4610 v2 @ 2.30 GHz
System type: 64-bit operating system, x64-based processor
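For reference, the shard layout described in this thread (9 indices, 1 primary and 1 replica each, one node) can be tallied with a small sketch. It relies on the fact that Elasticsearch does not place a replica on the same node as its primary; the function and its parameters are hypothetical and only illustrate the counting.

```python
def shard_summary(indices: int, primaries: int, replicas: int, data_nodes: int) -> dict:
    """Count total, assignable, and unassigned shard copies."""
    total = indices * primaries * (1 + replicas)
    # Two copies of the same shard never share a node, so each shard
    # can have at most one assigned copy per data node.
    copies_per_shard = min(1 + replicas, data_nodes)
    assigned = indices * primaries * copies_per_shard
    return {"total": total, "assigned": assigned, "unassigned": total - assigned}


summary = shard_summary(indices=9, primaries=1, replicas=1, data_nodes=1)
print(summary)
```

On a single node the replicas stay unassigned, so they add no redundancy in this setup.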

The Elasticsearch logs show:
GC (Allocation Failure) 2019-03-08T17:26:58.442+0530: 243.065: [ParNew Desired survivor size 17891328 bytes, new threshold 4 (max 6) - age 1: 12015200
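The line above is a young-generation (ParNew) collection record; "Allocation Failure" is the normal trigger for such a collection rather than an error in itself. A minimal sketch for pulling the numbers out of a line like this follows. The regular expression is written against this one sample only and is an assumption, not a general GC log parser.

```python
import re

# The sample line quoted in the post, copied verbatim.
GC_LINE = (
    "GC (Allocation Failure) 2019-03-08T17:26:58.442+0530: 243.065: "
    "[ParNew Desired survivor size 17891328 bytes, new threshold 4 (max 6) "
    "- age 1: 12015200"
)

# Hypothetical pattern fitted to the sample above.
PATTERN = re.compile(
    r"Desired survivor size (?P<survivor>\d+) bytes, "
    r"new threshold (?P<threshold>\d+) \(max (?P<max_threshold>\d+)\)"
    r".*?age\s+1:\s+(?P<age1>\d+)"
)


def parse_gc_line(line: str):
    """Return the numeric fields as ints, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


stats = parse_gc_line(GC_LINE)
ratio = stats["age1"] / stats["survivor"]
print(stats)
print(f"age-1 objects relative to desired survivor size: {ratio:.0%}")
```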

Kindly also explain in detail how heap memory is used internally by both Elasticsearch and Logstash.

Christian_Dahlqvist · March 14, 2019, 8:39am

Try to identify what is limiting performance. Is CPU maxed out or very high during indexing? Do you have slow storage resulting in low I/O throughput and high iowait?

Sweta_Singh · March 14, 2019, 8:56am

No, nothing like that.
However, Analyze Wait Chain shows that one or more threads of java.exe are waiting to finish network I/O.

Christian_Dahlqvist · March 14, 2019, 8:58am

Have you looked at disk performance? As you have a HDD that could very well be the limiting factor as indexing is I/O intensive.

Sweta_Singh · March 14, 2019, 9:38am

Disk performance shows a disk transfer rate of 100 to 450 KB/s and an active time of 100%.
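To put that transfer rate in perspective, here is a rough estimate of how long writing 1 GB would take at the two reported rates. It assumes KB = 1024 bytes, GB = 1024^3 bytes, and a sustained rate; it also ignores that Elasticsearch writes more than the raw data (translog, segment merges), so it is only a lower bound under those assumptions.

```python
KIB = 1024
GIB = 1024 ** 3


def seconds_to_write(total_bytes: int, rate_bytes_per_s: float) -> float:
    """Time needed to write total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s


# The two ends of the transfer-rate range reported above.
for rate_kib in (100, 450):
    secs = seconds_to_write(GIB, rate_kib * KIB)
    print(f"{rate_kib} KB/s: {secs:.0f} s ({secs / 3600:.1f} h) per GB")
```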

Is there anything else I should check?

Please give me a resolution as soon as possible.

Christian_Dahlqvist · March 14, 2019, 9:40am

If you are using Linux, what does iostat -x show?

You will also need to be patient. This forum is manned by volunteers, so there is no SLA or even a guarantee of getting a solution. If you need answers within an SLA, Elastic offers commercial subscriptions that provide this.

Sweta_Singh · March 15, 2019, 11:58am

Hi,
Can anyone help me? The Logstash pipeline terminated without completing indexing, and I am now getting this error:
[2019-03-15T17:11:45,369][WARN ][logstash.inputs.jdbc ] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::ComMicrosoftSqlserverJdbc::SQLServerException: Connection reset>}
[2019-03-15T17:12:04,967][INFO ][logstash.pipeline ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x2cd5473d@A:/fresh_L/log-6.4.0/logstash-core/lib/logstash/pipeline_action/create.rb:46 run>"}

Why is this happening? Please suggest something.

system · April 12, 2019, 11:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
