Indexing freezes at random while indexing 5 crore (50 million) records

(Sweta Singh) #1

I am indexing 5 crore (50 million) records from SQL Server into an Elasticsearch server running on a VMware VM, using Logstash. I am creating 9 indices, each with 1 primary shard and 1 replica shard, and running multiple pipelines defined in the pipelines.yml file. Each pipeline has 3 workers and all other settings are default.

My problem is that indexing freezes at random, whether the data size is 1 lakh (100,000) or 75 lakh (7.5 million) records. I have tried a lot; I even used a single pipeline for a single index, but the same issue still occurred.

I tried increasing the Logstash heap from 1 GB (default) to 4 GB, but the issue remains.

I also changed the Elasticsearch heap from 1 GB (default) to 6 GB, but I am still getting the same issue.

Please suggest the optimal resolution as soon as possible.

My pipeline is:

- pipeline.id: abc
  pipeline.workers: 3
  path.config: xyz

(Christian Dahlqvist) #2

So you are looking to index 50,000,000 documents? What size are these documents? What does your pipeline look like? What is the specification of the hardware your Elasticsearch cluster is running on? Are there any issues reported in the Elasticsearch logs?

(Sweta Singh) #3

Yes. Each document has a different number of fields (columns from SQL Server), from a minimum of 50 to a maximum of 570, and 28 lakh (2.8 million) records hold 1 GB of data.
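
For a rough sense of scale, the figures above can be turned into an average record size and a total source volume. This is only a back-of-the-envelope sketch: it assumes 1 GB means 10^9 bytes and that the ratio of 28 lakh records per 1 GB holds across all 5 crore records. The variable names are illustrative, and the size on disk in Elasticsearch will differ from the source size.

```python
# Back-of-the-envelope sizing from the figures quoted in this thread.
# Assumptions (not from the thread): 1 GB = 10**9 bytes, and the ratio
# "28 lakh records per 1 GB" holds for the whole data set.

LAKH = 100_000
CRORE = 10_000_000

total_records = 5 * CRORE      # 50,000,000 records to index
records_per_gb = 28 * LAKH     # 2,800,000 records hold about 1 GB
bytes_per_gb = 10**9

avg_record_bytes = bytes_per_gb / records_per_gb
total_source_gb = total_records / records_per_gb

print(f"average record size: {avg_record_bytes:.0f} bytes")
print(f"total source volume: {total_source_gb:.1f} GB")
```

Under these assumptions the source data comes to roughly 18 GB in total.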

My pipelines are defined in the pipelines.yml file as shown below:

- pipeline.id: id1
  pipeline.workers: 3
  path.config: "MULTIPIP/ABC/MO/abc.config"
- pipeline.id: id2
  #pipeline.workers: 3
............and so on

The Elasticsearch server (running on Windows 10) is a single node.
Hardware configuration:
RAM: 64 GB (dynamic)
HDD: 200 GB (Elasticsearch and Logstash are running on the same system and drive)
Processor: Intel(R) Xeon(R) E5-4610 v2 @ 2.30 GHz
System type: 64-bit operating system, x64-based processor

The Elasticsearch logs show:
GC (Allocation Failure) 2019-03-08T17:26:58.442+0530: 243.065: [ParNew Desired survivor size 17891328 bytes, new threshold 4 (max 6) - age 1: 12015200

Kindly also explain in detail how heap memory is used internally by both Elasticsearch and Logstash.

(Christian Dahlqvist) #4

Try to identify what is limiting performance. Is CPU maxed out or very high during indexing? Do you have slow storage resulting in low I/O throughput and high iowait?

(Sweta Singh) #5

No, not like that.
But when checking Analyze Wait Chain, it shows that one or more threads of java.exe are waiting to finish network I/O.

(Christian Dahlqvist) #6

Have you looked at disk performance? As you have a HDD that could very well be the limiting factor as indexing is I/O intensive.

(Sweta Singh) #7

Disk performance shows a disk transfer rate of 100 to 450 KB/s and an active time of 100%.
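
To put that transfer rate in perspective, here is a rough estimate of how long it would take just to write the source volume at 100 to 450 KB/s. It is a sketch under stated assumptions: the volume is derived from the earlier figure of 28 lakh records per 1 GB, 1 KB is taken as 1000 bytes, and the helper name `hours_to_write` is hypothetical. Indexing typically writes more than the raw source bytes (translog, segment merges), so treat the result as an optimistic estimate.

```python
# Rough time-to-write estimate at the observed disk transfer rate.
# Assumptions (not measured): 1 KB = 1000 bytes, 1 GB = 10**9 bytes, and
# the source volume follows from "28 lakh records per 1 GB" for 5 crore records.

def hours_to_write(total_bytes: float, rate_kb_per_s: float) -> float:
    """Hours needed to write total_bytes at a constant rate given in KB/s."""
    seconds = total_bytes / (rate_kb_per_s * 1000)
    return seconds / 3600

total_records = 50_000_000     # 5 crore
records_per_gb = 2_800_000     # 28 lakh records hold about 1 GB
total_bytes = total_records / records_per_gb * 10**9

for rate in (100, 450):        # KB/s, the range reported above
    print(f"{rate} KB/s -> {hours_to_write(total_bytes, rate):.1f} hours")
```

At these rates the write alone would take on the order of 11 to 50 hours, which fits the suggestion above that the disk could be the limiting factor.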

Is there anything else I should check?

Please give me the resolution as soon as possible.

(Christian Dahlqvist) #8

If you are using Linux, what does iostat -x show?

You will also need to be patient. This forum is manned by volunteers, so there is no SLA or even a guarantee of getting a solution. If you need answers within an SLA, Elastic offers commercial subscriptions that provide this.

(Sweta Singh) #9

Can anyone help me? My Logstash pipeline has terminated without completing indexing, and now I am getting this error:
[2019-03-15T17:11:45,369][WARN ][logstash.inputs.jdbc ] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::ComMicrosoftSqlserverJdbc::SQLServerException: Connection reset>}
[2019-03-15T17:12:04,967][INFO ][logstash.pipeline ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x2cd5473d@A:/fresh_L/log-6.4.0/logstash-core/lib/logstash/pipeline_action/create.rb:46 run>"}

Why is this happening? Please suggest something.