Slower indexing

Beuhlet_Reseau · August 16, 2017, 2:05pm

Hello,

With version 5.5 of the stack, I find indexing very slow.

2 600 000 messages / 15 minutes.

The stack cannot keep up.

Stack: 3-node Elasticsearch cluster (12 CPUs, 16 GB RAM each), 1 Logstash (24 CPUs, 24 GB RAM), 1 Kibana

Christian_Dahlqvist · August 16, 2017, 2:12pm

What type of data are you indexing? How large are the documents? How many shards are you actively indexing into? What is the bulk size you are using? How many concurrent indexing threads do you have? Do you have X-Pack monitoring installed? What type of storage do you have?

Beuhlet_Reseau · August 17, 2017, 8:32am

Thank you for your time @Christian_Dahlqvist

As input I currently have 2 types of flat documents.

First type: every 15 minutes I have 8 files (12 MB each) with 65 000 lines each
=> Total every 15 minutes: 520 000 lines, 96 MB

Second type: every 5 minutes I have 24 files (7 MB each) with 40 000 lines each
=> Total every 5 minutes: 960 000 lines, 128 MB

Total every 5 minutes: 1 113 300 lines and 160 MB

My index settings:

    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "refresh_interval": "240s"
    }

System configuration of Elasticsearch and Logstash:

Elasticsearch:
3-server cluster with 12 CPUs and 16 GB RAM each, with the JVM heap set to -Xms8g/-Xmx8g

Logstash:
1 server with 24 CPUs and 24 GB RAM, with -Xms12g/-Xmx12g and this .yml config:

pipeline.workers: 24
pipeline.output.workers: 24
pipeline.batch.size: 250
pipeline.batch.delay: 5

X-Pack is not installed and the data is on local storage.
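
As a rough check on the load described above, the per-file figures can be turned into a sustained event rate. This is only a sketch in Python: it takes the file counts, line counts and intervals at face value, and the names `FEEDS` and `required_events_per_second` are made up for illustration.

```python
# Figures copied from the post above; the names are illustrative only.
FEEDS = [
    # (files per interval, lines per file, interval in seconds)
    (8, 65_000, 15 * 60),   # first type: 8 files every 15 minutes
    (24, 40_000, 5 * 60),   # second type: 24 files every 5 minutes
]


def required_events_per_second(feeds):
    """Sum the sustained line rate of every feed."""
    return sum(files * lines / seconds for files, lines, seconds in feeds)


print(f"sustained rate needed: {required_events_per_second(FEEDS):.0f} events/s")
```

With these inputs the result is a little under 3 800 events per second, which is the rate the pipeline would have to sustain just to keep up.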

Christian_Dahlqvist · August 17, 2017, 9:09am

Given the small size of the events that seems like very low throughput. Are you indexing immutable documents or performing updates? What does your Logstash pipeline look like?

Beuhlet_Reseau · August 17, 2017, 1:53pm

I don't understand your question about immutable documents, @Christian_Dahlqvist.

An example log line looks like this:

1502977743,ASK_1,REQUEST,7353033,7353033,7353033,2001,FILTOTAL,,0,FILTER,,,CODE5,,,0

In Logstash, I take the first field as the timestamp, so that events are indexed at the right time. I only convert numbers such as "7353033" to numeric fields. Everything else is treated as a keyword.
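
The parsing described here could look roughly like the following Python sketch (the actual Logstash filter configuration is not shown in the thread). The column names `col2`, `col3`, ... are hypothetical, and converting every digit-only column to an integer is an assumption; the post only says that numbers such as 7353033 become numeric.

```python
import csv
from datetime import datetime, timezone

# The example line quoted in the post.
SAMPLE = ("1502977743,ASK_1,REQUEST,7353033,7353033,7353033,2001,"
          "FILTOTAL,,0,FILTER,,,CODE5,,,0")


def parse_line(line):
    """Split one CSV log line into a timestamp plus typed columns."""
    fields = next(csv.reader([line]))
    # First column: epoch seconds, used as the event time.
    event = {"@timestamp": datetime.fromtimestamp(int(fields[0]), tz=timezone.utc)}
    for position, value in enumerate(fields[1:], start=2):
        # Hypothetical column names; digit-only values become integers
        # (an assumption), everything else stays a string (keyword).
        event[f"col{position}"] = int(value) if value.isdigit() else value
    return event


event = parse_line(SAMPLE)
print(event["@timestamp"].isoformat(), event["col2"], event["col4"])
```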

Can you give me commands that would show you the problem? I tried this one:

GET /edr-dcb*/_search?pretty

{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  }

Took = 31, but I already have about 500 files in the directory :/. Maybe it is unrelated?

PS: I also use start_position => beginning, but I don't think that matters.

Christian_Dahlqvist · August 18, 2017, 9:15am

Are you using time-based, e.g. daily indices? What is the size of your indices? Do you define the document id before indexing them into Elasticsearch?

What does CPU usage and disk IO look like during indexing? Anything in the logs indicating any problems, e.g. long and/or frequent GC?

Beuhlet_Reseau · August 18, 2017, 12:41pm

So @Christian_Dahlqvist, I'm still looking... I don't know why, but I noticed this: my directory holds 44,000,000 lines in total, while Kibana shows 6,000,000 hits (at the end of processing).

I use the date from the log lines.

It's about 5 GB per index (3 shards and 1 replica per index).

Yes: as with the date, I use a particular field of the line as the ID.

Logstash uses 500% CPU (so 5 CPUs at 100% out of 24), which seems lightly loaded to me.
Overall, my Elasticsearch servers are not loaded (load average: 0.5).

Nothing in the Logstash log.
On my Elasticsearch servers I have this:

On the first server:
[2017-08-17T15:12:18,633][INFO ][o.e.m.j.JvmGcMonitorService] [opm1zels01] [gc][527599] overhead, spent [280ms] collecting in the last [1s]

On the second:
[2017-08-18T13:00:39,570][INFO ][o.e.m.j.JvmGcMonitorService] [opm1zels02] [gc][606003] overhead, spent [325ms] collecting in the last [1s]

On the third: nothing.

I tried adding a stdout debug output to Logstash, but how do I see the result ^^?
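
To put the two JvmGcMonitorService lines above in perspective, the share of time spent in garbage collection can be read straight from the message. A minimal Python sketch, assuming the message always has the `spent [Nms] collecting in the last [Ns]` shape seen here (other units would need extra handling); `gc_fraction` is a hypothetical helper name.

```python
import re

# Matches only the shape seen in the two quoted lines:
# "spent [<N>ms] collecting in the last [<N>s]".
GC_LINE = re.compile(
    r"overhead, spent \[(\d+)ms\] collecting in the last \[(\d+(?:\.\d+)?)s\]"
)


def gc_fraction(log_line):
    """Return the fraction of the sampled window spent in GC, or None."""
    match = GC_LINE.search(log_line)
    if match is None:
        return None
    spent_ms = int(match.group(1))
    window_s = float(match.group(2))
    return spent_ms / (window_s * 1000)


line = ("[2017-08-17T15:12:18,633][INFO ][o.e.m.j.JvmGcMonitorService] "
        "[opm1zels01] [gc][527599] overhead, spent [280ms] collecting in the last [1s]")
print(gc_fraction(line))
```

For the two lines quoted above this gives 0.28 and 0.325, i.e. about 28% and 32.5% of the sampled second spent in GC.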

Beuhlet_Reseau · August 18, 2017, 1:47pm

I discovered something...

When I remove id => %{field5} from the Logstash output, the hits in Kibana go from 6 000 000 to 14 000 000, and the count keeps climbing.

I suspect the ID caused some messages to be deleted...

What do you think about this:

Is it better to set start_position to beginning or end? In the end it amounts to the same thing, no?

If you know of any optimizations for Elasticsearch or Logstash, I'd like to hear them (based on your experience rather than the documentation, which I find poor anyway).

Christian_Dahlqvist · August 18, 2017, 2:08pm

If that is the case it would seem like you have entries with the same ID, which results in updates, which are basically shown as a delete and a new insert in the statistics.
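
The effect can be illustrated with a toy model in Python, where a dict stands in for the index and the list of IDs is invented for the example. Indexing a document under an ID that already exists replaces the earlier document, so the number of hits stays below the number of lines read.

```python
def index_all(doc_ids):
    """Index one document per ID into a dict that stands in for an index.

    Returns (documents visible, creates, updates).
    """
    index = {}
    created = updated = 0
    for line_number, doc_id in enumerate(doc_ids):
        if doc_id in index:
            updated += 1   # same ID: the earlier document is replaced
        else:
            created += 1   # new ID: a new document appears
        index[doc_id] = line_number
    return len(index), created, updated


# Invented IDs: four log lines, but only two distinct values.
visible, created, updated = index_all(["7353033", "7353033", "7353034", "7353033"])
print(visible, created, updated)
```

Here four lines produce only two visible documents; the other two writes are counted as updates. Dropping the explicit ID lets every line become its own document, which matches the jump in hits reported above.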

system · September 15, 2017, 2:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
