Elasticsearch performance problem


I have a problem with Elasticsearch...

My logs upload very, very slowly.

Here is the context:

OS: RHEL 7.1, 12 CPUs, 32 GB RAM.
Advanced config: memlock enabled, swap disabled.

1600 files, for a total of 55,000,000 lines.

I tested a scenario, and it uploads 20,000 lines per minute!

That is very poor; I have 55,000,000 lines, so it would take about 45 hours to upload everything!
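For what it's worth, the 45-hour estimate checks out; here is a quick back-of-the-envelope calculation using the numbers from the post above:

```shell
# 55,000,000 lines at 20,000 lines/minute
lines=55000000
rate=20000                     # lines per minute
minutes=$((lines / rate))      # 2750 minutes
hours=$((minutes / 60))        # integer division: 45 hours
echo "about ${hours} hours"    # prints "about 45 hours"
```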

What do you think about that ? @ruflin @Christian_Dahlqvist

How are you uploading? What does your configuration look like? Which version are you on?


Here is an example line:


Here is the filter pattern I use for it:

if [type] == "log" {
  if [message] =~ "\bCTEName\b" {
    drop { }
  } else {
    csv {
      columns => [ " ...~ about 45 fields " ]
      separator => ";"
    }
    mutate {
      convert => {
        "log_UsedOctets" => "integer"
        "log_TotalOctets" => "integer"
      }
      remove_field => [ "message", "log_ID", "log_Flag", [...] ]
    }
  }
}

I use the ELK stack, version 5.1!

Thank you for your help

Annex (graph of log upload throughput):

It looks like this is primarily a comma-separated list. Why use ';' as the separator? Do the events look as you expect?

What does the rest of your logstash config look like?

I don't understand: each field is separated by ',', so I just specify the separator?

So, for the input:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/logstash-forwarder.crt"
    ssl_key => "/etc/ssl/logstash-forwarder.key"
  }
}

and the output:

output {
  stdout { codec => rubydebug }
  if [type] == "log" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "log--%{+YYYY.MM.dd}"
    }
  }
}

Edit:
Is it SSL that slows down the parsing?
Maybe it is not necessary to index every field (I believe they are all analyzed)?

The stdout block can slow down processing throughput. You may also want to increase the internal batch size to see if that has any impact.

If you inspect the events in Discover mode, do they look like you expect?

OK, I will try removing the stdout block. Hmm, increase the internal batch size? How? :smile: Can it be set permanently in a configuration file?

I will check, but to build graphs I must convert a few fields to integers so I can sum them.

You can view the raw documents in 'Discover' mode.

Oh, thank you a lot @Christian_Dahlqvist

I went from 20,000 to 180,000 lines per minute (without modifying the batch size or any other configuration)! :sunny:

I have a question about the configuration :

-b / --pipeline.batch.size SIZE: the default is 125. The docs talk about raising it to around 3,000; what do you think?

Other conf is :

-w / --pipeline.workers COUNT: the default is 5, but I read that the value should equal the number of cores (so 12 in my case).

I also see that these values can be specified in logstash.yml:

# pipeline.workers: 12
# pipeline.batch.size: 3000

I will increase the batch size from 125 to 3,000 and the workers to 12 (previously 125 and 2).

Can you explain what these options do (in a fairly simple way)?

As a reminder, I have only one file, /etc/logstash/conf.d/logstash.conf, and inside it I have, for the time being, two types of processing, but this will grow.
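As a sketch of how these two settings could be applied, they can also be passed on the command line; `-w` and `-b` are the short forms of `--pipeline.workers` and `--pipeline.batch.size` in Logstash 5.x (the config path comes from this thread, the invocation is illustrative):

```shell
# Run Logstash with 12 pipeline workers and a batch size of 3000
bin/logstash -f /etc/logstash/conf.d/logstash.conf -w 12 -b 3000
```

Putting the same values in logstash.yml, as shown below, makes them permanent instead of per-invocation.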

Have you gone through the following guide for Elasticsearch tuning?

In a word: perfect. @Christian_Dahlqvist

One last thing: the auto-generated IDs?

Are they on by default in Elasticsearch, or must I activate them somewhere?

EDIT @Christian_Dahlqvist :

I have a question about the use of the integer type. In the Logstash filter, if I do not define a field type, Elasticsearch stores it as a string. If I use convert => "integer", it is stored as a long.

(If I put convert => "short", Logstash returns an error... grrrr)

If you do not explicitly provide an id, one will be auto-generated for you.

This is all handled through index templates in Elasticsearch.
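To illustrate, here is a hedged sketch of such an index template in Elasticsearch 5.x syntax. The template name is made up; the index pattern and field name come from earlier in this thread, and the document type is assumed to be `log`:

```shell
# Map log_UsedOctets as "integer" at the Elasticsearch level.
# A "short" type would be accepted here too, but note its 32,767 maximum,
# which an octet counter will likely exceed.
curl -XPUT 'localhost:9200/_template/log_template' -d '{
  "template": "log--*",
  "mappings": {
    "log": {
      "properties": {
        "log_UsedOctets": { "type": "integer" }
      }
    }
  }
}'
```

The template is applied automatically to any new index whose name matches `log--*`.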


I tried to update the settings via the elasticsearch.yml file, but at startup ES returned this error:

Found index level settings on node level configuration.
Since elasticsearch 5.x index level settings can NOT be set on the nodes configuration like the elasticsearch.yaml

So I understand that these settings must be updated via PUT commands.

(I have not yet created an index; I purged the database to start from scratch.)

I have succeeded in modifying number_of_replicas and refresh_interval with this command in the Linux shell:

curl -XPUT 'localhost:9200/_settings' -d '{
  "index" : {
    "refresh_interval" : -1
  }
}'

That works; it returns acknowledged: true.

But when I tried to set the refresh interval back to 30 ("refresh_interval" : 30), I got this message:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"failed to parse setting [index.refresh_interval]

Why ?
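Most likely the parse failure is because time-valued settings need an explicit unit in Elasticsearch 5.x; a bare `30` is rejected. A sketch of the same command with a unit, assuming the same cluster as above:

```shell
# "30s" = 30 seconds; "-1" (disabled) is the only unitless value accepted
curl -XPUT 'localhost:9200/_settings' -d '{
  "index" : {
    "refresh_interval" : "30s"
  }
}'
```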

Also, I tried to set the memory buffer size to 1.2 GB with:

"memory.index_buffer_size" : 1,2G

It returns:

{"error":{"root_cause":[{"type":"settings_exception","reason":"Failed to load settings

It's complicated...
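For what it's worth, `indices.memory.index_buffer_size` is a node-level setting, so it belongs in elasticsearch.yml rather than the `_settings` API, and the value needs a unit with a dot (or a percentage of the heap), not a comma. A sketch, assuming that is the setting intended here:

```yaml
# elasticsearch.yml — node-level setting, applied after a restart
indices.memory.index_buffer_size: 1.2gb
# or, as a share of the JVM heap:
# indices.memory.index_buffer_size: 20%
```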

What you tried in your latest post does not seem to match anything described in the link I provided. Did you read it?

This is however a completely different question, unrelated to the initial performance issue, so I would recommend creating a new issue instead of continuing discussing various topics in this thread. That will make it a lot easier for others to find appropriate topics when searching the forum.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.