Elasticsearch performance problem

Hello,

I have a problem with Elasticsearch...

My logs are uploading very, very slowly.

Here is the context:

OS: RHEL 7.1, 12 CPUs, 32 GB RAM.
Advanced config: memlock enabled, swap disabled.

1,600 files for a total of 55,000,000 lines.

I tested a scenario, and it uploads 20,000 lines per minute!

That is very poor; with 55,000,000 lines it would take about 45 hours to upload everything!

What do you think about that? @ruflin @Christian_Dahlqvist

How are you uploading? What does your configuration look like? Which version are you on?

So,

here is an example line:

1490090124,ASR_01,0003diamproxy;14318488;22688;58d0f,REQUEST,47845979,47845979,47845979,20801,0709945330,CCBOS,,0,,,,,,,,0,POS,,,,,,,,,,0,950976,,MAXAGE,4836480,0

Here is the filter I use for it:

if [type] == "log" {
  if [message] =~ "\bCTEName\b" {
    drop { }
  } else {
    csv {
      columns   => [ " ...~ about 45 fields " ]
      separator => ";"
    }
  }

  mutate {
    convert => {
      "log_UsedOctets"  => "integer"
      "log_TotalOctets" => "integer"
    }
    remove_field => [ "message", "log_ID", "log_Flag", [...] ]
  }
}

I use the ELK stack, version 5.1!

Thank you for your help

Annex (graph of the log upload rate):

It looks like this is primarily a comma-separated list. Why use ';' as the separator? Do the events look as you expect?

What does the rest of your logstash config look like?

I don't understand; each field is separated by ',', so I just indicate that separator?

So, for the input:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/logstash-forwarder.crt"
    ssl_key => "/etc/ssl/logstash-forwarder.key"
  }
}

and the output:

output {
  stdout { codec => rubydebug }
  if [type] == "log" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "log--%{+YYYY.MM.dd}"
    }
  }
}

Edit:
Is it the SSL that slows down parsing?
Maybe it's not necessary to index every field (I believe they are all analyzed)?

The stdout block can slow down processing throughput. You may also want to increase the internal batch size to see if that has any impact.
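
(For illustration, a sketch of the same output block with the stdout plugin removed, as suggested above; everything else is unchanged from the config shown earlier:)

output {
  if [type] == "log" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "log--%{+YYYY.MM.dd}"
    }
  }
}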

If you inspect the events in Discover mode, do they look like you expect?

OK, I will try to remove the stdout block. Hmm, increase the internal batch size? How? :smile: Can it be set permanently in a configuration file?

I will check, but to build graphs I must convert a few fields to integers in order to sum them.

You can view the raw documents in 'Discover' mode.

Oh, thank you a lot @Christian_Dahlqvist

I went from 20,000 to 180,000 lines per minute (without modifying the batch size or any other config)! :sunny:

I have a question about the configuration:

-b, --pipeline.batch.size SIZE: the default is 125. The docs talk about raising it to around 3,000; what do you think?

The other setting is:

-w, --pipeline.workers COUNT: the default is 5, but I read that the value should equal the number of cores (so 12 in my case).

I also see that these values can be specified in logstash.yml:

# pipeline.workers: 12
# pipeline.batch.size: 3000
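
(Equivalently, a sketch of passing the same values as command-line flags when starting Logstash; the config path and the values here are only examples taken from this thread, not a recommendation:)

bin/logstash -f /etc/logstash/conf.d/logstash.conf -w 12 -b 3000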

I will increase the batch size from 125 to 3,000 and the workers to 12 (previously 125 and 2).

Can you explain to me what these options do (in a fairly simple way)?

As a reminder, I have only one file, /etc/logstash/conf.d/logstash.conf, and inside it I have, for the time being, two types of processing, but this will grow.

Have you gone through the following guide for Elasticsearch tuning?

Just one word: perfect. @Christian_Dahlqvist

One last thing: the auto-generated ID?

Is it enabled by default in Elasticsearch, or must I activate it somewhere?

EDIT @Christian_Dahlqvist:

https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_use_the_smallest_numeric_type_that_is_sufficient

I have a question about the use of the integer type. In the Logstash filter, if I do not define the field type, Elasticsearch stores it as a string. If I use convert => "integer", it is stored as a long.

(If I put => "short", Logstash returns an error... grrrr)

If you do not explicitly provide an id, one will be auto-generated for you.

This is all handled through index templates in Elasticsearch.
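
(For illustration, a minimal sketch of such an index template that maps the numeric fields to a smaller type. The template name, the index pattern, and the mapping type "log" are assumptions based on the config shown earlier in this thread, not something prescribed here:)

curl -XPUT 'localhost:9200/_template/log_template' -d '
{
  "template": "log-*",
  "mappings": {
    "log": {
      "properties": {
        "log_UsedOctets":  { "type": "integer" },
        "log_TotalOctets": { "type": "integer" }
      }
    }
  }
}'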

@Christian_Dahlqvist

I tried to update the settings via the elasticsearch.yml file, but at startup ES returned this error:

Found index level settings on node level configuration.
Since elasticsearch 5.x index level settings can NOT be set on the nodes configuration like the elasticsearch.yaml

So I understand that these settings must be updated via PUT commands.

(I have not yet created an index; I purged the database to start from scratch.)

I succeeded in modifying number_of_replicas and refresh_interval with this command from the Linux shell:

curl -XPUT 'localhost:9200/_settings' -d '
{
  "index" : {
    "refresh_interval" : -1
  }
}'

It's OK; it returns acknowledged: true.

But when I tried to set the refresh interval back to 30 ("refresh_interval" : 30), I got this message:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"failed to parse setting [index.refresh_interval]

Why?
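
(Most likely the value needs an explicit time unit; a sketch of the same command with a unit added, under that assumption:)

curl -XPUT 'localhost:9200/_settings' -d '
{
  "index" : {
    "refresh_interval" : "30s"
  }
}'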

Also, I tried to set the memory buffer size to 1.2 GB with:

"memory.index_buffer_size" : 1,2G

It returned:

{"error":{"root_cause":[{"type":"settings_exception","reason":"Failed to load settings

It's complicated...
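
(For reference, assuming the setting meant here is indices.memory.index_buffer_size: that is a node-level setting, so it belongs in elasticsearch.yml rather than the index settings API, and the byte size is written without a comma, e.g.:)

indices.memory.index_buffer_size: 1200mb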

What you tried in your latest post does not seem to match anything described in the link I provided. Did you read it?

This is, however, a completely different question, unrelated to the initial performance issue, so I would recommend creating a new topic instead of continuing to discuss various subjects in this thread. That will make it a lot easier for others to find appropriate topics when searching the forum.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.