Loading log file into ES: poor performance


(John) #1

Hello,

I am loading the existing log file into ES using filebeat. I have a single-node setup.

filebeat.yml:
filebeat.prospectors:
  - paths: ["/data/test.log"]

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: test-pipeline

My pipeline is defined as follows:
"processors" : [
{
"grok" : {
"field": "message",
"patterns": [ "%{TIMESTAMP_ISO8601:date} %{NUMBER:reqtime} %{NUMBER:http
code} %{IP:ip} %{GREEDYDATA:text}" ]
},
"date": {
"field": "date",
"target_field": "@timestamp",
"formats": [ "yyyy-MM-dd HH:mm:ss" ]
},
"date": {
"field": "date",
"target_field": "date",
"formats": [ "yyyy-MM-dd HH:mm:ss" ]
},
"date_index_name" : {
"field" : "date",
"index_name_prefix" : "testidx-",
"date_rounding" : "d"
}
}
]
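In case it helps to reproduce: here is a rough Python approximation of what the grok pattern above extracts from a log line. This is only a sketch — `TIMESTAMP_ISO8601` matches more timestamp variants than this regex does, and the sample line is hypothetical (substitute a real line from /data/test.log).

```python
import re

# Rough regex equivalent of:
# %{TIMESTAMP_ISO8601:date} %{NUMBER:reqtime} %{NUMBER:httpcode} %{IP:ip} %{GREEDYDATA:text}
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"   # yyyy-MM-dd HH:mm:ss
    r"(?P<reqtime>\d+(?:\.\d+)?)\s+"                      # request time (NUMBER)
    r"(?P<httpcode>\d+)\s+"                               # HTTP status code (NUMBER)
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"                 # IPv4 address (IP)
    r"(?P<text>.*)"                                       # rest of the line (GREEDYDATA)
)

# Hypothetical sample line, not from the real log file:
sample = "2017-03-01 12:00:00 0.042 200 10.0.0.1 GET /index.html"
match = LINE_RE.match(sample)
if match:
    print(match.groupdict())
```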

Log lines are loaded at about 100 per second; the filebeat process consumes about 2% of CPU, and the java process (elasticsearch) consumes about 20%-40% of CPU (this is a 32-core server).

path.data is on software RAID6, and I see the disks participating in this RAID6 are about 50% busy (which looks like a pretty high load, given that the log lines are just small HTTP requests).

Is there anything special I should tune to speed this process up (by at least 10x)?

Thanks.


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.