Sizing of an ELK node

Henry_N · April 5, 2018, 8:23pm

I have a single ELK node with 300GB of storage. I downloaded 30GB of load balancer logs to be processed in ELK. I'm expecting 90 million requests, but so far there are only 27 million hits; indexing has stopped and disk usage is now up to 272GB. Is this typical? How much storage do I really need?

Christian_Dahlqvist · April 5, 2018, 8:27pm

How are you indexing them? What time period do the logs cover? How many indices and shards is the data stored across? Are you using the default mappings? Which version of Elasticsearch are you using?

Henry_N · April 5, 2018, 8:41pm

The logs cover only 24 hours of ELB/web server logs. I'm new at this; I installed all the ELK components following a tutorial. I did find that the bulk of the disk consumption is in /var/log/syslog.1. Is it safe to delete it, or is it still being used by Elasticsearch?

"version" : {
  "number" : "6.2.3",
  "build_hash" : "c59ff00",
  "build_date" : "2018-03-13T10:06:29.741383Z",
  "build_snapshot" : false,
  "lucene_version" : "7.2.1",
  "minimum_wire_compatibility_version" : "5.6.0",
  "minimum_index_compatibility_version" : "5.0.0"
},

{
  "cluster_name" : "elk-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 6,
  "active_shards" : 6,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 54.54545454545454
}
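Editor's note: the health output above is what a single-node cluster typically looks like. A replica shard cannot be allocated on the same node as its primary, so the 5 unassigned shards are most likely replicas, and the status stays yellow rather than green. A minimal Python sketch of the arithmetic behind `active_shards_percent_as_number`, together with a settings body that would set the replica count to zero. It assumes you are willing to run without replicas, and it only builds the JSON body; sending it to the index `_settings` endpoint with an HTTP client is left out.

```python
import json

# Values copied from the cluster health output above.
health = {
    "number_of_data_nodes": 1,
    "active_primary_shards": 6,
    "active_shards": 6,
    "unassigned_shards": 5,
}


def active_shards_percent(h):
    """Share of all shards (assigned or not) that are currently active."""
    total = h["active_shards"] + h["unassigned_shards"]
    return 100.0 * h["active_shards"] / total


def replica_settings_body(replicas=0):
    """JSON body for an index settings update that changes the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


percent = active_shards_percent(health)
print(percent)                  # 6 active out of 11 total shards
print(replica_settings_body())
```

With zero replicas there is nothing left to allocate, so a one-node cluster can report green; the trade-off is that there is no redundant copy of the data.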

Christian_Dahlqvist · April 5, 2018, 8:44pm

Can you provide a raw event together with an indexed one and the mappings of the index?

Henry_N · April 5, 2018, 9:18pm

Here is an example of an entry along w/ my filter setting. Is this what you meant by 'mappings'?

2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 "GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" DHE-RSA-AES128-SHA TLSv1.2

filter {
  if [type] == "elb" {
    # Split the raw ELB access-log line into typed fields.
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} (?:%{IP:backend_ip}:%{NUMBER:backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{NUMBER:elb_status_code:int}|-) (?:%{NUMBER:backend_status_code:int}|-) %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} \"(?:%{WORD:verb}|-) (?:%{GREEDYDATA:request}|-) (?:HTTP/%{NUMBER:httpversion}|-( )?)\" \"%{DATA:userAgent}\"( %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol})?"]
    }
    # Extract the URI scheme (http/https) from the request.
    grok {
      match => ["request", "%{URIPROTO:http_protocol}"]
    }
    # Drop the query string: keep only the part of the request before "?".
    if [request] != "-" {
      grok {
        match => ["request", "(?<request>[^?]*)"]
        overwrite => ["request"]
      }
    }
    # Enrich with location data derived from the client IP.
    geoip {
      source => "client_ip"
      target => "geoip"
      add_tag => ["geoip"]
    }
    # Break the user-agent string into browser/OS/device fields.
    useragent {
      source => "userAgent"
    }
    # Use the log line's own timestamp as the event time.
    date {
      match => ["timestamp", "ISO8601"]
    }
  }
}
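Editor's note: to make the effect of the filter on storage more concrete, here is a rough Python sketch that parses the sample line above with a plain regular expression and compares the size of the raw line with the size of the resulting event as JSON. This is not what Logstash does internally; the regex is a simplified stand-in for the grok pattern, the function name `parse_elb_line` is made up for illustration, and the geoip and useragent enrichment (which adds further fields) is not reproduced.

```python
import json
import re

# Simplified stand-in for the grok pattern in the filter (an assumption,
# not the exact grok semantics).
ELB_LINE = re.compile(
    r'(?P<timestamp>\S+) (?P<loadbalancer>\S+) '
    r'(?P<client_ip>[\d.]+):(?P<client_port>\d+) '
    r'(?:(?P<backend_ip>[\d.]+):(?P<backend_port>\d+)|-) '
    r'(?P<request_processing_time>-?[\d.]+) '
    r'(?P<backend_processing_time>-?[\d.]+) '
    r'(?P<response_processing_time>-?[\d.]+) '
    r'(?P<elb_status_code>\d+|-) (?P<backend_status_code>\d+|-) '
    r'(?P<received_bytes>\d+) (?P<sent_bytes>\d+) '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<http_version>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"'
    r'(?: (?P<ssl_cipher>\S+) (?P<ssl_protocol>\S+))?'
)

SAMPLE = (
    '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
    '10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 '
    '"GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" '
    'DHE-RSA-AES128-SHA TLSv1.2'
)


def parse_elb_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    m = ELB_LINE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    for key in ("client_port", "backend_port", "received_bytes", "sent_bytes"):
        if event[key] is not None:
            event[key] = int(event[key])
    for key in ("request_processing_time", "backend_processing_time",
                "response_processing_time"):
        event[key] = float(event[key])
    for key in ("elb_status_code", "backend_status_code"):
        event[key] = int(event[key]) if event[key].isdigit() else None
    # Mirror the third grok block: keep only the part before "?".
    event["request"] = event["request"].split("?", 1)[0]
    # Logstash keeps the original line in "message" unless told otherwise.
    event["message"] = line
    return event


event = parse_elb_line(SAMPLE)
raw_size = len(SAMPLE)
json_size = len(json.dumps(event))
print(raw_size, json_size, round(json_size / raw_size, 2))
```

Even before enrichment, the event is larger than the raw line because every value is stored twice: once in `message` and once in its own field. How that translates to on-disk index size depends on the mappings and compression, so the ratio printed here is only an indication of the direction, not a sizing figure.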

I went ahead and deleted the large syslog.1 and rebooted the node. Indexing doesn't seem to continue. How do I resume operation? I'd hate to start over.

Also, if I do have to start over, how do I set the logs to grow only up to a certain size and then delete the oldest? Is there a way to continue from the existing index? Thanks for your help!
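Editor's note: Elasticsearch does not cap an index at a given size by itself. With time-based indices (Logstash writes to one `logstash-YYYY.MM.dd` index per day by default), the usual approach is to delete whole old indices, commonly with a tool such as Curator. The selection logic can be sketched in Python as below; the index names and sizes are made-up illustration values, the function name is hypothetical, and the actual deletion call is not shown.

```python
GB = 1024 ** 3


def indices_to_delete(index_sizes, budget_bytes):
    """Pick the oldest indices to delete so the remainder fits the budget.

    index_sizes maps index name to size in bytes. Names are assumed to
    sort chronologically, as date-suffixed daily indices do.
    """
    total = sum(index_sizes.values())
    doomed = []
    for name in sorted(index_sizes):      # oldest first
        if total <= budget_bytes:
            break
        doomed.append(name)
        total -= index_sizes[name]
    return doomed


# Hypothetical sizes, for illustration only.
sizes = {
    "logstash-2018.04.01": 40 * GB,
    "logstash-2018.04.02": 50 * GB,
    "logstash-2018.04.03": 60 * GB,
}
print(indices_to_delete(sizes, 100 * GB))
# → ['logstash-2018.04.01', 'logstash-2018.04.02']
```

Note that this only helps when the data is spread over several indices. If all events carry timestamps from the same day, as with a single 24-hour batch, they end up in one daily index and there is nothing older to drop.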

Christian_Dahlqvist · April 6, 2018, 8:20am

This blog post discusses how enrichment and mappings affect storage size for data not dissimilar to yours and might be useful.

By default, Elasticsearch mappings are designed to give you a lot of flexibility, and this flexibility takes up space (although your ratio seems quite excessive). If you go through your mappings and optimise them, you can save a lot of space.
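Editor's note: as a rough illustration of what optimised mappings can look like for the fields in this thread, here is a Python sketch that builds an index template body. With default dynamic mapping, each string is indexed both as analysed `text` and as a `keyword` sub-field; mapping strings as `keyword` only, and giving IPs and numbers their own types, avoids that duplication. Assumptions to check against your own setup: the template name and `logstash-*` pattern, the `doc` mapping type (what Logstash 6.x uses by default), the chosen field types, and the `best_compression` codec setting. The body would be sent to the `_template` endpoint; that call is not shown.

```python
import json


def build_elb_template():
    """Return a hypothetical index template body for ELB log indices."""
    keyword = {"type": "keyword", "ignore_above": 1024}
    return {
        "index_patterns": ["logstash-*"],
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 0,      # single node: no replicas
                "codec": "best_compression",  # smaller stored fields, more CPU
            }
        },
        "mappings": {
            "doc": {
                # Any string field not listed below becomes keyword only,
                # instead of the default text + keyword pair.
                "dynamic_templates": [
                    {
                        "strings_as_keyword": {
                            "match_mapping_type": "string",
                            "mapping": keyword,
                        }
                    }
                ],
                "properties": {
                    "client_ip": {"type": "ip"},
                    "backend_ip": {"type": "ip"},
                    "client_port": {"type": "integer"},
                    "backend_port": {"type": "integer"},
                    "elb_status_code": {"type": "short"},
                    "backend_status_code": {"type": "short"},
                    "received_bytes": {"type": "long"},
                    "sent_bytes": {"type": "long"},
                    "request_processing_time": {"type": "float"},
                    "backend_processing_time": {"type": "float"},
                    "response_processing_time": {"type": "float"},
                    "request": keyword,
                    "userAgent": keyword,
                    # Keep full-text search on the raw line only.
                    "message": {"type": "text"},
                },
            }
        },
    }


template = build_elb_template()
print(json.dumps(template, indent=2))
```

A template only applies to indices created after it is installed, so existing data would need to be reindexed to benefit. Removing the `message` field in Logstash once parsing succeeds is another common space saver, at the cost of losing the original line.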

