Sizing of an ELK node

I have a single ELK node with 300 GB of storage. I downloaded 30 GB of load balancer logs to be processed in ELK. I'm expecting 90 million requests, but so far there are only 27 million hits, indexing has stopped, and disk usage is now up to 272 GB. Is this typical? How much storage do I really need?
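For a rough sense of scale, the figures in the question can be turned into a back-of-envelope estimate. The sketch below assumes decimal gigabytes, and the `expansion_ratio` argument (indexed size divided by raw size) is a hypothetical input, not a value measured on this node; it only illustrates the arithmetic.

```python
RAW_BYTES = 30 * 10**9        # 30 GB of raw load balancer logs (decimal GB assumed)
EXPECTED_EVENTS = 90_000_000  # requests expected in the logs
INDEXED_EVENTS = 27_000_000   # hits reported so far


def avg_event_bytes(raw_bytes, events):
    """Average size of one raw log line in bytes."""
    return raw_bytes / events


def estimated_index_bytes(raw_bytes, expansion_ratio):
    """Projected on-disk index size for a given (assumed) expansion ratio."""
    return raw_bytes * expansion_ratio


progress = INDEXED_EVENTS / EXPECTED_EVENTS

if __name__ == "__main__":
    print(f"average raw event: {avg_event_bytes(RAW_BYTES, EXPECTED_EVENTS):.0f} bytes")
    print(f"indexed so far:    {progress:.0%} of expected events")
    # 2.0 is a placeholder ratio chosen only to show the calculation.
    print(f"index at ratio 2:  {estimated_index_bytes(RAW_BYTES, 2.0) / 10**9:.0f} GB")
```

Measuring the real ratio on a small sample of the logs, then plugging it in, gives a more useful number than any default guess.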

How are you indexing them? What time period do the logs cover? How many indices and shards is the data stored across? Are you using default mappings? Which version of Elasticsearch are you using?

The logs cover only 24 hours of ELB/web server logs. I'm new at this; I installed all the ELK components following a tutorial. I did find that the bulk of the disk consumption is in /var/log/syslog.1. Is it safe to delete it, or is it still being used by Elasticsearch?

"version" : {
  "number" : "6.2.3",
  "build_hash" : "c59ff00",
  "build_date" : "2018-03-13T10:06:29.741383Z",
  "build_snapshot" : false,
  "lucene_version" : "7.2.1",
  "minimum_wire_compatibility_version" : "5.6.0",
  "minimum_index_compatibility_version" : "5.0.0"
},

{
  "cluster_name" : "elk-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 6,
  "active_shards" : 6,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 54.54545454545454
}
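The shard counts in this health output can be cross-checked with a few lines of Python. The sketch below copies only the relevant fields from the output above and reproduces the reported percentage. The closing interpretation is an assumption: on a single-node cluster, unassigned shards are often replicas with no second node to be allocated to, but the health output alone does not prove that.

```python
import json

# Relevant fields copied from the cluster health output above.
HEALTH = json.loads("""
{
  "status": "yellow",
  "number_of_data_nodes": 1,
  "active_primary_shards": 6,
  "active_shards": 6,
  "initializing_shards": 0,
  "unassigned_shards": 5
}
""")


def shard_summary(health):
    """Derive total shard count, assigned replicas, and percent active."""
    total = (health["active_shards"]
             + health["initializing_shards"]
             + health["unassigned_shards"])
    return {
        "total_shards": total,
        "assigned_replicas": health["active_shards"] - health["active_primary_shards"],
        "active_percent": 100.0 * health["active_shards"] / total,
    }


summary = shard_summary(HEALTH)

if __name__ == "__main__":
    print(summary)
    if HEALTH["number_of_data_nodes"] == 1 and summary["assigned_replicas"] == 0:
        # Assumption, not a diagnosis: the unassigned shards may simply be replicas.
        print("No replicas assigned; unassigned shards may be replicas waiting for a second node.")
```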

Can you provide a raw event together with an indexed one and the mappings of the index?

Here is an example of an entry along with my filter settings. Is this what you meant by 'mappings'?

2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 "GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" DHE-RSA-AES128-SHA TLSv1.2

filter {
  if [type] == "elb" {
    # Split the raw ELB access log line into typed fields.
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} (?:%{IP:backend_ip}:%{NUMBER:backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{NUMBER:elb_status_code:int}|-) (?:%{NUMBER:backend_status_code:int}|-) %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} \"(?:%{WORD:verb}|-) (?:%{GREEDYDATA:request}|-) (?:HTTP/%{NUMBER:httpversion}|-( )?)\" \"%{DATA:userAgent}\"( %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol})?"]
    }
    # Extract the URI scheme (http or https) from the request.
    grok {
      match => ["request", "%{URIPROTO:http_protocol}"]
    }
    # Drop the query string so that "request" holds only the part before "?".
    if [request] != "-" {
      grok {
        match => ["request", "(?<request>[^?]*)"]
        overwrite => ["request"]
      }
    }
    # Enrich each event with location data derived from the client IP.
    geoip {
      source => "client_ip"
      target => "geoip"
      add_tag => ["geoip"]
    }
    # Break the user agent string into browser, OS, and device fields.
    useragent {
      source => "userAgent"
    }
    # Use the timestamp from the log line as the event's @timestamp.
    date {
      match => ["timestamp", "ISO8601"]
    }
  }
}
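To see what this filter does to a single event, the sample line can be parsed outside Logstash. The sketch below is a simplified Python approximation of the first grok pattern, not a faithful port of it: the regular expression and the `parse_elb_line` helper are illustrative names, and the geoip and useragent enrichment steps are left out. Comparing the raw line with the resulting JSON document gives a feel for how much each event grows before any enrichment is added.

```python
import json
import re

SAMPLE = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
          '10.0.0.1:80 0.000086 0.001048 0.001337 200 200 0 57 '
          '"GET https://www.example.com:443/ HTTP/1.1" "curl/7.38.0" '
          'DHE-RSA-AES128-SHA TLSv1.2')

# Simplified stand-in for the grok pattern; field names follow the filter above.
ELB_RE = re.compile(
    r'(?P<timestamp>\S+) (?P<loadbalancer>\S+) '
    r'(?P<client_ip>[\d.]+):(?P<client_port>\d+) '
    r'(?:(?P<backend_ip>[\d.]+):(?P<backend_port>\d+)|-) '
    r'(?P<request_processing_time>-?[\d.]+) '
    r'(?P<backend_processing_time>-?[\d.]+) '
    r'(?P<response_processing_time>-?[\d.]+) '
    r'(?:(?P<elb_status_code>\d+)|-) (?:(?P<backend_status_code>\d+)|-) '
    r'(?P<received_bytes>\d+) (?P<sent_bytes>\d+) '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<http_version>[^"]*)" '
    r'"(?P<userAgent>[^"]*)"'
    r'(?: (?P<ssl_cipher>\S+) (?P<ssl_protocol>\S+))?'
)

INT_FIELDS = ("client_port", "backend_port", "elb_status_code",
              "backend_status_code", "received_bytes", "sent_bytes")
FLOAT_FIELDS = ("request_processing_time", "backend_processing_time",
                "response_processing_time")


def parse_elb_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = ELB_RE.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    for name in INT_FIELDS:
        if doc[name] is not None:
            doc[name] = int(doc[name])
    for name in FLOAT_FIELDS:
        doc[name] = float(doc[name])
    # Mirror the third grok block: keep only the part of the request before "?".
    doc["request"] = doc["request"].split("?", 1)[0]
    # Logstash keeps the original line in "message" by default.
    doc["message"] = line
    return doc


if __name__ == "__main__":
    doc = parse_elb_line(SAMPLE)
    print(json.dumps(doc, indent=2))
    print("raw bytes: ", len(SAMPLE.encode("utf-8")))
    print("JSON bytes:", len(json.dumps(doc).encode("utf-8")))
```

The JSON size is only a proxy for source size; the space an index actually uses also depends on the mappings and on compression.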

I went ahead and deleted the large syslog.1 and rebooted the node. Indexing doesn't seem to have resumed. How do I resume operation? I'd hate to start over.

Also, if I do have to start over, how do I set the logs to grow only up to a certain size and then delete the oldest? Is there a way to continue from the existing index? Thanks for your help!
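One way to picture a size cap, assuming the question is about index data rather than /var/log/syslog, is to delete the oldest time-based indices until the total fits a budget. The sketch below shows only the selection logic. The index names, sizes, and the `indices_to_delete` helper are hypothetical; in practice the list could be built from the cat indices API, and the deletion done through the delete index API or a tool such as Curator.

```python
def indices_to_delete(indices, max_total_bytes):
    """Pick the oldest indices to drop until the total size fits the budget.

    `indices` is a list of dicts with 'index', 'creation_date' (epoch millis),
    and 'size_bytes' keys. Returns index names, oldest first.
    """
    remaining = sum(item["size_bytes"] for item in indices)
    doomed = []
    for item in sorted(indices, key=lambda i: i["creation_date"]):
        if remaining <= max_total_bytes:
            break
        doomed.append(item["index"])
        remaining -= item["size_bytes"]
    return doomed


GB = 10**9

# Hypothetical daily indices, listed out of order on purpose.
EXAMPLE = [
    {"index": "logstash-2018.04.02", "creation_date": 1522627200000, "size_bytes": 50 * GB},
    {"index": "logstash-2018.04.01", "creation_date": 1522540800000, "size_bytes": 40 * GB},
    {"index": "logstash-2018.04.03", "creation_date": 1522713600000, "size_bytes": 60 * GB},
]

if __name__ == "__main__":
    # With a 120 GB budget, only the oldest index needs to go (150 GB -> 110 GB).
    print(indices_to_delete(EXAMPLE, 120 * GB))
```

This approach only works when data is split across several time-based indices; a single large index can only be deleted as a whole.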

This blog post discusses how enrichment and mappings affect storage size for data not dissimilar to yours and might be useful.

By default, Elasticsearch mappings are designed to give you a lot of flexibility, and this flexibility takes up space (although your ratio seems quite excessive). If you go through your mappings and optimise them, you can save a lot of space.
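As an illustration of what optimising mappings might look like for the fields produced by the filter above, the sketch below builds an index template body for Elasticsearch 6.x. Several things in it are assumptions rather than facts from this thread: the `logstash-*` index pattern, the `doc` mapping type name, the `order` value, the shard and replica counts, and the specific field types chosen. It maps new string fields as `keyword` only, instead of the more flexible text-plus-keyword combination, and gives known fields compact types.

```python
import json


def build_template():
    """Return a hypothetical index template body with leaner mappings."""
    return {
        "index_patterns": ["logstash-*"],  # assumed index naming
        "order": 1,                        # assumed; meant to rank above a default template
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,       # a single node cannot hold replicas
        },
        "mappings": {
            "doc": {                       # assumed mapping type name
                "dynamic_templates": [
                    {
                        "strings_as_keyword": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "keyword", "ignore_above": 256},
                        }
                    }
                ],
                "properties": {
                    "client_ip": {"type": "ip"},
                    "backend_ip": {"type": "ip"},
                    "client_port": {"type": "integer"},
                    "backend_port": {"type": "integer"},
                    "elb_status_code": {"type": "short"},
                    "backend_status_code": {"type": "short"},
                    "received_bytes": {"type": "long"},
                    "sent_bytes": {"type": "long"},
                    "request_processing_time": {"type": "float"},
                    "backend_processing_time": {"type": "float"},
                    "response_processing_time": {"type": "float"},
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be reviewed before being sent to the cluster.
    print(json.dumps(build_template(), indent=2))
```

Whether these choices fit depends on how the data is queried: keyword-only fields save space but cannot be searched with full-text analysis.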
