This works, but I'd like it to be better: improving data fields for faster lookups in ES?

I have two Logstash instances. They both run the following config without problems.

filter {
	# Pull the bare file name (without the .txt extension) out of the log file path
	grok {
		match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:filename}\.txt" }
	}
	# Split the message into sampleinfo and backupinfo at the first ':' or ';'
	grok {
		match => {
			"message" => "%{DATA:sampleinfo}[:;]%{GREEDYDATA:backupinfo}"
		}
	}
	# Strip newlines, carriage returns, and tabs from backupinfo
	mutate {
		gsub => ["backupinfo", "[\n\r\t]", ""]
	}
}

Throughput seems decent: between the two small machines I get roughly 3 million documents every 5 minutes, sometimes more. The issue is that my Elasticsearch instance has become really slow now that it holds over 250 GB of documents. It is slow enough that the single-node cluster (acting as both master and data node) causes Kibana requests to time out.

Since that means it is time to make improvements on the Elasticsearch side, I figured it would also be worth improving Logstash, as Logstash tells Elasticsearch how to ingest the data.
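What I have in mind on the Logstash side is something like this (just a sketch, not my real config; the host, index pattern, template name, and template path below are placeholders):

output {
	elasticsearch {
		hosts => ["http://localhost:9200"]                 # placeholder host
		index => "logstash-%{+YYYY.MM.dd}"                 # daily indices, placeholder pattern
		manage_template => true
		template => "/etc/logstash/backup-template.json"   # hypothetical path to a custom mapping (sketched further down)
		template_name => "backup"
		template_overwrite => true
	}
}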

Ideally, I'm trying to monitor throughput across all my Logstash machines, but the most important thing to me is the ability to look up and process requests related to sampleinfo and backupinfo, which are just variable-length character strings.

I can still monitor this information from outside of Kibana, but it takes time, and before adding MORE machines (which just pushes the problem off) I think I should take an active look at the data being ingested and see if I can get it working better.

I like indexing on the timestamp because then I can do a date histogram to monitor throughput, and I also find the top 10 sampleinfo counts useful, as well as the top 10 backupinfo counts. I figure there may be a better way to define these key/value pairs to make things faster on the ES side of the house?
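For example, would an index template along these lines help? This is only a sketch of what I imagine the backup-template.json referenced above could contain; the index pattern and settings are placeholders, it assumes ES 5.x or later where the keyword type exists (older versions would use a not_analyzed string instead), and depending on the version the mappings block may also need a document type:

{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "filename":   { "type": "keyword" },
      "sampleinfo": { "type": "keyword" },
      "backupinfo": { "type": "keyword" }
    }
  }
}

My understanding is that keyword fields skip analysis and are aggregated from doc values, so the top-10 terms aggregations on sampleinfo and backupinfo, and the date histogram on @timestamp, should become cheaper to serve. Is that the right direction?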
