New to ES and Apache Logging: getting rid of ratty data


(Brian Dunbar) #1

I'm new to ES. We're test-piloting ELK 4 for webstats.

How do I get rid of ratty data?

Here's the setup: I've got 8 Apache web servers sending access logs, the log format looks good, and GeoIP is set up. I even have some basic stats on the Kibana dashboard.

I've had this running for six days or so. I even figured out that my servers were logging a ton of messages from their health checks, which was skewing the stats, and I've set up Logstash to drop those.

But now I have about a week of data that isn't right. When I hand this over to the end user I can explain that the data from June 1 to June 6 is bad and should be ignored, but I'd prefer to work some magic and have ES remove it.

Is there an incantation for that? A doc I can read?


(Aaron Mildenstein) #2

Do you need the indices still? I mean, is there other data in there that you still need?

Yes: Use a delete_by_query to delete everything of type "apache" (presuming you used Logstash to set a type on each individual input).
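A minimal sketch of that delete_by_query, in Python using only the standard library. It assumes Elasticsearch 5.0 or later, where the endpoint is `POST /<index>/_delete_by_query`; on the 1.x releases the equivalent was a `DELETE` to `/<index>/_query`. The `logstash-*` index pattern, the Logstash-default `type` and `@timestamp` fields, the `localhost:9200` URL and the year 2015 are all assumptions, since the thread doesn't state them. The range uses `lt` on June 7 so that all of June 6 is included.

```python
import json
import urllib.request


def build_delete_by_query(index_pattern, doc_type, start, end):
    """Return (path, body) for a _delete_by_query request.

    Assumes Logstash stored a "type" field and an "@timestamp" field on
    each event (its defaults); adjust the field names to your mapping.
    """
    body = {
        "query": {
            "bool": {
                "filter": [
                    # Match events whose Logstash type is doc_type.
                    {"term": {"type": doc_type}},
                    # Half-open date window: start inclusive, end exclusive.
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                ]
            }
        }
    }
    return f"/{index_pattern}/_delete_by_query", body


def send(es_url, path, body):
    """POST the request to Elasticsearch and return the parsed JSON reply."""
    req = urllib.request.Request(
        es_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Hypothetical year: the thread only says "June 1 - June 6".
path, body = build_delete_by_query("logstash-*", "apache", "2015-06-01", "2015-06-07")
print(path)  # /logstash-*/_delete_by_query
# send("http://localhost:9200", path, body)  # hypothetical host; uncomment to run
```

Running this deletes every apache event in that window, health-check hits included. To remove only the health-check hits you could add another clause matching the health-check URL; the field name for that depends on your grok pattern.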

No: Use Elasticsearch Curator to delete indices older than n days.
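Curator does the index selection for you; the sketch below only illustrates the idea behind its age filter (read the date encoded in each daily index name and keep the ones older than a cutoff). It is plain Python, not Curator's API, and it assumes Logstash's default daily `logstash-YYYY.MM.DD` index names; the dates used are hypothetical.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, days, today, prefix="logstash-", timestring="%Y.%m.%d"):
    """Return daily indices whose name-encoded date is before today - days.

    Illustrates the idea of an age filter based on index names; this is
    not Curator's API. Names that don't match prefix + timestring are skipped.
    """
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue  # e.g. .kibana: never a candidate
        try:
            stamp = datetime.strptime(name[len(prefix):], timestring).date()
        except ValueError:
            continue  # not a daily index; leave it alone
        if stamp < cutoff:
            old.append(name)
    return old


# Hypothetical index list covering June 1-10 plus the Kibana index.
indices = [f"logstash-2015.06.{d:02d}" for d in range(1, 11)] + [".kibana"]
print(indices_older_than(indices, days=3, today=date(2015, 6, 10)))
```

With `today` set to June 10 and `days=3`, the cutoff is June 7, so it selects the six indices for June 1 through June 6 and ignores `.kibana`. Each selected index could then be removed with `DELETE /<index>`; dropping whole indices is much cheaper than deleting documents by query.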


(Brian Dunbar) #3

It's tempting to discard everything and start over; I'll think about both options.

