By default, when Logstash sends events to Elasticsearch, the index is "logstash-%{+YYYY.MM.dd}".
The %{+YYYY.MM.dd} part, from what I can tell, comes from the document's @timestamp field.
My question: Is it possible to force Logstash to use "Today" as the date, as opposed to the @timestamp field?
Reason: I'm ingesting logs from a ton of devices I don't manage, and sometimes they come in with really out-of-whack timestamps. Then ES ends up creating new indices all over the place with 1-2 logs in each. That means a ton of extra shards, plus the memory needed to manage those shards. Feels dirty.
(Yes, I know the "right" thing to do is to fix the sources... but that's just not as simple as it sounds.)
AFAIK, if you have not modified @timestamp, it defaults to the time Logstash received the event.
However, most people modify the @timestamp field to match the time the event was logged at the source, rather than the time the log was received by Logstash. This is particularly helpful if you pull logs on a schedule rather than getting them in near real time, because the original timestamp is retained.
So unless you've changed @timestamp, the %{+YYYY.MM.dd} is actually the day the log is received by Logstash, as per the docs.
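To make the two cases concrete, here is a minimal sketch of the kind of pipeline being described. The field name "log_time", its ISO8601 format, and the host are assumptions for illustration only. With the date filter in place, @timestamp carries the device's time, and the %{+YYYY.MM.dd} in the index name is formatted from that value (in UTC). Without the date filter, @timestamp stays at the time Logstash received the event.

```
filter {
  # "log_time" is a hypothetical field holding the device-reported time.
  date {
    match => ["log_time", "ISO8601"]
    # target defaults to "@timestamp", so a successful match overwrites it.
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]          # placeholder host
    index => "logstash-%{+YYYY.MM.dd}"   # date portion is formatted from @timestamp
  }
}
```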
We're definitely updating @timestamp to match whatever is in the logs.
The documentation is super vague in its description of where %{+YYYY.MM.dd} comes from. I assumed the date it used would be "Today".
Until I had a device with a date set to January 1 1970 send a log, and Logstash happily created a new index called "logstash-1970-01-01". This is why I think the date it's using is coming from @timestamp.
The environment I'm logging has about 10,000 devices. If even 1% of them (around 100 devices) have incorrect timestamps, that's enough to create thousands of unnecessary shards in my cluster. Our daily indices are created with 30 primary shards, so a single log entry with a weird date ends up causing 30 extra shards.
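One possible guard against this, sketched under assumptions rather than tested against this cluster: keep @timestamp as the device reported it, but compute the date used in the index name separately, falling back to the current UTC date when the event's timestamp is too far from "now". The field name [@metadata][index_date] and the two-day threshold are made up for illustration. The ruby filter would need to run after any date filter.

```
filter {
  ruby {
    code => "
      now = Time.now.utc
      ts = event.get('@timestamp')
      # Hypothetical tolerance: anything more than 2 days off is treated as bogus.
      max_skew = 2 * 86400
      use_now = ts.nil? || (now.to_i - ts.to_i).abs > max_skew
      day = use_now ? now : Time.at(ts.to_i).utc
      event.set('[@metadata][index_date]', day.strftime('%Y.%m.%d'))
    "
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
    index => "logstash-%{[@metadata][index_date]}"
  }
}
```

With something like this, a device stuck on January 1 1970 would still show its own time in @timestamp, but its log would be written to today's index instead of creating a new one.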
I just noticed in the documentation that there is an option to add this field to a document:
[@metadata][target_index]
This might be a solution to my problem. I wonder, though, what would happen if a document had this set, but Logstash itself had a specific setting for index in its output config?
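As far as I can tell, [@metadata][target_index] is only a naming convention: nothing reads it automatically, so it takes effect only when the output's index setting references it. If index is set to a fixed string that doesn't mention the field, the field is simply ignored (and @metadata is never sent to Elasticsearch). A minimal sketch of forcing "Today" this way, assuming a placeholder host and using the current UTC date at processing time:

```
filter {
  ruby {
    # Build the index name from the current UTC date, not from @timestamp.
    code => "event.set('[@metadata][target_index]', 'logstash-' + Time.now.utc.strftime('%Y.%m.%d'))"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
    # The field only matters because the index option refers to it here.
    index => "%{[@metadata][target_index]}"
  }
}
```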