Logstash group time series by decade

I am ingesting academic articles and splitting the index by year:

output {
    elasticsearch {
        hosts => ["${ES_HOST_PROD}"]
        index => "works-%{year}"
        user => "${ES_USER_PROD}"
        password => "${ES_PASSWORD_PROD}"
        document_id => "%{my_id}"
    }
}

However, the 1800s have very few documents per index: works-1802 holds 100 documents, while works-2020 holds 4 million. I would like to send every document from the 1800s into a single index named something like works-1800. Is there a way to do that in Logstash?

You could change the index option in your output to reference a field such as [indexSuffix], and set that field in the filter section with something like

if [year] {
    mutate { convert => { "year" => "integer" } }
    if [year] < 1900 {
        mutate { add_field => { "indexSuffix" => "1800" } }
    } else {
        mutate { add_field => { "indexSuffix" => "%{year}" } }
    }
} else {
    # Elasticsearch index names must be lowercase, so a suffix like "Yikes" would be rejected
    mutate { add_field => { "indexSuffix" => "unknown" } }
}
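
The output then only needs its index option changed. A sketch, assuming the hosts, credentials and my_id field from the question stay as they are:

output {
    elasticsearch {
        hosts => ["${ES_HOST_PROD}"]
        # indexSuffix is the field set by the filter: "1800" for pre-1900 documents, otherwise the year
        index => "works-%{indexSuffix}"
        user => "${ES_USER_PROD}"
        password => "${ES_PASSWORD_PROD}"
        document_id => "%{my_id}"
    }
}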

You can add else if blocks to the filter to merge the decades of the early 1900s as well, if desired.
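
As a rough sketch of that idea, the filter could grow one else if branch per decade. The 1910 and 1920 cutoffs and the "unknown" fallback are only assumed for illustration; pick whatever boundaries match your document counts.

if [year] {
    mutate { convert => { "year" => "integer" } }
    if [year] < 1900 {
        # everything before 1900 goes to works-1800
        mutate { add_field => { "indexSuffix" => "1800" } }
    } else if [year] < 1910 {
        # hypothetical bucket: 1900-1909 share works-1900
        mutate { add_field => { "indexSuffix" => "1900" } }
    } else if [year] < 1920 {
        # hypothetical bucket: 1910-1919 share works-1910
        mutate { add_field => { "indexSuffix" => "1910" } }
    } else {
        # from 1920 on, keep one index per year
        mutate { add_field => { "indexSuffix" => "%{year}" } }
    }
} else {
    # no year on the document; index names must be lowercase
    mutate { add_field => { "indexSuffix" => "unknown" } }
}

Conditions are checked top to bottom, so each branch only sees years that the earlier branches did not match.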

Oh this is a great idea. Thank you!

Quick question - do you know if there is a way to remove indexSuffix after I am done with it, so it is not actually indexed?

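Call it [@metadata][indexSuffix] instead. Fields under [@metadata] can be used in filters and in %{...} references in the output, but they are not part of the document that gets indexed. (mutate + remove_field also drops a field, but it runs in the filter stage, before the output resolves the index name, so it would remove the suffix too early.)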
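
A sketch of the filter and output with the metadata field, assuming the output settings from the question; the "unknown" fallback suffix is just a placeholder:

filter {
    if [year] {
        mutate { convert => { "year" => "integer" } }
        if [year] < 1900 {
            mutate { add_field => { "[@metadata][indexSuffix]" => "1800" } }
        } else {
            mutate { add_field => { "[@metadata][indexSuffix]" => "%{year}" } }
        }
    } else {
        # no year on the document; index names must be lowercase
        mutate { add_field => { "[@metadata][indexSuffix]" => "unknown" } }
    }
}

output {
    elasticsearch {
        hosts => ["${ES_HOST_PROD}"]
        # the metadata field is resolved here but never written to the document
        index => "works-%{[@metadata][indexSuffix]}"
        user => "${ES_USER_PROD}"
        password => "${ES_PASSWORD_PROD}"
        document_id => "%{my_id}"
    }
}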
