Best Practices - Confused

Dear Logstash Team and Community,

I'm a self-trained Logstash enthusiast, and we've been working on moving our applications over to pushing into ES.

Some of our legacy applications will need time before they can push directly to ES, so in the meantime I've created Logstash configuration files to pull from them via JDBC. I have these working, and naturally one of the things I've been looking at is best practice.

Specifically, the DB has over 60 million events and I only need those within a certain time frame, i.e. roughly 1/10th of that as an initial pull, plus an incremental load of around 60k events per day. Additionally, we only plan on keeping the data around for maybe 30/60/90 days at most.
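For illustration, the incremental JDBC pull follows the usual :sql_last_value pattern, roughly like this (connection details, table and column names here are placeholders, not my actual config):

input {
  jdbc {
    # placeholder connection details
    jdbc_connection_string => "jdbc:mysql://dbhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run every minute and only pull rows newer than the last value seen
    schedule => "* * * * *"
    statement => "SELECT * FROM events WHERE datefield > :sql_last_value"
    use_column_value => true
    tracking_column => "datefield"
    tracking_column_type => "timestamp"
  }
}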

As part of the conf file I use a field from the event to set the timestamp, rather than the ingestion time (works great):

filter {
  mutate {
    convert => [ "datefield", "string" ]
  }

  date {
    match => [ "datefield", "ISO8601" ]
    target => "@timestamp"
  }
}

Then, as part of the output, I have Logstash create an index per day (YYYY.MM.dd), like below:
index => "myindex-%{+YYYY.MM.dd}"
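For context, the full output block is roughly the following (the host is a placeholder):

output {
  elasticsearch {
    # placeholder host / credentials
    hosts => ["https://my-es-host:9243"]
    index => "myindex-%{+YYYY.MM.dd}"
  }
}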

A few observations:

  1. When I pull from 2017-01-01, this creates an index for each day, which is one of the recommended practices. However, I see a big spike in shard count (expected), which appears to have some consequences when loading (see below).

  2. I start loading the data, and after maybe 20-30 minutes I start seeing 429 errors. See point 4 below.

  3. I am using the default Logstash configuration, i.e. no special settings for batch size, workers, flush size, etc. (the settings I mean are noted just after this list).

  4. If I remove the index-per-day naming and load everything into one big index instead, it seems fine. No errors.
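These are the pipeline settings I've left untouched, with what I understand the Logstash 5.x defaults to be (values shown only for reference, not something I've tuned or verified on my install):

# logstash.yml - left at defaults, not tuned
pipeline.workers: 2        # defaults to the number of CPU cores
pipeline.batch.size: 125
pipeline.batch.delay: 5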

My setup:

  • m3.medium AWS instance
  • Logstash 5.3.0 / ES 5.3.0
  • Hosted ES (not AWS hosted) with 16GB RAM, 384GB SSD, 2 nodes for production
  • Loading in from January 1st equates to 6 million documents at a total size of 7GB (details taken from X-Pack)
  • As I understand it, by default 5 shards are created per index.
  • I checked my logstash template (GET _template/logstash) and it doesn't show any shard settings.
  • I would typically be putting in 60-70k records per day for this index

My questions:

  • Should I use an index per day (YYYY.MM.dd), per some of the best-practice guidelines?
  • Where would I change the default of 5 shards, since I don't see it in the logstash template?
  • Can I do this without interrupting those who are already sending in "real-time"?
  • Should I be using a rollover index instead?
  • Would I just use Curator on the rollover?

Thanks in advance

> Should I use an index per day (YYYY.MM.dd), per some of the best-practice guidelines?

Yes.

> Where would I change the default of 5 shards, since I don't see it in the logstash template?

I'm pretty sure it is in the template.

> Can I do this without interrupting those who are already sending in "real-time"?

Yes, it'll just apply to the next index that gets created. For older ones, look at _shrink (sketched below).

> Should I be using a rollover index instead?

Maybe, it's definitely something you should look at.

> Would I just use Curator on the rollover?

For what exactly?
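For an existing daily index, shrinking down to a single shard looks roughly like this (index names here are examples; the source index first needs to be made read-only and have a copy of every shard on one node):

POST /myindex-2017.01.01/_shrink/myindex-2017.01.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1
  }
}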

I presume, @Wayne_Taylor, that you were planning on using Curator's new rollover action? Given that you say you would only have "60-70k records per day" for these indices, that's a very small number. I would suggest using a document-count rollover policy and allowing those indices to get quite large.
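A document-count rollover in a Curator action file would look roughly like this (the alias name and threshold are only examples, pick whatever fits your retention):

actions:
  1:
    action: rollover
    description: "Roll the write alias once it accumulates enough documents"
    options:
      name: myindex-write
      conditions:
        max_docs: 5000000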

Hi @warkolm, thank you for your responses. I checked the logstash template and didn't see any shard settings; it's posted below. It seems a fairly big template.

I'll look at reducing the shard count by setting it in my conf / template for now.
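For reference, I'm thinking of something like this higher-order template (the pattern and shard count are just what I plan to try, not settled):

PUT _template/myindex
{
  "order": 1,
  "template": "myindex-*",
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}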

@theuntergeek, thanks, yes I was. But the suggestion you've made makes more sense. Thanks for that.

{
  "logstash": {
    "order": 0,
    "version": 50001,
    "template": "logstash-*",
    "settings": {
      "index": {
        "refresh_interval": "5s"
      }
    },
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "message_field": {
              "path_match": "message",
              "mapping": {
                "norms": false,
                "type": "text"
              },
              "match_mapping_type": "string"
            }
          },
          {
            "string_fields": {
              "mapping": {
                "norms": false,
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword"
                  }
                }
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "_all": {
          "norms": false,
          "enabled": true
        },
        "properties": {
          "@timestamp": {
            "include_in_all": false,
            "type": "date"
          },
          "geoip": {
            "dynamic": true,
            "properties": {
              "ip": {
                "type": "ip"
              },
              "latitude": {
                "type": "half_float"
              },
              "location": {
                "type": "geo_point"
              },
              "longitude": {
                "type": "half_float"
              }
            }
          },
          "@version": {
            "include_in_all": false,
            "type": "keyword"
          }
        }
      }
    },
    "aliases": {}
  }
}

Wayne

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.