Tuning Elasticsearch for logging

I'm relatively new to Elasticsearch, and we have a small cluster that collects all of our logging and metrics data. It is currently suffering from fairly poor performance (particularly memory usage), and after some reading I suspect the cause is the number of fields being indexed (which is basically everything). Some of our messages are also fairly heterogeneous: we use Serilog for .NET, which captures properties from log messages and turns them into fields in the log output, and ELMAH, which generates a large number of fields.

I'm trying to understand how to decide what should be indexed (and therefore searchable) versus what should only be stored. We do define message templates that assign appropriate data types to all of the fields, but the sparseness of some fields is clearly suboptimal. For example, when there is a critical error, ELMAH logs a full stack trace, a dump of server variables, and so on, but those fields only apply to those messages. As an ops person looking at error messages over time, I might be interested in stack traces and their contents (searching them for keywords, for example), but I don't think I want everything in that blob indexed, or at least not indexed as that specific field. I'm not sure.
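One way to keep such a blob retrievable without indexing all of it is sketched below, assuming the ES 5.x-era mapping syntax used in the template further down in this thread. Setting `"enabled": false` on an object tells Elasticsearch not to parse or index its contents at all (they stay in `_source` and come back with the document), while `"index": false` on a leaf field keeps that field out of the inverted index. `ServerVariables` and `RawXml` are hypothetical field names standing in for the server-variable dump and raw error payload; only `StackTrace` stays as analyzed text, so keyword searches on it still work. JSON has no comment syntax, so these assumptions are noted here rather than inline.

```json
{
  "properties": {
    "Elmah": {
      "properties": {
        "ServerVariables": {
          "type": "object",
          "enabled": false
        },
        "RawXml": {
          "type": "text",
          "index": false
        },
        "Exception": {
          "properties": {
            "StackTrace": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}
```

Fields mapped this way can still be displayed in Kibana from `_source`, but you cannot filter or aggregate on them.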

Can anyone who uses ES as a logging endpoint for heterogeneous, sparse messages recommend to-dos or specific actions I could take in designing my schema so that I'm not blowing out ES's RAM requirements? For example, should I just log the ELMAH block as one big string field and let ES do text analysis on it for searchability? Or should I put those ELMAH messages in their own index and cross-reference them with the master log only when I need to, using a sort of join? What has your experience been using ES this way?
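If you go the single-string route, a mapping along these lines might work. It is a sketch in 5.x-style syntax, and `ErrorText` is a hypothetical field name your log shipper would have to populate. It maps one analyzed text field and sets `"dynamic": false` on the `Elmah` object, so any extra ELMAH properties are kept in `_source` but never added to the mapping; `"norms": false` skips the length-normalization data used for relevance scoring, which log searches rarely need.

```json
{
  "properties": {
    "Elmah": {
      "type": "object",
      "dynamic": false,
      "properties": {
        "Type": { "type": "keyword" },
        "StatusCode": { "type": "integer" },
        "ErrorText": {
          "type": "text",
          "norms": false
        }
      }
    }
  }
}
```

This caps mapping growth from sparse fields; whether it is enough to fix the memory problem depends on the queries and aggregations you run.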

What does your mapping look like now?

For our web server, we have something like the mapping below. For the Metricbeat logs (which cover all of our metrics), I am using essentially the default Metricbeat template, augmented with a few fields that are common to all of our log messages. The more I look at what is in our messages, the more I think I need to mark a lot of fields as not indexed, and maybe use a custom pattern for ELMAH stack traces (splitting on certain symbols rather than on all of them).

{
  "mappings": {
    "_default_": {
      "_all": {
        "norms": false
      },
      "_meta": {
        "version": "1.0.0"
      },
      "dynamic_templates": [
        {
          "strings_as_keyword": {
            "mapping": {
              "ignore_above": 1024,
              "type": "keyword"
            },
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "Environment": {
          "type": "keyword"
        },
        "Deployment": {
          "type": "date",
          "format": "yyyyMMddHHmmss"
        },
        "IPAddress": {
          "type": "ip"
        },
        "meta": {
          "properties": {
            "cloud": {
              "properties": {
                "availability_zone": {
                  "ignore_above": 1024,
                  "type": "keyword"
                },
                "instance_id": {
                  "ignore_above": 1024,
                  "type": "keyword"
                },
                "machine_type": {
                  "ignore_above": 1024,
                  "type": "keyword"
                },
                "project_id": {
                  "ignore_above": 1024,
                  "type": "keyword"
                },
                "provider": {
                  "ignore_above": 1024,
                  "type": "keyword"
                },
                "region": {
                  "ignore_above": 1024,
                  "type": "keyword"
                }
              }
            }
          }
        },
        "Application": {
          "type": "keyword"
        },
        "Level": {
          "type": "keyword"
        },
        "MessageTemplate": {
          "type": "text"
        },
        "Elmah": {
          "properties": {
            "ApplicationName": {
              "type": "keyword"
            },
            "Detail": {
              "type": "text"
            },
            "Exception": {
              "properties": {
                "ErrorCode": {
                  "type": "integer"
                },
                "HResult": {
                  "type": "integer"
                },
                "Message": {
                  "type": "text"
                },
                "Source": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword"
                    }
                  }
                },
                "StackTrace": {
                  "type": "text"
                },
                "TargetSite": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword"
                    }
                  }
                },
                "WebEventCode": {
                  "type": "integer"
                },
                "_typeTag": {
                  "type": "keyword"
                }
              }
            },
            "HostName": {
              "type": "keyword"
            },
            "Message": {
              "type": "text"
            },
            "Type": {
              "type": "keyword"
            },
            "Source": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              }
            },
            "StatusCode": {
              "type": "integer"
            }
          }
        }
      }
    }
  },
  "order": 0,
  "settings": {
    "index.mapping.total_fields.limit": 1000,
    "index.refresh_interval": "5s"
  },
  "template": "webserver-*"
}
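The custom pattern for ELMAH stack traces mentioned above could be sketched as a settings-plus-mapping fragment in the same 5.x template style. The names `stacktrace` and `stacktrace_tokenizer`, and the exact split pattern, are assumptions to adjust. The `pattern` tokenizer splits text on matches of a Java regular expression, here whitespace plus the characters `.`, `:`, `(`, `)`, `[`, `]` and `,`, so a frame like `at MyApp.Orders.Save(Order o)` yields tokens such as `myapp`, `orders` and `save` after lowercasing, whereas the standard tokenizer tends to keep dotted names together as one token.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "stacktrace_tokenizer": {
          "type": "pattern",
          "pattern": "[\\s.:()\\[\\],]+"
        }
      },
      "analyzer": {
        "stacktrace": {
          "type": "custom",
          "tokenizer": "stacktrace_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "Elmah": {
          "properties": {
            "Exception": {
              "properties": {
                "StackTrace": {
                  "type": "text",
                  "analyzer": "stacktrace"
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Because the analyzer of an existing field cannot be changed, this would only take effect for indices created after the template is updated.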

Looks OK to me. Of course, it depends on the queries and aggregations you are running.

Are you using the Monitoring plugin?

Not yet. That'll come as soon as I have time to set it up.
