Testing Elastic Stack and winlogbeat / query exceeds 1000 shards

Hello all. I'll begin by listing the components and versions I've installed; all of the below are installed on one FreeBSD 11 p8 box:

Elasticsearch 5.0.2
Logstash 5.0.2
Kibana 5.0.2

Winlogbeat 5.2.2 is installed on a Windows 7 laptop.

I'm looking at using the Elastic Stack for managing logs at my place of work, and I have followed the documentation for sending Windows event logs to Logstash and Elasticsearch. I manually loaded the template into ES as per the instructions and configured the Logstash conf file to accept Beats input. I then loaded the sample Kibana dashboards, but when I entered the winlogbeat-* index pattern, Kibana complained with the following error:

Discover: Trying to query 3570 shards, which is over the limit of 1000. This limit exists because querying many shards at the same time can make the job of the coordinating node very CPU and/or memory intensive.

How did I get so many shards? Did I need to manually design the index/shard settings? Perhaps naively, I thought loading the template would take care of all that. The template command I ran and my conf file settings are below:
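From memory, the template load step was along these lines (with my ES host swapped in; this follows the curl approach from the docs, so treat the exact path as approximate):

curl -XPUT 'http://192.168.1.100:9200/_template/winlogbeat' -d@winlogbeat.template.json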

winlogbeat.yml

winlogbeat.event_logs:
  - name: Application
    ignore_older: 72h
  - name: Security
  - name: System

name: LAPTOP01

output.logstash:
  hosts: ["192.168.1.100:5044"]

logstash.conf

input {
  beats {
    port => 5044
  }

  file {
    type => "syslog"
    # path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
    path => "/var/log/messages"
    start_position => "beginning"
  }
}

filter {
  # A filter may change the regular expression used to match a record or a field,
  # alter the value of parsed fields, add or remove fields, etc.
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} (%{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}|%{GREEDYDATA:syslog_message})" }
      # capture the original values before the mutate below overwrites them
      add_field => {
        "received_at"   => "%{@timestamp}"
        "received_from" => "%{host}"
      }
    }

    if !("_grokparsefailure" in [tags]) {
      mutate {
        replace => [ "host", "%{syslog_hostname}" ]
        replace => [ "message", "%{syslog_message}" ]
      }
    }
    mutate {
      remove_field => [ "syslog_hostname", "syslog_message" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss", "ISO8601" ]
    }
    syslog_pri { }
  }
}

output {
  elasticsearch {
    hosts => "192.168.1.100:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

elasticsearch.yml

cluster.name: esearch-cluster
node.name: node-1
path.data: /zdata/elasticsearch-db
path.logs: /zdata/elasticsearch-log
path.scripts: /usr/local/libexec/elasticsearch
network.host: 192.168.1.100
http.port: 9200

Thanks for any help.

What's your _cat/shards output?

GET _cat/shards/winlog*

Hello, thanks for getting back to me. The shards command returns 7140 lines like the following:

winlogbeat-2015.01.31 4 p UNASSIGNED

How about when you run this?

GET _cat/indices/winlog*

I want to confirm you are indeed generating only daily indices and not hourly indices. From your config it looks like daily.
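For context, daily vs hourly comes down to the date pattern in the Logstash elasticsearch output's index setting - roughly this, reusing the pattern from your config:

index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"     # daily indices (what your config uses)
index => "%{[@metadata][beat]}-%{+YYYY.MM.dd.HH}"  # hourly indices would look like this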

Hello, I'm at work at the moment and my test setup is at home, so I'll update this when I'm back.

Just to clarify, I ran into this problem just a few minutes after winlogbeat started sending data to the cluster.

No problem. While you're at it, when you get home please run one more command:

GET winlog*/_settings

Will do, thanks for your efforts.

No problem ^^

Hello again. GET _cat/indices/winlog* returns 715 entries similar to the following:

yellow open winlogbeat-2016.10.28 K-jUP-rxRq6CvARnU7hFgw 5 1 180 0 425.6kb 425.6kb
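If I'm reading the columns right (matching them against _cat/indices?v), that line breaks down as:

health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   winlogbeat-2016.10.28 K-jUP-rxRq6CvARnU7hFgw 5   1   180        0            425.6kb    425.6kb

i.e. 5 primary shards and 1 replica per index.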

GET winlog*/_settings returns over 14,000 lines, but they're all very similar to the following:

{
  "winlogbeat-2016.10.28": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "10000"
          }
        },
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "provided_name": "winlogbeat-2016.10.28",
        "creation_date": "1489020550246",
        "number_of_replicas": "1",
        "uuid": "K-jUP-rxRq6CvARnU7hFgw",
        "version": {
          "created": "5000299"
        }
      }
    }
  },
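
If I'm reading that right, the lines that matter are "number_of_shards": "5" and "number_of_replicas": "1" on every index, which I gather are the ES defaults.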

Thanks.

Are you using an alias when querying?

I'm very much a newbie so I wouldn't know how to do that :)

As I understand it, winlogbeat has created 715 indices with 5 shards each - that would explain the 3575 figure Kibana is complaining about. Was it supposed to create the data that way?

You are correct; with 715 indices I'm not surprised there are 3575 shards. My thinking is: first, how are you querying the data such that it has to span over 1000 shards? Then we need to work on cutting down the number of shards - why is ignore_older not working, or is it working but not enough?
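Once we understand the rest, one way to cut the shard count for new indices would be to override the shard settings in the winlogbeat template. A rough, untested sketch - and note a bare PUT like this replaces the template you loaded, so in practice you'd merge these settings into winlogbeat.template.json instead:

PUT _template/winlogbeat
{
  "template": "winlogbeat-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

But first things first.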

Please provide your query.

I didn't write a particular query; I simply followed the instructions here: https://www.elastic.co/guide/en/beats/winlogbeat/current/winlogbeat-sample-dashboards.html

...to see what data winlogbeat was loading into Elasticsearch. I just entered winlogbeat-* on the Discover page in Kibana, which then resulted in the error.

That is problem 1: winlogbeat-* matches all winlogbeat indices, and because you have 715 indices, that's 3575 shards. Please specify a particular winlogbeat index.

For example:

winlogbeat-2016.10.28

Thanks. Running GET _cat/indices/winlogbeat-2014.10.26 in the developer console does return a value, but what do I run in Discover, and why do the instructions tell me to use winlogbeat-*?

Running winlogbeat-* is fine in normal circumstances, i.e. when it matches fewer than 1000 shards.

Run again with winlogbeat-2014*, winlogbeat-2015*, and so on until you get some values.

I suppose what I don't understand is why I have so many indices/shards in the first place. This is a test setup; I've only loaded a few hundred events from one Windows laptop.

Yeah, finding out why you have so many shards is the second part of our problem, and I have an idea why. You mention you only loaded a few hundred events; however, your _cat/indices output contains winlogbeat-2014* indices. From 2016 back to 2014 is two years, so it looks like it loaded two years of events.
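If you don't need that historical data on a test box, the quickest way to reclaim shards would be to delete the old daily indices. Destructive, so only on data you can lose - and this assumes wildcard deletes are allowed, which is the default (action.destructive_requires_name: false):

DELETE winlogbeat-2014.*
DELETE winlogbeat-2015.*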

Yeah, it's possible it's going back a few years. I think it will ignore events older than 3 days, but only for the Windows Application log:

winlogbeat.event_logs:
  - name: Application
    ignore_older: 72h
  - name: Security
  - name: System
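Presumably, if I wanted the 72h cutoff applied across the board, it would need setting per log - I'm guessing at this from the docs, untested:

winlogbeat.event_logs:
  - name: Application
    ignore_older: 72h
  - name: Security
    ignore_older: 72h
  - name: System
    ignore_older: 72h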

I'm still confused as to why it's created so many shards from one Windows machine - what would've happened if I'd added hundreds! I wonder if it's something to do with winlogbeat being a later version than the Elastic Stack components I've installed.

It's possible I've got the output option in winlogbeat.yml wrong - in my file I have:

output.logstash:
  hosts: ["192.168.1.100:5044"]

But looking at https://www.elastic.co/guide/en/beats/winlogbeat/current/logstash-output.html it has the following example:

output.logstash:
  hosts: ["localhost:5044"]
  index: winlogbeat
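If I'm reading the docs right, though, that index option just sets the base index name that gets passed along in @metadata, and it defaults to the beat name, so leaving it out should be equivalent to:

output.logstash:
  hosts: ["192.168.1.100:5044"]
  index: winlogbeat

...which wouldn't explain the index explosion either.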