Separate index per application or namespace

Hello,

I have set up Filebeat in Kubernetes clusters, and it collects logs from all containers with a specific annotation. The annotation it's looking for is kubernetes.labels.app.

The Elasticsearch cluster is running in AWS; I am using the AWS Elasticsearch resource with 3 nodes.

The cluster is working well now, and the index name is filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{+yyyy.MM.dd}

The relevant config looks like this:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      protocol: "https"
      headers: ["Content-Type: application/json"]
      index: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{+yyyy.MM.dd}'
      ssl.verification_mode: 'none'
    logging:
      level: info
    setup:
      kibana:
        host: '${ELASTICSEARCH_HOST:elasticsearch}/_plugin/kibana/'
      template:
        name: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}'
        pattern: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}*'
        settings:
          index:
            number_of_shards: 3
            number_of_replicas: 3

I am thinking of having an index per application or per namespace. Can anyone help me achieve this?

I tried to create an index per application by changing the above config to:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      protocol: "https"
      headers: ["Content-Type: application/json"]
      index: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[data.kubernetes.labels.app]}-%{+yyyy.MM.dd}'
      ssl.verification_mode: 'none'
    logging:
      level: info
    setup:
      kibana:
        host: '${ELASTICSEARCH_HOST:elasticsearch}/_plugin/kibana/'
      template:
        name: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[data.kubernetes.labels.app]}'
        pattern: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[data.kubernetes.labels.app]}*'

But then nothing gets published, and in the logs I can see the error below:

    2019-08-07T14:20:39.627Z	ERROR	pipeline/output.go:100	Failed to connect to backoff(elasticsearch(https://k8s-logs-elk-qa.private:443)): Connection marked as failed because the onConnect callback failed: Error loading Elasticsearch template: error creating template instance: key not found

Then I tried this config:

    setup:
      kibana:
        host: '${ELASTICSEARCH_HOST:elasticsearch}/_plugin/kibana/'
      template:
        name: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}'
        pattern: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}*'
        settings:
          index:
            number_of_shards: 3
            number_of_replicas: 3

and got errors like these:

    2019-08-07T10:54:50.971Z	DEBUG	[elasticsearch]	elasticsearch/client.go:731	HEAD https://k8s-logs-elk-qa.private:443/_template/filebeat-qa  <nil>
    2019-08-07T10:54:50.975Z	INFO	template/load.go:129	Template already exists and will not be overwritten.
    2019-08-07T10:54:50.975Z	INFO	pipeline/output.go:105	Connection to backoff(elasticsearch(https://k8s-logs-elk-qa.private:443)) established
    2019-08-07T10:54:50.975Z	INFO	[publish]	pipeline/retry.go:189	retryer: send unwait-signal to consumer
    2019-08-07T10:54:50.975Z	INFO	[publish]	pipeline/retry.go:191	  done
    2019-08-07T10:54:50.982Z	DEBUG	[elasticsearch]	elasticsearch/client.go:321	PublishEvents: 50 events have been published to elasticsearch in 7.165086ms.
    2019-08-07T10:54:50.982Z	DEBUG	[elasticsearch]	elasticsearch/client.go:526	Bulk item insert failed (i=0, status=500): {"type":"string_index_out_of_bounds_exception","reason":"String index out of range: 0"}
    2019-08-07T10:54:50.982Z	DEBUG	[elasticsearch]	elasticsearch/client.go:526	Bulk item insert failed (i=1, status=500): {"type":"string_index_out_of_bounds_exception","reason":"String index out of range: 0"}
    2019-08-07T10:54:50.982Z	DEBUG	[elasticsearch]	elasticsearch/client.go:526	Bulk item insert failed (i=2, status=500): {"type":"string_index_out_of_bounds_exception","reason":"String index out of range: 0"}

Can someone please help me?

Thanks

Hello, Raju!
A similar separation was made (with an example) and discussed here.
Or see this example from the Elastic docs:

    output.elasticsearch:
      hosts: ["http://localhost:9200"]
      indices:
        - index: "warning-%{[agent.version]}-%{+yyyy.MM.dd}"
          when.contains:
            message: "WARN"
        - index: "error-%{[agent.version]}-%{+yyyy.MM.dd}"
          when.contains:
            message: "ERR"

So you need conditions. Roughly, adapted to your config:

    indices:
      - index: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[data.kubernetes.labels.app]}-%{+yyyy.MM.dd}'
        when:
          or:
            - equals:
                data.kubernetes.labels.app: 'cool bro'

Hi @raju.d, thanks for joining the community...

One thing to consider is that if you make very granular indices (for example, an index per app name per day), you may end up with many small indices and thus many small shards, which may not give the best performance / experience in the long run.

Depending on the rate / size of your log ingest, you may want to consider longer time bases (weekly / monthly indices, etc.) to get closer to an optimal shard size and performance, perhaps 30GB-50GB per shard as an example.
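To put rough numbers on the over-sharding point, here is a back-of-the-envelope Python sketch. It assumes the ~15 apps mentioned later in this thread, daily indices kept for 30 days, and the template settings from the first post (3 primaries, 3 replicas) on the 3-node cluster. The function names and the 5 GB/day ingest figure are hypothetical. It also accounts for Elasticsearch never allocating two copies of the same shard to one node, so on 3 nodes the third replica of each shard stays unassigned.

```python
import math


def allocated_shard_copies(indices: int, primaries: int, replicas: int, nodes: int) -> int:
    """Count shard copies (primaries + replicas) that can actually be allocated.

    Elasticsearch never places two copies of the same shard on one node, so
    each shard gets at most `nodes` copies no matter how many replicas are set.
    """
    copies_per_shard = min(1 + replicas, nodes)
    return indices * primaries * copies_per_shard


def primaries_for_target(index_size_gb: float, target_shard_gb: float = 40.0) -> int:
    """Primary shards needed so each one stays at or below the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


# Assumptions: 15 apps, daily indices kept for 30 days,
# 3 primaries and 3 replicas (template settings above), 3 data nodes.
APPS, DAYS, PRIMARIES, REPLICAS, NODES = 15, 30, 3, 3, 3

per_app_daily = allocated_shard_copies(APPS * DAYS, PRIMARIES, REPLICAS, NODES)
combined_daily = allocated_shard_copies(DAYS, PRIMARIES, REPLICAS, NODES)
print(f"index per app per day: {per_app_daily} shard copies")      # 4050 shard copies
print(f"one combined daily index: {combined_daily} shard copies")  # 270 shard copies

# Hypothetical ingest of 5 GB/day: one monthly index of ~150 GB needs
# only a handful of primaries to stay near 40 GB per shard.
print(primaries_for_target(5 * DAYS))  # 4
```

The same 30 days of logs cost roughly 15 times more shards when split per app, which is the "many tiny shards" effect described above.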

The other approach is to keep the app name as a field in a combined index; then for queries, visualizations, etc. you can filter by app name.
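With a combined index, "filter by app name" is just a filter clause in the search body. A minimal Python sketch that builds such a body, assuming kubernetes.labels.app and kubernetes.namespace are mapped as keyword fields (check your index mapping) and that the daily indices share the filebeat-qa-* pattern; app_filter_query is a hypothetical helper name:

```python
import json
from typing import Any, Dict, List, Optional


def app_filter_query(app: str, namespace: Optional[str] = None) -> Dict[str, Any]:
    """Build a search body that filters a shared Filebeat index by app label.

    Assumes the label fields are keyword-mapped, so an exact `term` match works.
    """
    filters: List[Dict[str, Any]] = [{"term": {"kubernetes.labels.app": app}}]
    if namespace is not None:
        filters.append({"term": {"kubernetes.namespace": namespace}})
    # Clauses in a bool `filter` only include/exclude documents; they do not score.
    return {"query": {"bool": {"filter": filters}}}


# Send this body to e.g. GET filebeat-qa-*/_search to cover all daily indices at once.
print(json.dumps(app_filter_query("app-service1", "default"), indent=2))
```

In Kibana the equivalent is simply a filter on kubernetes.labels.app in an index pattern over all Filebeat indices.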

We often see folks start with an approach like this because it "makes sense", then end up with a poorly performing cluster after a couple of months because of too many tiny indices and shards, i.e. "over-sharding".

Just a thought...

Thanks a lot for the suggestions.

Yes, @maxozerov, I have now set up the config as per your suggestions:

      indices:
        - index: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[data.kubernetes.labels.app]}-%{+yyyy.MM.dd}'
          when:
            and:
              - equals:
                  data.kubernetes.labels.app: app-service1
              - equals:
                  data.kubernetes.namespace: default
        - index: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[data.kubernetes.namespace]}-%{+yyyy.MM.dd}'
          when:
            or:
              - equals:
                  data.kubernetes.namespace: marketing
              - equals:
                  data.kubernetes.namespace: kube-system
              - equals:
                  data.kubernetes.namespace: default

I decided not to go for separate indices per application, as per @stephenb's suggestion, to avoid the cluster getting slow. That said, I don't believe I have too many applications: in total there are (and will be) no more than 15 apps sending logs to Elasticsearch.

One question: with the above configuration, app-service1 is running in the default namespace. Would the logs for this app be indexed in two places, since they also match the condition of being in the default namespace?

Is there a way to define an index output that catches all events that don't match any specific condition?
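Per the Filebeat output.elasticsearch documentation, the indices rules are checked in order and the first matching rule is used; if no rule matches, the plain index setting is used instead. Assuming that documented behaviour, app-service1's logs should land only in its app index, and a catch-all comes from keeping index: next to indices:. Below is a small Python model of that selection logic, not Filebeat's actual code. It uses the field names without the data. prefix (the form that worked later in this thread), omits the date suffix, and the helper names and the filebeat-qa-other fallback name are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Condition = Callable[[Event], bool]
IndexName = Callable[[Event], str]


def get_field(event: Event, dotted: str) -> Optional[Any]:
    """Follow a dotted name like 'kubernetes.namespace' through nested dicts."""
    value: Any = event
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # a missing field simply fails the condition
        value = value[part]
    return value


def equals(field: str, expected: Any) -> Condition:
    return lambda event: get_field(event, field) == expected


def and_(*conds: Condition) -> Condition:
    return lambda event: all(cond(event) for cond in conds)


def or_(*conds: Condition) -> Condition:
    return lambda event: any(cond(event) for cond in conds)


def select_index(event: Event, rules: List[Tuple[IndexName, Condition]], default: str) -> str:
    """Use the FIRST rule whose condition matches; otherwise fall back to default.

    Exactly one index name is returned, so an event never goes to two rule indices.
    """
    for index_name, condition in rules:
        if condition(event):
            return index_name(event)
    return default


# Mirrors the two rules in the config above ("qa" is the indexSuffix seen in the logs).
RULES: List[Tuple[IndexName, Condition]] = [
    (lambda e: f"filebeat-qa-{get_field(e, 'kubernetes.labels.app')}",
     and_(equals("kubernetes.labels.app", "app-service1"),
          equals("kubernetes.namespace", "default"))),
    (lambda e: f"filebeat-qa-{get_field(e, 'kubernetes.namespace')}",
     or_(equals("kubernetes.namespace", "marketing"),
         equals("kubernetes.namespace", "kube-system"),
         equals("kubernetes.namespace", "default"))),
]
DEFAULT = "filebeat-qa-other"  # hypothetical catch-all, i.e. the plain index setting


def pod(app: str, namespace: str) -> Event:
    """Hypothetical minimal event carrying only the Kubernetes metadata used here."""
    return {"kubernetes": {"labels": {"app": app}, "namespace": namespace}}


print(select_index(pod("app-service1", "default"), RULES, DEFAULT))  # filebeat-qa-app-service1
print(select_index(pod("app-service2", "default"), RULES, DEFAULT))  # filebeat-qa-default
print(select_index(pod("billing", "payments"), RULES, DEFAULT))      # filebeat-qa-other
```

Rule order matters in this model: with the namespace rule listed first, app-service1 events would go to filebeat-qa-default instead.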

Thanks

@raju.d - I haven't checked this in practice, but I need to make separate indexes myself today :slight_smile:. As far as I can tell, the condition that applies here is the first one: 'when' 'and' (that is, both values must match: app-service1 & default).
So test it... and choose the configuration that suits you ;)
I hope to try it today too, and if I have something to add, I will write back.

Good luck, @maxozerov! I'll wait to hear how it goes for you.

I tried the above config, and there are no errors in the logs. But at the same time, I don't see any new indexes being created.

I also changed the index pattern a bit to match the new index names. One thing I can see in the logs is this:

    2019-08-08T14:12:24.586Z	INFO	template/load.go:129	Template already exists and will not be overwritten.

Maybe I need to force the creation of a new template?

Hello,

What finally worked for me is the config below:

      indices:
        - index: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[kubernetes.labels.app]}-%{+yyyy.MM.dd}'
          when:
            and:
              - equals:
                   kubernetes.labels.app: app-service1
              - equals:
                   kubernetes.namespace: default
        - index: 'filebeat-{{ .Values.elasticsearch.indexSuffix }}-%{[kubernetes.namespace]}-%{+yyyy.MM.dd}'
          when:
            or:
              - equals:
                   kubernetes.namespace: marketing
              - equals:
                   kubernetes.namespace: kube-system
              - equals:
                   kubernetes.namespace: default

I removed the data. prefix from the field names, and it picked up all the values and created the indices. Thanks!
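Why removing data. helped: most likely, data.kubernetes.* is the variable syntax used inside autodiscover templates (e.g. ${data.kubernetes.container.id}), whereas the published events carry the metadata under a top-level kubernetes key, so %{[data.kubernetes.labels.app]} had nothing to resolve. Presumably the earlier "key not found" template error is the same kind of failure, since the template name is built at setup time when no event fields exist. Below is a simplified Python model of how an index format string expands against an event. It is not Filebeat's implementation: expand_index and lookup are hypothetical names, and the date part assumes a UTC event timestamp.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict

Event = Dict[str, Any]

FIELD_REF = re.compile(r"%\{\[([^\]]+)\]\}")  # matches %{[some.dotted.field]}
DATE_REF = "%{+yyyy.MM.dd}"                   # daily suffix taken from @timestamp


def lookup(event: Event, dotted: str) -> Any:
    """Walk nested dicts by a dotted path; raise KeyError if any part is missing."""
    value: Any = event
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"key not found: {dotted}")
        value = value[part]
    return value


def expand_index(pattern: str, event: Event) -> str:
    """Replace field references, then the date suffix, in an index pattern."""
    name = FIELD_REF.sub(lambda m: str(lookup(event, m.group(1))), pattern)
    ts: datetime = event["@timestamp"]
    return name.replace(DATE_REF, ts.strftime("%Y.%m.%d"))


event: Event = {
    "@timestamp": datetime(2019, 8, 8, 14, 12, tzinfo=timezone.utc),
    "kubernetes": {"namespace": "default", "labels": {"app": "app-service1"}},
}

print(expand_index("filebeat-qa-%{[kubernetes.labels.app]}-%{+yyyy.MM.dd}", event))
# → filebeat-qa-app-service1-2019.08.08

try:
    expand_index("filebeat-qa-%{[data.kubernetes.labels.app]}-%{+yyyy.MM.dd}", event)
except KeyError:
    print("no 'data' key in the event, so this index name cannot be built")
```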