Looking for advice on how to improve an Elastic architecture

Good evening,

I'm doing a computer science internship (so I'm just getting started with Docker, K8s and the Elastic stack), and I've been asked to set up a log collection environment using the Elastic stack on ECK.
For the moment, I'm practising locally with only docker-compose, Filebeat, ES and Kibana (I'll see later whether it's relevant to add Logstash). My work environment is:

  • Windows 10 PRO with Docker-Desktop
  • WSL/Ubuntu 18.04/Terminator
  • Elastic Stack in version 7.7.0

I'm in the following situation:

  • I have two nginx containers (I just generate traffic by hitting F5 or Ctrl+F5 on the welcome page)
  • I have two Filebeat containers
  • I have two ES containers
  • I have one Kibana container

Here is the docker-compose.yml file:

version: '3.3'
services:

  nginx1:
    container_name: nginx_app1
    image: nginx
    volumes:
      - /c/nginx/serv_1/logs:/var/log/nginx
    ports:
      - 80:80
    networks:
      - elk_net

  nginx2:
    container_name: nginx_app2
    image: nginx
    volumes:
      - /c/nginx/serv_2/logs:/var/log/nginx
    ports:
      - 81:80
    networks:
      - elk_net

  filebeat1:
    build:
      context: filebeat1/
    container_name: filebeat1
    hostname: filebeat1
    volumes:
      - /c/nginx/serv_1/logs:/usr/share/filebeat/nginxlogs:ro
      - /var/run/docker.sock:/var/run/docker.sock
    links:
      - es01
    depends_on:
      - es01
    networks:
      - elk_net

  filebeat2:
    build:
      context: filebeat2/
    container_name: filebeat2
    hostname: filebeat2
    volumes:
      - /c/nginx/serv_2/logs:/usr/share/filebeat/nginxlogs:ro
      - /var/run/docker.sock:/var/run/docker.sock
    links:
      - es01
    depends_on:
      - es01
    networks:
      - elk_net

  es01:
    build:
      context: elasticsearch/es1/
    hostname: elasticsearch
    container_name: elasticsearch
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02
      - cluster.initial_master_nodes=es01,es02
      - bootstrap.memory_lock=true
      - "xpack.security.enabled: false"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - /c/es/es-data1:/usr/share/elasticsearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elk_net

  es02:
    build:
      context: elasticsearch/es2/
    hostname: elasticsearch2
    container_name: elasticsearch2
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01
      - cluster.initial_master_nodes=es01,es02
      - bootstrap.memory_lock=true
      - "xpack.security.enabled: false"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - /c/es/es-data2:/usr/share/elasticsearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - elk_net

  kibana:
    image: docker.elastic.co/kibana/kibana:7.7.0
    container_name: kibana
    environment:
      - "LOGGING_QUIET=true"
    links:
      - es01
    depends_on:
      - es01
    ports:
      - 5601:5601
    networks:
      - elk_net

networks:
  elk_net:
    driver: bridge
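(The filebeat1/ and filebeat2/ build contexts are not reproduced here; each one basically just copies its own filebeat.yml into the official image, roughly like this simplified sketch:)

# hypothetical Dockerfile for the filebeat1/ build context
FROM docker.elastic.co/beats/filebeat:7.7.0
COPY filebeat.yml /usr/share/filebeat/filebeat.yml
# the config file must be owned by root (or the runtime user) and not world-writable
USER root
RUN chown root:filebeat /usr/share/filebeat/filebeat.yml && chmod go-w /usr/share/filebeat/filebeat.yml
USER filebeat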

And here are my filebeat.yml and elasticsearch.yml files:

1st filebeat.yml file:

filebeat.config:
  modules:
    path: /usr/share/filebeat/modules.d/*.yml
    reload.enabled: false

filebeat.modules:
  - module: nginx
    access:
      var.paths: ["/usr/share/filebeat/nginxlogs/access.log"]
    error:
      var.paths: ["/usr/share/filebeat/nginxlogs/error.log"]

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "filebeat1-%{[beat.version]}-%{+yyyy.MM.dd}"

setup.template.name: "filebeat1"
setup.template.pattern: "filebeat1-*"

setup.kibana:
  host: "http://localhost:5601"

2nd filebeat.yml file:

filebeat.config:
  modules:
    path: /usr/share/filebeat/modules.d/*.yml
    reload.enabled: false

filebeat.modules:
  - module: nginx
    access:
      var.paths: ["/usr/share/filebeat/nginxlogs/access.log"]
    error:
      var.paths: ["/usr/share/filebeat/nginxlogs/error.log"]

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "filebeat2-%{[beat.version]}-%{+yyyy.MM.dd}"

setup.template.name: "filebeat2"
setup.template.pattern: "filebeat2-*"

setup.kibana:
  host: "http://localhost:5601"

1st elasticsearch.yml file:

path.data: /usr/share/elasticsearch/data
path.logs: /usr/share/elasticsearch/logs
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.port: 9200
xpack.security.enabled: false
discovery.zen.minimum_master_nodes: 2

2nd elasticsearch.yml file:

path.data: /usr/share/elasticsearch/data
path.logs: /usr/share/elasticsearch/logs
bootstrap.memory_lock: true
network.host: 0.0.0.0
xpack.security.enabled: false
discovery.zen.minimum_master_nodes: 2

kibana.yml file:

server.port: 5601
server.host: "0.0.0.0"

At first glance, everything seems to work as it is BUT...:

  • I can't create a different index per nginx server: all logs (access and error) from both nginx servers end up in the same ES index. I tried setting different index names in the two filebeat.yml files under output.elasticsearch, but it has no effect. Worse, the only index I get in ES and Kibana is simply named filebeat-7.7.0, without the date in its name...

  • from one day to the next, new logs keep accumulating in the same index (I would have liked a new index per day and per nginx server; see the sketch just after this list)

  • globally, I'm not at all sure that the architecture of the stack I propose is correct: I opted for the configuration with Filebeat modules because I couldn't get the whole thing to work with the input configuration or with autodiscover. I'm also not at all sure this is the right way to set up a 2-node ES cluster
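From what I've read in the Filebeat docs, in 7.x the index, setup.template.name and setup.template.pattern settings are ignored as long as index lifecycle management is enabled (which is the default against a 7.x cluster), and that would explain the single filebeat-7.7.0 index. A minimal sketch of what I think the first filebeat.yml would need for a per-server, per-day index, assuming ILM is simply switched off, is:

setup.ilm.enabled: false
setup.template.name: "filebeat1"
setup.template.pattern: "filebeat1-*"
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  # agent.version is the 7.x field name (beat.version was the 6.x equivalent)
  index: "filebeat1-%{[agent.version]}-%{+yyyy.MM.dd}"

The second file would be identical with filebeat2 everywhere.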

Later, I'll have to:

  • add other Beats components (MetricBeats...)
  • maybe add Logstash if this component is useful
  • finally, put everything in ECK

Do you have any suggestions to improve all this, please? I'll probably have more questions later (there are a lot of parameters I don't understand yet).

Thank you in advance for your help.

Hello everybody,

I managed to get two different indices thanks to the ILM settings. In each filebeat.yml file, I added:

setup.ilm.enabled: true
setup.ilm.rollover_alias: "fb1-%{[agent.version]}"
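The second filebeat.yml gets the symmetric setting (sketch, assuming fb2 as the alias prefix):

setup.ilm.enabled: true
setup.ilm.rollover_alias: "fb2-%{[agent.version]}"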

In the meantime, I wanted to add Logstash to the log collection chain. Here is the logstash.yml file:

node.name: elasticsearch
path.config: /usr/share/logstash/pipeline/beats.conf

And this is the beats.conf file:

input {
  beats {
    port => 5044
    host => "0.0.0.0.0"
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
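For events to actually reach this pipeline, each filebeat.yml also has to switch from output.elasticsearch to the Logstash output, roughly like this (sketch, assuming the Logstash container is reachable as logstash on the compose network):

output.logstash:
  hosts: ["logstash:5044"]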

The problem is that now all logs end up in the same index in Elasticsearch. I can't find an index per nginx server anymore.

  • What is missing from my current configuration with Logstash to get back to one index per nginx server?
  • How can I get a different index per minute or per hour, for example?
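For the second question, I suspect that making the date pattern in the index name finer-grained might be enough, something like this sketch (the %{+...} part is a Joda-time pattern applied to the event @timestamp), but I'm not sure:

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    # hour-level suffix, so a new index is created every hour
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd.HH}"
  }
}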

Thank you.

Good morning,

I finally found a way to get two indices back in ES and then Kibana. To do this, I started playing with conditionals in the output block, which now looks like this (I added a tag setting in the two filebeat.yml files; the tag itself is sketched after the block below):

output {
  if "nginx1" in [tags] {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "nginx1-%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYYY.MM.dd}"
    }
  }
  else {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "nginx2-%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYYY.MM.dd}"
    }
  }
}
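For reference, the tag setting is simply the top-level tags option in each filebeat.yml, e.g. for the first one (the second uses "nginx2"):

tags: ["nginx1"]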

I'm going to keep refining all this, looking at the filter block in particular.

I'm still looking for advice on how to improve my work as it stands, if you have any ideas.

Have a nice day.

Hi @LomigFR!

One question: why do you need a different index for each nginx instance? Usually all logs are stored in the same index (unless there is a specific reason to do otherwise) and are separated by using the proper filters in Kibana.

In addition, I would suggest bringing autodiscover into the picture. This is the way to go when it comes to dynamic workloads deployed in k8s (or containerised environments in general). Imagine that your setup has to scale up by adding 3 more nginx instances: how will your monitoring solution adapt to that kind of change?

C.

Hi Chris Mark and first of all thank you for your answer.

Regarding your first remark: as explained, I'm just getting started with the Elastic stack and Docker/K8s. I'm trying things out and experimenting. I don't really have firm requirements for the architecture at the moment, and what I'm doing here will inevitably be reworked when it comes time to go into production. I lack experience, and therefore a global view of what a full deployment of the Elastic stack can look like in the end.
To be sure I understood your remark: imagine having 3 applications running on Nginx, 4 others using Apache, and also monitoring metrics (with Metricbeat, for example)... Do you mean that all these logs and metrics should be collected into one and the same Elasticsearch index, and that in a second step we use Kibana to filter and exploit all of it?

==> Won't that make for something messy and difficult to work with?

Concerning your second remark: I already looked into Autodiscover some time ago, but I didn't manage to make the whole stack work that way... So, still with the goal of getting to grips with the Elastic stack, I temporarily went another way, and what I managed to get running uses Filebeat modules.
From what I understood of how "autodiscover" works, and as you say, it's the preferred solution when moving to ECK, so I'm going to take another look at this feature. If, based on the configuration I provided earlier, you can offer me some help getting Autodiscover working, it would be very welcome.

Thanks again for your remarks.

Guillaume.

  1. Metricbeat will store data in the metricbeat-* index and Filebeat in the filebeat-* index accordingly. Then in Kibana you can use ECS fields like the container fields (https://www.elastic.co/guide/en/ecs/current/ecs-container.html) in order to search and visualise the data (see the short KQL example after this list). From what I have seen, this is the way to go in most cases.

  2. Here is a quick example from the docs:

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.image: redis
          config:
            - module: redis
              log:
                input:
                  type: container
                  paths:
                    - /var/lib/docker/containers/${data.docker.container.id}/*.log

Specs for running Metricbeat and Filebeat along with ECK: https://github.com/elastic/cloud-on-k8s/tree/master/config/recipes/beats. Note that the internal configuration might be handy when running outside of ECK environments too.
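On the first point, a quick sketch of such a filter in Kibana (KQL), using one of the container names from your compose file as an example:

container.name : "nginx_app1"

The container.* fields are filled in automatically when autodiscover or the add_docker_metadata processor is used.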

Thanks again for your answers. So I will go deeper into ECS and Autodiscover.
But could you please explain to me what the following line corresponds to?

- /var/lib/docker/containers/${data.docker.container.id}/*.log

In connection with my previous docker-compose, which path should I provide there? Is it /c/nginx/serv_2/logs?

Can I use Autodiscover to retrieve logs from a volume mount used to persist the logs on the host machine? (I feel like I'm getting tangled up...)

Autodiscover will collect logs that are produced by containers. Container logs are stored in different places (on the host, i.e. the machine the containers are running on) depending on the system. On Linux you will find them at /var/lib/docker/containers/<container_id>/<container_id>-json.log, so this is what the line you mentioned represents. Here autodiscover uses the container metadata (data.docker.container.id) to build the path and find the container's log files.
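Adapted to the nginx containers from your earlier compose file, a sketch could look like the following (assuming the containers log to stdout/stderr, which the stock nginx image does via symlinks, so the /var/log/nginx bind mounts would no longer be needed, and assuming /var/lib/docker/containers is mounted read-only into the Filebeat containers):

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.image: nginx
          config:
            - module: nginx
              access:
                input:
                  type: container
                  # the stock nginx image sends access logs to stdout
                  stream: stdout
                  paths:
                    - /var/lib/docker/containers/${data.docker.container.id}/*.log
              error:
                input:
                  type: container
                  # and error logs to stderr
                  stream: stderr
                  paths:
                    - /var/lib/docker/containers/${data.docker.container.id}/*.log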

C.

Hello Chris Mark and thanks again for your advice.

I'm going to work on the Autodiscover feature with Kubernetes now, so I will perhaps have to come back and ask for help again, but in another topic.

Have a nice day,

Guillaume.
