Creating a set of YAML files to stand up a basic ELK stack on Kubernetes

I expect there will be some limitations running on a 2015 MacBook Pro, so I plan to expand this to a proper cluster soon. Fundamentally, though, my issue is the creation and wiring-up of ELK on k8s. I was hoping there was a simple, well-commented set of YAML out there that explains how to stand up a basic system as a POC before expanding it. For now I am using Docker Desktop's built-in Kubernetes (as opposed to minikube).

I think my primary issue is linking things together. I have an Elasticsearch and a Kibana instance, for example, and they seem to play together fine; that was established from some samples online. The issue I started to have was more with Logstash, which seems to be a bit more complex: it appears unable to find the Elasticsearch instance to submit data to.

The key issue I have been having is that Logstash can't find the Elasticsearch cluster. I suspect it might be related to Kubernetes internal networking (service DNS), but I'm not sure.
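(A quick way to test that theory from inside the cluster: ECK exposes an Elasticsearch named transit through a service called transit-es-http. A sketch, assuming everything runs in the default namespace:

kubectl get service transit-es-http

# password for the built-in "elastic" user, generated by ECK
PW=$(kubectl get secret transit-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')

# curl the HTTPS endpoint from a throwaway pod; -k skips verification of the self-signed cert
kubectl run -it --rm curl-test --image=curlimages/curl --restart=Never --command -- \
  curl -k -u "elastic:$PW" https://transit-es-http.default.svc:9200

If that returns the cluster banner, DNS and the service are fine and the problem is in the Logstash output config.)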

This sample case won't need to retain data and can be stateless if that makes it easier to consume.

Elasticsearch:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: transit
spec:
  version: 7.9.2
  http:
    service:
      spec:
        type: LoadBalancer
  nodeSets:
  - name: default
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 20Gi
        storageClassName: standard
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms2g -Xmx2g
          resources:
            requests:
              memory: 4Gi
              cpu: 0.5
            limits:
              memory: 4Gi
              cpu: 2

Kibana:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: transit
spec:
  version: 7.9.2
  count: 1
  elasticsearchRef:
    name: transit

But I can't seem to get Elasticsearch working correctly with the Logstash setup that follows.
logstash-configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
  logstash.conf: |
    input {
        http_poller {
            urls => {
                ...
            }
            schedule => {
                every => "2m"
            }
            codec => "json"
        }
    }

    filter {
        split {
            # emit each element of the vehicle array as its own event
            field => "[bustime-response][vehicle]"
            # after the split, lat/lon sit under the nested field, not at the top level
            add_field => {
                "geo_coord" => "%{[bustime-response][vehicle][lat]},%{[bustime-response][vehicle][lon]}"
            }
        }
    }

    output {
        elasticsearch {
            index => "transit-pittsburgh"
            hosts => [ "${ES_HOSTS}" ]
            user => "${ES_USER}"
            password => "${ES_PASSWORD}"
            cacert => '/etc/logstash/certificates/ca.crt'
        }
    }

logstash pod (run as a bare Pod for now, not a Deployment):

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: logstash
  name: logstash
spec:
  containers:
  - image: docker.elastic.co/logstash/logstash:7.9.2
    name: logstash
    ports:
    - containerPort: 25826
    - containerPort: 5044
    env:
    - name: ES_HOSTS
      value: "https://transit-es-http:9200"
    - name: ES_USER
      value: "elastic"
    - name: ES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: transit-es-elastic-user
          key: elastic
    resources: {}
    volumeMounts:
    - name: config-volume
      mountPath: /usr/share/logstash/config
    - name: logstash-pipeline-volume
      mountPath: /usr/share/logstash/pipeline
    - name: cert-ca
      mountPath: "/etc/logstash/certificates"
      readOnly: true
  restartPolicy: OnFailure
  volumes:
  - name: config-volume
    configMap:
      name: logstash-configmap
      items:
        - key: logstash.yml
          path: logstash.yml
  - name: logstash-pipeline-volume
    configMap:
      name: logstash-configmap
      items:
        - key: logstash.conf
          path: logstash.conf
  - name: cert-ca
    secret:
      secretName: transit-es-http-certs-public


Logstash Service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: logstash
  name: logstash
spec:
  ports:
  - name: tcp-25826   # service port names must contain a letter; purely numeric names are rejected
    port: 25826
    targetPort: 25826
  - name: tcp-beats
    port: 5044
    targetPort: 5044
  selector:
    app: logstash

What am I missing in these files? I figured I'd get a minimal working set first and expand the Elasticsearch instances etc. from there.

Ideally I was hoping to do two things:
1- Be able to directly reference Elasticsearch from Logstash by its app name, the way Kibana does.
2- Instead of passing the Logstash conf file IN the YAML, set a file instead (see the sketch just below), so I can swap config files rather than adjusting nested text in YAML files.
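For the second point, a ConfigMap can be generated from a standalone file rather than inlined; a minimal sketch, assuming the pipeline is saved locally as logstash.conf:

kubectl create configmap logstash-pipeline --from-file=logstash.conf \
  --dry-run=client -o yaml | kubectl apply -f -

Swapping pipelines then means editing logstash.conf, re-running that command, and restarting the Logstash pod so it picks up the new file.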

ECK does not support Logstash, which is why you can't connect it to Elasticsearch with a single line the way you do with Kibana. See Access Elastic Stack services | Elastic Cloud on Kubernetes [1.2] | Elastic for how to connect non-ECK applications to Elasticsearch (you seem to have done this already).
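Concretely, for point 1 above: ECK creates a ClusterIP service named <cluster-name>-es-http for each Elasticsearch, so the transit cluster is reachable in-cluster at (assuming the default namespace):

https://transit-es-http.default.svc:9200

Within the same namespace the short form https://transit-es-http:9200 also resolves, and the credentials live in the <cluster-name>-es-elastic-user secret, exactly as wired up in the Pod spec above.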

Have you considered using Filebeat to ship your logs? ECK supports Beats and could make the integration much easier. See Run Beats on ECK | Elastic Cloud on Kubernetes [2.10] | Elastic.

I knew Filebeat existed, but I didn't know what power was behind it; I just thought Logstash had everything needed: inputs (polling 20 files every 5 minutes), filters (taking the example below, indexing each entry in the array as its own document, and combining the lat/lon props into a new geo-coordinate), and outputs (destinations).

If Filebeat can do that, I have no problem switching, but I thought it just dumped log data into Elasticsearch without any customization.

Here is a sample resulting Json data I am fetching:

{
	"bustime-response": {
		"vehicle": [
			{
				"vid": "6554",
				"rtpidatafeed": "Port Authority Bus",
				"tmstmp": "20201007 09:42",
				"lat": "40.49551010131836",
				"lon": "-80.25567626953125",
				"hdg": "180",
				"pid": 7154,
				"rt": "28X",
				"des": "Downtown-Oakland-Shadyside",
				"pdist": 6715,
				"dly": false,
				"spd": 15,
				"tatripid": "11179",
				"origtatripno": "11349117",
				"tablockid": "028X-009",
				"zone": "",
				"mode": 0,
				"psgld": "HALF_EMPTY"
			},
			{
				"vid": "6255",
				"rtpidatafeed": "Port Authority Bus",
				"tmstmp": "20201007 09:42",
				"lat": "40.440697763480394",
				"lon": "-80.06618529675053",
				"hdg": "6",
				"pid": 7154,
				"rt": "28X",
				"des": "Downtown-Oakland-Shadyside",
				"pdist": 94187,
				"dly": false,
				"spd": 38,
				"tatripid": "11178",
				"origtatripno": "11349116",
				"tablockid": "028X-006",
				"zone": "",
				"mode": 0,
				"psgld": "HALF_EMPTY"
			}
		]
	}
}

So I essentially want each bus entry to become its own document, and to create the new property on each one. Is that easily doable within Filebeat?

When looking at Metricbeat, it has an http module akin to Logstash's http_poller, which is good. The issue I noticed is that it offers json and json.is_array; is_array could ALMOST do what we want, but it needs the outer object unwrapped, and it doesn't look like you can query into the payload. Likewise, my initial impression is that Metricbeat won't let me combine the lat and lon properties into a new "geo" property so it can more easily map out the positions of buses.
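For reference, a minimal sketch of the http module json metricset being discussed (the host and path are placeholders, not the real feed):

- module: http
  metricsets: ["json"]
  period: 2m
  hosts: ["http://example.com"]   # placeholder for the bus-tracker endpoint
  path: "/api/getvehicles"        # hypothetical path
  namespace: "bustime"
  json.is_array: false            # the payload is an object wrapping the array, so is_array can't split it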

RE: It looks like I know how to set up Logstash. The example I gave above (though you need to define a Logstash config file) fails for me: looking at the Logstash logs and the stdout output, it fails to establish a connection to the Elasticsearch machine itself. That leads me to think I went wrong somewhere in setting this up, meaning either the output URL is incorrect OR there is something in Elasticsearch I am not exposing correctly, etc.

Filebeat is a lightweight data shipper and might not be able to perform some of the complex transforms that Logstash can do. Elasticsearch itself has the ingest pipelines feature that might be of interest to you.

Since this question is about Logstash and ECK, I tried to recreate the Logstash tutorial with ECK and ended up with the following set of manifests. Hopefully it will help you figure out the problem with your configuration.

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  labels:
    app.kubernetes.io/name: eck-logstash
    app.kubernetes.io/component: elasticsearch
spec:
  version: 7.9.2
  nodeSets:
    - name: default
      count: 3
      config:
        node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  labels:
    app.kubernetes.io/name: eck-logstash
    app.kubernetes.io/component: kibana
spec:
  version: 7.9.2
  count: 1
  elasticsearchRef:
    name: elasticsearch
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
  labels:
    app.kubernetes.io/name: eck-logstash
    app.kubernetes.io/component: logstash
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-pipeline
  labels:
    app.kubernetes.io/name: eck-logstash
    app.kubernets.io/component: logstash
data:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    filter {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
      }
      geoip {
        source => "clientip"
      }
    }
    output {
      elasticsearch {
        hosts => [ "${ES_HOSTS}" ]
        user => "${ES_USER}"
        password => "${ES_PASSWORD}"
        cacert => '/etc/logstash/certificates/ca.crt'
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
  labels:
    app.kubernetes.io/name: eck-logstash
    app.kubernetes.io/component: logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: eck-logstash
      app.kubernetes.io/component: logstash
  template:
    metadata:
      labels:
        app.kubernetes.io/name: eck-logstash
        app.kubernetes.io/component: logstash
    spec:
      containers:
        - name: logstash
          image: docker.elastic.co/logstash/logstash:7.9.2
          ports:
            - name: "tcp-beats"
              containerPort: 5044
          env:
            - name: ES_HOSTS
              value: "https://elasticsearch-es-http.default.svc:9200"
            - name: ES_USER
              value: "elastic"
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: elasticsearch-es-elastic-user
                  key: elastic
          volumeMounts:
            - name: config-volume
              mountPath: /usr/share/logstash/config
            - name: pipeline-volume
              mountPath: /usr/share/logstash/pipeline
            - name: ca-certs
              mountPath: /etc/logstash/certificates
              readOnly: true
      volumes:
        - name: config-volume
          configMap:
            name: logstash-config
        - name: pipeline-volume
          configMap:
            name: logstash-pipeline
        - name: ca-certs
          secret:
            secretName: elasticsearch-es-http-certs-public
---
apiVersion: v1
kind: Service
metadata:
  name: logstash
  labels:
    app.kubernetes.io/name: eck-logstash
    app.kubernetes.io/component: logstash
spec:
  ports:
    - name: "tcp-beats"
      port: 5044
      targetPort: 5044
  selector:
    app.kubernetes.io/name: eck-logstash
    app.kubernetes.io/component: logstash
---
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
  labels:
    app.kubernetes.io/name: eck-logstash
    app.kubernetes.io/component: filebeat
spec:
  type: filebeat
  version: 7.9.2
  config:
    filebeat.inputs:
      - type: log
        paths:
          - /data/logstash-tutorial.log
    output.logstash:
      hosts: ["logstash.default.svc:5044"]
  deployment:
    podTemplate:
      metadata:
        labels:
          app.kubernetes.io/name: eck-logstash
          app.kubernetes.io/component: filebeat
      spec:
        automountServiceAccountToken: true
        initContainers:
          - name: download-tutorial
            image: curlimages/curl
            command: ["/bin/sh"]
            args: ["-c", "curl -L https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz | gunzip -c > /data/logstash-tutorial.log"]
            volumeMounts:
              - name: data
                mountPath: /data
        containers:
          - name: filebeat
            securityContext:
              runAsUser: 0
            volumeMounts:
              - name: data
                mountPath: /data
        volumes:
          - name: data
            emptyDir: {}
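Assuming the manifests above are saved as eck-logstash.yaml (the filename is arbitrary), applying and watching them converge looks like:

kubectl apply -f eck-logstash.yaml
kubectl get pods -l app.kubernetes.io/name=eck-logstash -w

Once everything is Running, the tutorial data flows Filebeat -> Logstash -> Elasticsearch and should show up in Kibana.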

Thanks, I'll take a look. This might resolve it.

Also, following up on your point that Elasticsearch has ingest pipelines: if I can send each bus entry as its own document from Filebeat (or really from a simple custom script coupled with a cron job), then a pipeline could handle the simpler task of building the geo field, with a processor looking akin to:

{
  "processors": [
    {
      "set": {
        "if": "ctx.lat != null && ctx.lon != null",
        "field": "geo_coord",
        "value": "{{lat}},{{lon}}"
      }
    }
  ]
}
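For completeness, a sketch of registering that pipeline (the id bus-geo is arbitrary) and pointing Filebeat at it when writing straight to Elasticsearch:

PUT _ingest/pipeline/bus-geo
{
  "processors": [ ... ]
}

output.elasticsearch:
  pipeline: "bus-geo"

When shipping through Logstash instead, the elasticsearch output plugin has an equivalent pipeline option.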

I wanted to let you know that converting your code to YAML files, or running it as one file, failed to run: there are multiple errors along the lines of "error converting YAML to JSON".

Following up: it may be a copy-paste issue. Digging deeper into it.
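(One way to pinpoint those conversion errors without touching the cluster is client-side validation, which reports the offending line; eck-logstash.yaml is again a placeholder for wherever the manifests were saved:

kubectl apply --dry-run=client -f eck-logstash.yaml

Mangled indentation or stray tabs from copy-paste are the usual cause of "error converting YAML to JSON".)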