I have line-separated JSON files totaling on the order of 1 TB. I can split the files as needed; I just want to ingest all of the data into my ELK stack efficiently. I am trying to use Logstash for this, and this is my YAML configuration:
apiVersion: logstash.k8s.elastic.co/v1alpha1
kind: Logstash
metadata:
  name: quickstart
spec:
  count: 1
  elasticsearchRefs:
    - name: quickstart
      clusterName: quickstart
  version: 8.14.1
  pipelines:
    - pipeline.id: main
      config.string: |
        input {
          file {
            path => "path"
            start_position => "beginning"
            codec => "json"
            sincedb_path => ".sincedb"
          }
        }
        output {
          elasticsearch {
            hosts => [ "" ]
            user => ""
            password => ""
            ssl_verification_mode => "none"
            index => "logstash-%{+YYYY.MM.dd.hh}"
          }
        }
  podTemplate:
    spec:
      containers:
        - name: logstash
          env:
            - name: LS_JAVA_OPTS
              value: "-Xms8g -Xmx8g"
          volumeMounts:
            - name: host-volume
              mountPath: /path
              readOnly: false
      volumes:
        - name: host-volume
          hostPath:
            path: /path
            type: DirectoryOrCreate
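For reference, each input file is newline-delimited JSON, one object per line, roughly like this (the field names below are only placeholders, not my real schema):

  {"timestamp": "2024-06-01T12:00:00Z", "level": "INFO", "message": "example event", "service": "api"}
  {"timestamp": "2024-06-01T12:00:01Z", "level": "WARN", "message": "another event", "service": "worker"}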
I am not able to ingest files larger than about 50-60 MB, and the pod crashes whenever a larger file is tried.
Also, is there an easy-to-use monitoring tool that can help me benchmark the ingestion rate?
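To clarify what I mean by ingestion rate: essentially the per-index document count and store size over time, i.e. roughly what the _cat indices API reports. For example (host and credentials here are placeholders):

  curl -s -u elastic:<password> "https://<es-host>:9200/_cat/indices/logstash-*?v&h=index,docs.count,store.size"

I am looking for something that tracks this continuously rather than me polling it by hand.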