Well, this question may already exist and there may be lots of solutions available, but my requirement here is different.
I have a huge log file, around 44 GB, and it is growing day by day.
Q1: Can we load such a huge file into Elasticsearch?
Here is some sample data from the log file:
Jun 1 17:12:18 10.10.125.148 2017-06-01T11:42:28Z 352019b8-0d2d-4397-446a-98fabeddf3bf doppler[19]: {
"cf_app_id": "a4d311b3-f756-4d5e-bc3d-03690d461443",
"cf_app_name": "parkingapp",
"cf_ignored_app": false,
"cf_org_id": "c5803a97-d696-497e-a0a4-112117eefab1",
"cf_org_name": "KPIT",
"cf_origin": "firehose",
"cf_space_id": "886f2158-6b8a-4079-a6e1-7aa52034400d",
"cf_space_name": "Development",
"cpu_percentage": 0.022689683212221975,
"deployment": "cf",
"disk_bytes": 86257664,
"disk_bytes_quota": 1073741824,
"event_type": "ContainerMetric",
"instance_index": 0,
"ip": "10.10.125.113",
"job": "diego_cell",
"job_index": "356614b9-b079-4cc7-bcf9-4f61ab7924d0",
"level": "info",
"memory_bytes": 89395200,
"memory_bytes_quota": 536870912,
"msg": "",
"origin": "rep",
"time": "2017-06-01T11:42:28Z"
}
Jun 1 17:12:18 10.10.125.148 2017-06-01T11:42:28Z 352019b8-0d2d-4397-446a-98fabeddf3bf doppler[19]: {
"cf_app_id": "3a83fdf4-a69a-45ca-8537-f7916c79dbbb",
"cf_app_name": "spring-cloud-broker",
"cf_ignored_app": false,
"cf_org_id": "13233503-5430-4372-942c-02147ac34c38",
"cf_org_name": "system",
"cf_origin": "firehose",
"cf_space_id": "1f40ca9a-ca34-434b-aa17-82ed87657a6e",
"cf_space_name": "p-spring-cloud-services",
"cpu_percentage": 0.0955028907326772,
"deployment": "cf",
"disk_bytes": 188231680,
"disk_bytes_quota": 1073741824,
"event_type": "ContainerMetric",
"instance_index": 0,
"ip": "10.10.125.113",
"job": "diego_cell",
"job_index": "356614b9-b079-4cc7-bcf9-4f61ab7924d0",
"level": "info",
"memory_bytes": 641343488,
"memory_bytes_quota": 1073741824,
"msg": "",
"origin": "rep",
"time": "2017-06-01T11:42:28Z"
}
Jun 1 17:12:18 10.10.125.148 2017-06-01T11:42:28Z 352019b8-0d2d-4397-446a-98fabeddf3bf doppler[19]: {
"cf_app_id": "37acc229-844a-4ed3-ab54-5149ffab5b5b",
"cf_app_name": "apps-manager-js",
"cf_ignored_app": false,
"cf_org_id": "13233503-5430-4372-942c-02147ac34c38",
"cf_org_name": "system",
"cf_origin": "firehose",
"cf_space_id": "0ba61523-6a76-4d37-a0cd-a0117454a6eb",
"cf_space_name": "system",
"cpu_percentage": 0.04955433122879798,
"deployment": "cf",
"disk_bytes": 10235904,
"disk_bytes_quota": 107374182 4,
"event_type": "ContainerMetric",
"instance_index": 5,
"ip": "10.10.125.113",
"job": "diego_cell",
"job_index": "356614b9-b079-4cc7-bcf9-4f61ab7924d0",
"level": "info",
"memory_bytes": 6307840,
"memory_bytes_quota": 67108864,
"msg": "",
"origin": "rep",
"time": "2017-06-01T11:42:28Z"
}
As you can see, the logs are not in a pure JSON format.
The logs contain a cf_app_name field. With the command below I extracted only the cf_app_name values from the logs and stored the output in another file:
grep -Po '"cf_app_name":.*?[^\\]"' /var/log/messages | cut -d ':' -f2 > applications.txt
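That leaves applications.txt with roughly one quoted application name per line, for example:

"parkingapp"
"spring-cloud-broker"
"apps-manager-js"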
Then I created indexes in Elasticsearch from those cf_app_name values by reading applications.txt with the script below:
# Elasticsearch index names must be lowercase
tr 'A-Z' 'a-z' < applications.txt > apps_name.txt

# Each line of apps_name.txt looks like "app-name", so split on the quotes
# and create one index per non-empty name
while IFS='"' read -ra arr; do
    for name in "${arr[@]}"; do
        [ -z "$name" ] && continue    # skip the empty fields around the quotes
        response=$(curl -XPUT "localhost:9200/${name}?pretty")
        echo "$response"
    done
done < /root/apps_name.txt
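For each non-empty name the loop ends up issuing a request like:

curl -XPUT 'localhost:9200/parkingapp?pretty'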
With this I successfully created thousands of indexes in Elasticsearch.
Now what I would like to do is load these logs into the Elasticsearch indexes according to cf_app_name, i.e. every log entry should be stored in the index that matches its cf_app_name.
Q2: Is Logstash the best solution for this? If it is, please share your suggestions on how to achieve it.
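For reference, this is the kind of Logstash pipeline I have in mind. It is only an untested sketch: it assumes each doppler entry (the syslog-style prefix plus the JSON) sits on a single line in /var/log/messages; if the JSON really spans several lines as pasted above, a multiline codec would be needed first, and the grok pattern would also need to be checked against the real file.

input {
  file {
    path => "/var/log/messages"
    start_position => "beginning"
  }
}

filter {
  # Strip the syslog-style prefix and keep everything after "doppler[...]: " as JSON
  grok {
    match => { "message" => "doppler\[%{NUMBER}\]: %{GREEDYDATA:json_payload}" }
  }
  # Parse the JSON payload into fields (cf_app_name, cpu_percentage, ...)
  json {
    source => "json_payload"
  }
  # Index names must be lowercase
  mutate {
    lowercase => [ "cf_app_name" ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Route each event to the index named after its application
    index => "%{cf_app_name}"
  }
}

With index => "%{cf_app_name}" every event would go to the index of its own application, which is what I want, but I am not sure whether this is the right or most efficient approach for a 44 GB and growing file.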
Thank you, Bunny.