Load data into Elasticsearch using Logstash


(Bunny) #1

The question may already exist and there are probably lots of solutions available, but here the requirement is different.

I have a huge log file of around 44 GB, and it is growing day by day.

Q1: Can we load such a huge file into Elasticsearch?

Here is some sample data from the log file.

    Jun 1 17:12:18 10.10.125.148 2017-06-01T11:42:28Z 352019b8-0d2d-4397-446a-98fabeddf3bf doppler[19]: {
    "cf_app_id": "a4d311b3-f756-4d5e-bc3d-03690d461443",
    "cf_app_name": "parkingapp",
    "cf_ignored_app": false,
    "cf_org_id": "c5803a97-d696-497e-a0a4-112117eefab1",
    "cf_org_name": "KPIT",
    "cf_origin": "firehose",
    "cf_space_id": "886f2158-6b8a-4079-a6e1-7aa52034400d",
    "cf_space_name": "Development",
    "cpu_percentage": 0.022689683212221975,
    "deployment": "cf",
    "disk_bytes": 86257664,
    "disk_bytes_quota": 1073741824,
    "event_type": "ContainerMetric",
    "instance_index": 0,
    "ip": "10.10.125.113",
    "job": "diego_cell",
    "job_index": "356614b9-b079-4cc7-bcf9-4f61ab7924d0",
    "level": "info",
    "memory_bytes": 89395200,
    "memory_bytes_quota": 536870912,
    "msg": "",
    "origin": "rep",
    "time": "2017-06-01T11:42:28Z"
}
Jun 1 17:12:18 10.10.125.148 2017-06-01T11:42:28Z 352019b8-0d2d-4397-446a-98fabeddf3bf doppler[19]: {
    "cf_app_id": "3a83fdf4-a69a-45ca-8537-f7916c79dbbb",
    "cf_app_name": "spring-cloud-broker",
    "cf_ignored_app": false,
    "cf_org_id": "13233503-5430-4372-942c-02147ac34c38",
    "cf_org_name": "system",
    "cf_origin": "firehose",
    "cf_space_id": "1f40ca9a-ca34-434b-aa17-82ed87657a6e",
    "cf_space_name": "p-spring-cloud-services",
    "cpu_percentage": 0.0955028907326772,
    "deployment": "cf",
    "disk_bytes": 188231680,
    "disk_bytes_quota": 1073741824,
    "event_type": "ContainerMetric",
    "instance_index": 0,
    "ip": "10.10.125.113",
    "job": "diego_cell",
    "job_index": "356614b9-b079-4cc7-bcf9-4f61ab7924d0",
    "level": "info",
    "memory_bytes": 641343488,
    "memory_bytes_quota": 1073741824,
    "msg": "",
    "origin": "rep",
    "time": "2017-06-01T11:42:28Z"
}
Jun 1 17:12:18 10.10.125.148 2017-06-01T11:42:28Z 352019b8-0d2d-4397-446a-98fabeddf3bf doppler[19]: {
    "cf_app_id": "37acc229-844a-4ed3-ab54-5149ffab5b5b",
    "cf_app_name": "apps-manager-js",
    "cf_ignored_app": false,
    "cf_org_id": "13233503-5430-4372-942c-02147ac34c38",
    "cf_org_name": "system",
    "cf_origin": "firehose",
    "cf_space_id": "0ba61523-6a76-4d37-a0cd-a0117454a6eb",
    "cf_space_name": "system",
    "cpu_percentage": 0.04955433122879798,
    "deployment": "cf",
    "disk_bytes": 10235904,
    "disk_bytes_quota": 107374182 4,
    "event_type": "ContainerMetric",
    "instance_index": 5,
    "ip": "10.10.125.113",
    "job": "diego_cell",
    "job_index": "356614b9-b079-4cc7-bcf9-4f61ab7924d0",
    "level": "info",
    "memory_bytes": 6307840,
    "memory_bytes_quota": 67108864,
    "msg": "",
    "origin": "rep",
    "time": "2017-06-01T11:42:28Z"
}

As you can see, the logs are not purely JSON: each entry has a syslog-style prefix followed by a JSON body.

Each log entry contains a cf_app_name. With the command below I extracted only the cf_app_name values from the logs and stored the output in another file.

grep -Po '"cf_app_name":.*?[^\\]"' /var/log/messages | cut -d ':' -f2 > applications.txt
Then I created indexes in Elasticsearch named after the cf_app_name values by reading that applications.txt file with the script below:

# Lowercase the application names (Elasticsearch index names must be lowercase).
tr '[A-Z]' '[a-z]' < applications.txt > apps_name.txt

# Create one index per application name.
while IFS='"' read -ra arr; do
    for name in "${arr[@]}"; do
        CURL_COMMAND=`curl -XPUT 'localhost:9200/'$name'?pretty'`
        echo $CURL_COMMAND
    done
done < /root/apps_name.txt

I successfully created thousands of indexes in Elasticsearch this way.

Now what I would like to do is load these logs into the Elasticsearch indexes according to cf_app_name.

That means each log entry should be stored in its respective index based on its cf_app_name.

Q2: Is Logstash the best solution for this? If it is, please share your suggestions on how to achieve it.

Thank you, Bunny.


(Magnus Bäck) #2

Can we load such a huge file into Elasticsearch?

Yes, of course.

I successfully created thousands of indexes in Elasticsearch this way.

Why do you want to have separate indexes? Indexes have a fixed cost so having too many of them in relation to the size of your cluster typically isn't a good idea.

Is Logstash the best solution for this? If it is, please share your suggestions on how to achieve it.

I suggest you (see the sketch after this list):

  • use a multiline codec to join the lines of each logical event,
  • use a grok filter to extract fields for the timestamp(s) and whatever else you've got in addition to the JSON string at the end,
  • use a json filter to parse the JSON string,
  • reference the cf_app_name field in your elasticsearch output configuration (e.g. index => "%{cf_app_name}").
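
Putting those pieces together, a rough, untested sketch could look like the following. The grok pattern is only a guess based on the prefix in your sample, and the field names (syslog_timestamp, json_payload, etc.) are just suggestions, so adjust everything to your actual data:

input {
  file {
    path => "/var/log/messages"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Join the pretty-printed JSON lines with the syslog line that precedes them:
    # any line that does NOT start with a syslog timestamp belongs to the previous event.
    codec => multiline {
      pattern => "^%{SYSLOGTIMESTAMP}"
      negate => true
      what => "previous"
    }
  }
}

filter {
  # Split the syslog-style prefix from the JSON payload at the end of the event.
  # (?m) lets GREEDYDATA span the newlines inside the joined event.
  grok {
    match => {
      "message" => "(?m)%{SYSLOGTIMESTAMP:syslog_timestamp} %{IP:syslog_host} %{TIMESTAMP_ISO8601:event_timestamp} %{NOTSPACE:source_id} doppler\[%{POSINT:pid}\]: %{GREEDYDATA:json_payload}"
    }
  }

  # Parse the JSON payload into top-level fields (cf_app_name, cpu_percentage, ...).
  json {
    source => "json_payload"
    remove_field => ["json_payload"]
  }

  # Elasticsearch index names must be lowercase, so normalize the app name
  # before it is used as an index name.
  mutate {
    lowercase => ["cf_app_name"]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    # Route each event to the index named after its application.
    index => "%{cf_app_name}"
  }
  stdout { codec => rubydebug }
}

With something like this you shouldn't need to pre-create the indexes at all; with default settings Elasticsearch creates each index the first time an event for that application arrives.
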

(Bunny) #3

Actually, I am new to Logstash.
I tried the script below,
but I am getting errors.

input
{
    file
    {
        path => ["/root/test123.txt"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
    }
}

filter
{
    grok {
        pattern => ["%{cf_app_name}"]
        named_captures_only => true
        }
    grep {
        match  => ["addition-server"]
        drop => false
        add_tag => json
        }
    json {
        tags => json
        message => data
        }
output
{
  elasticsearch {
    hosts => "localhost"
    index => "%{cf_app_name}"
}

    stdout { codec => rubydebug }
}

Error

fetched an invalid config {:config=>"input \n{\n    file \n    {\n        path => [\"/root/test123.txt\"]\n        start_position => \"beginning\"\n        sincedb_path => \"/dev/null\"\n        exclude => \"*.gz\"\n    }\n}\n\nfilter \n{\n    grok {\n\tpattern => [\"%{cf_app_name}\"]\n\tnamed_captures_only => true\n\t}\n    grep {\n\tmatch  => [\"addition-server\"]\n\tdrop => false\n\tadd_tag => json\n\t}\n    json {\n\ttags => json\n\tmessage => data\n\t}\t\noutput\n{ \n  elasticsearch {\n    hosts => \"10.10.236.61\"\n    index => \"%{cf_app_name}\"\n}\n\n    stdout { codec => rubydebug }\n}\n\n", :reason=>"Expected one of #, => at line 29, column 17 (byte 406) after filter \n{\n    grok {\n\tpattern => [\"%{cf_app_name}\"]\n\tnamed_captures_only => true\n\t}\n    grep {\n\tmatch  => [\"addition-server\"]\n\tdrop => false\n\tadd_tag => json\n\t}\n    json {\n\ttags => json\n\tmessage => data\n\t}\t\noutput\n{ \n  elasticsearch ", :level=>:error}

Can you please help me with this?

It would be very helpful for me.


(Magnus Bäck) #4

There are multiple problems here (a corrected skeleton follows the list):

  • You're not closing the filter block. There's a } missing. This is what's preventing Logstash from starting up.
  • You're referencing a cf_app_name field in your first grok filter, but that field won't exist until after the json filter has run. Filters are processed in order.
  • I don't understand what you're trying to do with the grok filter.
  • The grep filter is deprecated. What are you trying to do?
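
To illustrate the first two points, the structure needs to look roughly like this (untested; it still assumes the events have been joined into single messages, e.g. with a multiline codec, and it sets the grok/grep questions aside by only capturing the JSON part):

filter
{
    grok {
        # Capture everything after "doppler[NN]: " as the JSON payload;
        # (?m) lets GREEDYDATA span the newlines of the pretty-printed JSON.
        match => { "message" => "(?m)doppler\[%{POSINT:pid}\]: %{GREEDYDATA:json_payload}" }
    }
    json {
        source => "json_payload"
    }
}   # <- this closing brace of the filter block was missing

output
{
    elasticsearch {
        hosts => "localhost"
        index => "%{cf_app_name}"
    }
    stdout { codec => rubydebug }
}
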

(Bunny) #5

I tried the script below, but still no luck.

input
{
    file
    {
        path => ["/root/test123.txt"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
    }
}

filter
{
    grok {
        match => {
            "cf_app_name" => "sparkle"
        }
    }
}
output
{
  elasticsearch {
    hosts => "localhost"
    index => "sparkle"
}

    stdout { codec => rubydebug }
}

Can you please suggest what I should change?


(Magnus Bäck) #6

I can't find anything obviously wrong with your configuration (except that the grok filter doesn't do anything useful) so you need to be more specific about the problems you're having.


(Bunny) #7

Actually, I would like to store the logs into indexes as follows:
if cf_app_name is xxx,
then those xxx logs should be stored in the xxx index in Elasticsearch.

Please suggest how I can do this.


(Bunny) #8

Can you please help me with that grok filter?


(Magnus Bäck) #9

If you're not familiar with grok expressions then perhaps the grok constructor web site can help you get started.
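
For the doppler prefix in your sample, something along these lines might be a starting point (untested, and the field names are only suggestions; it is essentially the same pattern as in my sketch further up):

grok {
  match => {
    "message" => "(?m)%{SYSLOGTIMESTAMP:syslog_timestamp} %{IP:syslog_host} %{TIMESTAMP_ISO8601:event_timestamp} %{NOTSPACE:source_id} doppler\[%{POSINT:pid}\]: %{GREEDYDATA:json_payload}"
  }
}

The json_payload field can then be handed to a json filter (source => "json_payload") so that cf_app_name becomes available for the elasticsearch output's index option.
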


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.