How to load a JSON file into ES using Logstash

Hi Guys,

Can anyone provide a sample Logstash conf file to parse JSON-formatted data and index it into Elasticsearch? There are multiple fields that need to be parsed. Any quick help is appreciated so that I can get started. Below is the sample JSON file that needs to be parsed.

=====================================================================
{
"0": {
"TEXT_CLEAN": "to add an attachment to a thread do the follow 1 once you have create a thread you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit how to add attachment to a thread",
"TEXT_CONTENT": "To add an Attachment to a thread, do the following:\n1.\tOnce you have created a thread, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
"title": "How to add an Attachment to a thread",
"col1": {},
"KEYWORDS_CONTENT": [
"option",
"creat",
"read",
"attach",
"privaci",
"thread",
"side",
"follow",
"file",
"brows",
"left",
"hand",
"size",
"submit",
"add",
"click",
"polici"
],
"ner_hierarchy_display_content": {
"NERAction": [
"create",
"read"
],
"NERCategory": [
"",
"files"
]
},
"ner_hierarchy_display": {},
"KEYWORDS_CONTENT": "option create read attachment privacy thread side follow file 1 browse left hand size submit add click policy mb",
"html_content": "<p>To add an Attachment to a thread, do the following:</p>\n1.\tOnce you have created a thread, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
"std_title": "how to add attachment to a thread"
},
"1": {} -- NEXT JSON CONTENT
}

Thanks,
Deepak

Hi there,

Please use the format tool in your next posts, or your content will be unreadable.

Anyway, a very simple pipeline like this might be a start for you:

input { 
  generator { 
    count => 1 
    lines => [ '{"0": {"TEXT_CLEAN": "to add an attachment to a thread do the follow 1 once you have create a thread you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit how to add attachment to a thread","TEXT_CONTENT": "To add an Attachment to a thread, do the following:\n1.\tOnce you have created a thread, you will see Add Attachment option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.","title": "How to add an Attachment to a thread","col1": {},"KEYWORDS_CONTENT": ["option","creat","read","attach","privaci","thread","side","follow","file","brows","left","hand","size","submit","add","click","polici"],"ner_hierarchy_display_content": {"NERAction": ["create","read"],"NERCategory": ["","files"]},"ner_hierarchy_display": {},"KEYWORDS_CONTENT": "option create read attachment privacy thread side follow file 1 browse left hand size submit add click policy mb","html_content": "To add an Attachment to a thread, do the following: \n1.\tOnce you have created a thread, you will see Add Attachment option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.","std_title": "how to add attachment to a thread"}}' ] 
    # No codec here: the json filter below parses the raw "message" field.
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  stdout {}
}

Hi @Fabio-sama,

Thanks for responding. This looks like a static conf file, though, where "lines" holds static content. Could you please help me with a dynamic Logstash conf file?

Thanks,
Deepak

It is indeed a static conf file, since you asked how to parse that type of content. I used the generator input plugin to simulate that content as input and to show you how to parse it with the json filter plugin.

You obviously have to change the input (where do you get that content from?), and you need to send the output to Elasticsearch instead of stdout.

My post was meant to give you an idea of how to handle this kind of data.
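
To make the parsing step concrete outside Logstash, here is a rough plain-Ruby sketch of what the json filter with source => "message" does, as far as I understand it: it parses the JSON string held in the message field and puts the resulting keys at the top level of the event. A plain Hash stands in for the Logstash event, and parse_message is a hypothetical helper name, not a Logstash API.

```ruby
require 'json'

# Hypothetical helper: mimic the json filter by parsing the JSON string stored
# under "message" and merging the resulting keys into the top level of the event.
# A plain Hash stands in for a Logstash event here.
def parse_message(event)
  parsed = JSON.parse(event['message'])
  event.merge(parsed)
rescue JSON::ParserError
  # The real filter tags events it cannot parse with _jsonparsefailure.
  event.merge('tags' => ['_jsonparsefailure'])
end

event = { 'message' => '{"0": {"title": "How to add an Attachment to a thread"}}' }
parsed_event = parse_message(event)
puts parsed_event['0']['title']
```

Whatever input you end up using, the filter only works if the message field holds one complete JSON document.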

Hi @Fabio-sama

How do I create a conf file for the type of content below, where the content is present under the keys "0", "1", "2", "3", "4" and so on?

{
"0": {
"TEXT_CLEAN": "to add an attachment to a thread do the follow 1 once you have create a thread you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit how to add attachment to a thread",
"TEXT_CONTENT": "To add an Attachment to a thread, do the following:\n1.\tOnce you have created a thread, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
"title": "How to add an Attachment to a thread",
"col1": {},
"KEYWORDS_CONTENT": [
"option",
"creat",
"read",
"attach",
"privaci",
"thread",
"side",
"follow",
"file",
"brows",
"left",
"hand",
"size",
"submit",
"add",
"click",
"polici"
],
"ner_hierarchy_display_content": {
"NERAction": [
"create",
"read"
],
"NERCategory": [
"",
"files"
]
},
"ner_hierarchy_display": {},
"KEYWORDS_CONTENT": "option create read attachment privacy thread side follow file 1 browse left hand size submit add click policy mb",
"html_content": "<p>To add an Attachment to a thread, do the following:</p>\n1.\tOnce you have created a thread, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
"std_title": "how to add attachment to a thread"
},
"1": {
"TEXT_CLEAN": "to add an attachment to a case do the follow 1 once you have create a case you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit steps to add attachments to a case",
"TEXT_CONTENT": "To add an Attachment to a case, do the following:\n1.\tOnce you have created a Case, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
"main_action_dep": "step",
"SOURCE": "KB",
"ner": {},
"ner_hierarchy": {},
"TEXT_CLEAN_STEM": [
"add",
"attach",
"case",
"follow",
"onc",
"creat",
"case",
"see",
"add",
"attach",
"option",
"left",
"hand",
"side",
"click",
"brows",
"select",
"file",
"click",
"add",
"file",
"size",
"read",
"privaci",
"polici",
"submit",
"step",
"add",
"attach",
"case"
],
"actual_title": "How to add an Attachment to a case",
"ner_display": {},
"KEYWORDS_CONTENT_STEM": [
"option",
"creat",
"read",
"attach",
"privaci",
"case",
"side",
"follow",
"file",
"step",
"brows",
"left",
"hand",
"size",
"submit",
"add",
"click",
"polici"
],
"dep_tag": [
"add attachment"
],
"pobj_ner": ,
"url": "

To add an Attachment to a case, do the following:

\n1.\tOnce you have created a Case, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
"keywords": "case attachment",
"dep_tag_main": [
"",
"add attachment",
"",
""
],
"pobj": [
"case"
],
"ner_hierarchy_display_content": {
"NERAction": [
"create",
"read"
],
"NERCategory": [
"",
"files"
]
},
"ner_hierarchy_display": {},
"KEYWORDS_CONTENT": "option create read attachment privacy case side follow file step 1 browse left hand size submit add click policy mb",
"html_content": "<p>To add an Attachment to a case, do the following:</p>\n1.\tOnce you have created a Case, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
"std_title": "steps to add attachments to a case"
}

}

Thanks a lot in advance!!

Please use the format tool in your next posts, or your content will be unreadable.

@Fabio-sama Done that. Could you please help me. Though I have indexed this file into ES but except GET method none of the method is working like POST with term or match as it does not return anything. I used below Logstash conf. Could you please let me know where code is going wrong.

 input {
file {
codec => "json"
path => "/export/home/cassini/logstash-6.2.4/scripts/sample.json"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}

filter {
json {
source => "message"
}
}

output {
   elasticsearch {
      hosts => "localhost:9200"
      index => "test_json_index"
       }
        stdout {}
}

Thanks,
Deepak

@Fabio-sama Done that

You kidding me? Does this look properly formatted and indented to you?

Not to mention this one.

Edit your content so it is readable and easy to reuse to reproduce your problem.
Copy the content into any editor (Atom, Code, Sublime...), format it correctly, paste it back here, highlight it and press the Preformatted text tool.
If you don't want to waste your time doing that, you cannot expect others to waste theirs trying to help you.

Also, you didn't really post your problem.

I have indexed this file into ES

What does it look like in ES?

none of the method is working like POST with term or match as it does not return anything

Which POST? Which was the request? What did it return exactly?

Hi @Fabio-sama,

First of all, sorry that I did not provide proper details and pasted the content of the JSON file without formatting it; I am extremely sorry for that. I have now formatted it. Below is what my JSON file looks like: the content is present under the "0", "1" keys and so on, and I have to read it field by field, like a CSV file where we can define fields in the Logstash conf so that it reads the data for those fields and indexes them in ES. I am trying to achieve the same thing with a JSON file. Below is the content of my JSON file:

{
  "0": {
    "text_clean": "to add an attachment to a thread do the follow 1 once you have create a thread you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit how to add attachment to a thread",
    "title": "How to add an Attachment to a thread",
    "ner": {},
    "text_clean_stem": [
      "option",
      "creat"
    ],
    "ner_hierarchy_display_content": {
      "neraction": [
        "create",
        "read"
      ],
      "nercategory": [
        "",
        "files"
      ]
    },
    "ner_hierarchy_display": {},
    "html_content": "<p>To add an Attachment to a thread, do the following:</p>\n1.\tOnce you have created a thread, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
    "std_title": "how to add attachment to a thread"
  },
  "1": {
    "text_clean": "to add an attachment to a query do the follow 1 once you have create a query you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit steps to add attachments to a query",
    "title": "How to add an Attachment to a thread",
    "ner": {},
    "text_clean_stem": [
      "add",
      "attach"
    ],
    "ner_hierarchy_display_content": {
      "neraction": [
        "create",
        "read"
      ],
      "nercategory": [
        "",
        "files"
      ]
    },
    "ner_hierarchy_display": {},
    "html_content": "<p>To add an Attachment to a query, do the following:</p>\n1.\tOnce you have created a query, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
    "std_title": "steps to add attachments to a query"
  }
}

My conf file structure is as below. Please do the needful now.

input {
  file {
    type => "json"
    path => "/path/sample.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  json {
    source => "message"
  }
  if [message] =~ /^text_clean/ {
    drop {}
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test_json_index"
  }
  stdout {
    codec => rubydebug
  }
}

Thank you so much!!

Ok so, thank you for following my directions and formatting your input json. Now it's much easier to read.

Now, what you would like to achieve - if I got it right - is to split your events and from a source input like

{
  "0": {
    "foo":"bar0"
  },
  "1": {
    "foo":"bar1"
  },
  "2": {
    "foo":"bar2"
  }
}

obtain 3 separate Elastic documents like:

{
  "foo":"bar0"
}

{
  "foo":"bar1"
}

{
  "foo":"bar2"
}

rather than have one single document similar to the input event. Is that right?

If so, in order to avoid a heavy ruby filter, would it be possible for you to edit the input sample.json file so that it looks something like this?

{
  "events": [
    {
      "foo": "bar0"
    },
    {
      "foo": "bar1"
    },
    {
      "foo": "bar2"
    }
  ]
}

which in your case would mean something like:

{
  "events": [
    {
      "text_clean": "to add an attachment to a thread do the follow 1 once you have create a thread you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit how to add attachment to a thread",
      "title": "How to add an Attachment to a thread",
      "ner": {},
      "text_clean_stem": [
        "option",
        "creat"
      ],
      "ner_hierarchy_display_content": {
        "neraction": [
          "create",
          "read"
        ],
        "nercategory": [
          "",
          "files"
        ]
      },
      "ner_hierarchy_display": {},
      "html_content": "<p>To add an Attachment to a thread, do the following:</p>\n1.\tOnce you have created a thread, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
      "std_title": "how to add attachment to a thread"
    },
    {
      "text_clean": "to add an attachment to a query do the follow 1 once you have create a query you will see add attachment option on the left hand side 2 click on browse select a file and click do 3 you can add files up to 5 mb of size 4 read through the privacy policy and submit steps to add attachments to a query",
      "title": "How to add an Attachment to a thread",
      "ner": {},
      "text_clean_stem": [
        "add",
        "attach"
      ],
      "ner_hierarchy_display_content": {
        "neraction": [
          "create",
          "read"
        ],
        "nercategory": [
          "",
          "files"
        ]
      },
      "ner_hierarchy_display": {},
      "html_content": "<p>To add an Attachment to a query, do the following:</p>\n1.\tOnce you have created a query, you will see 'Add Attachment' option on the left hand side.\n2.\tClick on Browse, select a file and click done.\n3.\tYou can add files up to 5 MB of size.\n4.\tRead through the privacy policy and submit.",
      "std_title": "steps to add attachments to a query"
    }
  ]
}
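
If editing the file by hand is impractical, the reshaping could be scripted before Logstash ever sees the file. A minimal Ruby sketch, assuming the whole file is a single JSON object whose top-level keys are all numeric strings; wrap_in_events is a hypothetical helper name, and the inline sample stands in for the real file content.

```ruby
require 'json'

# Hypothetical helper: take the JSON text of {"0": {...}, "1": {...}} and return
# the JSON text of {"events": [{...}, {...}]}, ordered by the numeric key.
# Assumes every top-level key is a numeric string.
def wrap_in_events(json_text)
  numbered = JSON.parse(json_text)
  items = numbered.sort_by { |key, _| key.to_i }.map { |_, value| value }
  JSON.pretty_generate({ 'events' => items })
end

input = '{"0": {"foo": "bar0"}, "1": {"foo": "bar1"}, "2": {"foo": "bar2"}}'
reshaped = wrap_in_events(input)
puts reshaped
```

For a real file, the same helper could be fed File.read of the source path and its result written out with File.write, adding a final newline so the file input reads the last line.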

Anyway, in case you cannot edit your source file, what you need to do in your pipeline is basically create that events field from your current "0", "1", "2", etc. fields.
Then you can split that events field, generating one document for every item in the events array.
Finally, you can move the fields nested in the events field of each document to the root and remove the events field.
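
Those three steps can be tried out in plain Ruby before wiring them into a pipeline. This is only a sketch: a Hash stands in for the Logstash event, the helper names collect_events, split_events and hoist are made up for illustration, and the path field is there just to show that other fields survive each step.

```ruby
# A plain Hash stands in for a Logstash event; none of this is Logstash API.

# Step 1: move the values of the all-digit keys into an 'events' array.
def collect_events(event)
  numbered, rest = event.partition { |key, _| key.match?(/\A\d+\z/) }
  rest.to_h.merge('events' => numbered.map { |_, value| value })
end

# Step 2: emit one copy of the event per item of the 'events' array.
def split_events(event)
  event['events'].map { |item| event.merge('events' => item) }
end

# Step 3: promote the nested fields to the root and drop 'events'.
def hoist(event)
  nested = event['events']
  event.reject { |key, _| key == 'events' }.merge(nested)
end

source = {
  'path' => '/path/sample.json',
  '0' => { 'foo' => 'bar0' },
  '1' => { 'foo' => 'bar1' }
}
documents = split_events(collect_events(source)).map { |e| hoist(e) }
documents.each { |d| p d }
```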

Pay attention to the input section. For the whole JSON to be read as a single event, it is essential that the file ends with a newline (in your sample, that means an empty 47th line after the closing brace).

Otherwise the multiline codec may not work properly and may pass invalid JSON on to the filter.
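
If you are not sure whether the file ends with a newline, a small check can be scripted. The following Ruby sketch uses a hypothetical helper, ensure_trailing_newline, and a temporary file in place of the real sample.json; it appends a newline only when one is missing.

```ruby
require 'tempfile'

# Hypothetical helper: append a newline to the file at `path` unless it already
# ends with one. Returns true when the file was modified, false otherwise.
def ensure_trailing_newline(path)
  return false if File.binread(path).end_with?("\n")
  File.open(path, 'ab') { |file| file.write("\n") }
  true
end

# A temporary file stands in for the real input file.
sample = Tempfile.new('sample')
sample.write('{"0": {"foo": "bar0"}}')
sample.close

first_run = ensure_trailing_newline(sample.path)
second_run = ensure_trailing_newline(sample.path)
puts first_run, second_run
```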

Anyway, you can accomplish what I described above with the following pipeline:

input {
  file {
    type => "json"
    path => "/path/sample.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^{"
      negate => "true"
      what => "previous"
      auto_flush_interval => 1
    }
  }
}

filter {
  json {
    source => "message"
  }

  # Store values of fields "0", "1", "2" etc.. as items of an array, remove old values and save that array as a new field 'events'.
  ruby {
    code => "
      events = []
      event.to_hash.each { |k, v| events << v && event.remove(k) if k =~ /\A\d+\z/ }
      event.set('events', events)
    "
  }

  # Split the new 'events' field, creating one document per item in the 'events' array.
  split {
    field => "events"
  }


  # Extract the fields nested in the 'events' array, making them root fields.
  ruby {
    code => "
      event.get('events').each { |k, v| event.set(k,v) }
    "
  }

  # Remove unnecessary fields
  mutate {
    remove_field => ["events"]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test_json_index"
  }
  stdout{}
}

Obviously, if you don't need the whole message field (which might be huge for a big JSON input file), you can simply add "message" to the remove_field array of the mutate filter.