JSON file from Filebeat to Logstash and then to Elasticsearch

Hi

I am new to setting up Filebeat and wondering if I can get some advice. I am trying to ingest inventory data that is produced in the following JSON file format:
```json
{
  "_meta": {
    "hostvars": {
      "host1": {
        "foreman": {
          "architecture_id": 1,
          "architecture_name": "x86_64",
          "capabilities": [
            "build"
          ],
          "certname": "host1",
          "comment": "this is hostname1",
          "created_at": "2017-03-08T15:27:11Z",
          "disk": "10gb",
          "domain_id": 5
        },
        "foreman_facts": {
          "boardmanufacturer": "Intel Corporation",
          "boardproductname": "440BX Desktop Reference Platform",
          "ipaddress": "1.1.1.1",
          "ipaddress_eth0": "1.1.1.2",
          "ipaddress_lo": "127.0.0.1"
        },
        "foreman_params": {}
      },
      "host2": {
        "foreman": {
          "architecture_id": 1,
          "architecture_name": "x86_64",
          "capabilities": [
            "build"
          ],
          "certname": "host2",
          "comment": "this hostname2",
          "created_at": "2017-03-08T15:27:11Z",
          "disk": "20gb",
          "domain_id": 5
        },
        "foreman_facts": {
          "boardmanufacturer": "Intel Corporation",
          "boardproductname": "440BX Desktop Reference Platform",
          "ipaddress": "2.1.1.1",
          "ipaddress_eth0": "2.2.2.2",
          "ipaddress_lo": "127.0.0.1"
        },
        "foreman_params": {}
      },
      "foreman_all": [
        "host3",
        "host4"
      ],
      "foreman_environment": [
        "computer1",
        "computer2"
      ],
```

I am only interested in hostvars: I want to index one document per hostname and ignore the foreman_all and foreman_environment fields. I want to send the JSON to Logstash, where I plan to filter out and rename some of the JSON fields, and then send it on to Elasticsearch.

I opened a topic in the Logstash section, and they suggested using the Filebeat multiline option to send the JSON data to Logstash.

I am using the following Filebeat options; however, Logstash throws a JSON error when I send the data from Filebeat to Logstash.

```yaml
filebeat.prospectors:
- paths:
    - /var/log/mylog.json
  json.keys_under_root: true
  json.add_error_key: true
```

Final format in Elasticsearch:

Elastic doc id 1:

```
"computer name": "host1",
"architecture_id": 1,
"architecture_name": "x86_64",
"capabilities": ["build"],
"Company hardware name": "host1",
"comment": "this is hostname1",
"created_at": "2017-03-08T15:27:11Z",
"disk": "10gb",
"domain_id": 5,
"foreman_facts": {
  "boardmanufacturer": "Intel Corporation",
  "boardproductname": "440BX Desktop Reference Platform",
  "ipaddress": "1.1.1.1",
  "ipaddress_eth0": "1.1.1.2",
  "ipaddress_lo": "127.0.0.1"
}
```

Elastic doc id 2:

```
"computer name": "host2",
"architecture_id": 1,
"architecture_name": "x86_64",
"capabilities": ["build"],
"certname": "host2",
"comment": "this hostname2",
"created_at": "2017-03-08T15:27:11Z",
"disk": "20gb",
"domain_id": 5,
"boardmanufacturer": "Intel Corporation",
"boardproductname": "440BX Desktop Reference Platform",
"ipaddress": "2.1.1.1",
"ipaddress_eth0": "2.2.2.2",
"ipaddress_lo": "127.0.0.1"
```

I would appreciate any advice on the above.

Here is what I see in Elasticsearch, even when I send the data directly from Filebeat:


If you use the Filebeat json feature, it expects the JSON to be on one line. You should use multiline in Filebeat to make one event out of the JSON, and then you can use Logstash to decode the JSON and drop fields as needed.

Please use code formatting with ticks when you paste code, to make sure the indentation stays the same; it also makes the code more readable.
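As a rough sketch (untested; the path comes from your config above, and the pattern assumes each JSON document starts with a `{` in the first column), the multiline part could look like:

```yaml
filebeat.prospectors:
- paths:
    - /var/log/mylog.json
  # consolidate every line that does NOT start with "{" into the
  # preceding "{" line, so the whole document becomes one event
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after
```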

Hi Ruflin,

Thanks for the reply. I am using the following multiline config, and for testing purposes I am sending the data directly to Elasticsearch. However, it looks like Filebeat is unable to match the message. Is there any other setting I need in order to ingest the JSON data directly from Filebeat into Elasticsearch?

Multiline options:

```yaml
multiline.pattern: ^{
multiline.negate: false
multiline.match: before
```


Hi

I managed to get the data into ELK using the following config. However, I have noticed that Filebeat treats the whole JSON file as one message. I am wondering if I can break the message up, send only the hostvars section, index one document per hostname, and ignore the foreman_all and foreman_environment fields.

Something like the below.

Final format in Elasticsearch:

Elastic doc id 1:

```
"computer name": "host1",
"architecture_id": 1,
"architecture_name": "x86_64",
"capabilities": ["build"],
"Company hardware name": "host1",
"comment": "this is hostname1",
"created_at": "2017-03-08T15:27:11Z",
"disk": "10gb",
"domain_id": 5,
"boardmanufacturer": "Intel Corporation",
"boardproductname": "440BX Desktop Reference Platform",
"ipaddress": "1.1.1.1",
"ipaddress_eth0": "1.1.1.2",
"ipaddress_lo": "127.0.0.1"
```

Elastic doc id 2:

```
"computer name": "host2",
"architecture_id": 1,
"architecture_name": "x86_64",
"capabilities": ["build"],
"certname": "host2",
"comment": "this hostname2",
"created_at": "2017-03-08T15:27:11Z",
"disk": "20gb",
"domain_id": 5,
"boardmanufacturer": "Intel Corporation",
"boardproductname": "440BX Desktop Reference Platform",
"ipaddress": "2.1.1.1",
"ipaddress_eth0": "2.2.2.2",
"ipaddress_lo": "127.0.0.1"
```

The following config worked:

```yaml
multiline.pattern: '^{'
multiline.negate: true
multiline.match: after
```

Hi

I am wondering whether it is possible to break up the multiline data at the Filebeat level.

Regards

Mussa Shirazi

Glad you got it working with the pattern. To drop fields, you could use the drop_fields processor: https://www.elastic.co/guide/en/beats/filebeat/current/drop-fields.html

If the hostname field always exists, you could use it as part of the index pattern. (see format string): https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html

As far as I understand, after your multiline everything is still in one event, so in addition you would need the decode_json_fields processor: https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html
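Putting those suggestions together, a sketch might look like the following (untested; it assumes the whole JSON document arrives in the `message` field after multiline, the dropped field names come from your sample file, and `[hostname]` is an assumed event field that you would replace with the real field name):

```yaml
processors:
- decode_json_fields:
    fields: ["message"]
    max_depth: 2
    overwrite_keys: true
- drop_fields:
    fields: ["foreman_all", "foreman_environment"]

output.elasticsearch:
  hosts: ["localhost:9200"]
  # format strings can reference event fields and dates:
  index: "inventory-%{[hostname]}-%{+yyyy.MM.dd}"
```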

Hi Ruflin

Thanks for the last reply. Unfortunately, none of the above worked when I used the following settings; the message is still ingested as one full message, which means that if I have 100 hostname records, all of the data is sent as one message.

Regarding the question about the hostname field: it always exists in the message, but the name changes, as I need to ingest about 1k different hostnames.

I used the following config:

```yaml
processors:
- decode_json_fields:
    fields: ["message"]
    process_array: false
    max_depth: 1
    overwrite_keys: false

processors:
- drop_fields:
    when:
      condition
    fields: ["_meta", "foreman_all", "foreman_environment"]
```

The data is still seen as one message in Elasticsearch. I am using Filebeat 5.6; I am wondering if there are any improvements in Filebeat version 6.

Could you paste the full config with ticks around it to make sure you have the correct indentation?

Also running with the debug log enabled could show you some more information.

Hi Ruflin,

I have used the below configs as per your suggestion. If I use the multiline option as shown below, then Filebeat and Logstash send the whole JSON file as one message, whereas I am looking to break the message up per hostname, as mentioned above.

```
"computer name": "host1",
"architecture_id": 1,
"architecture_name": "x86_64",
"capabilities": ["build"],
"Company hardware name": "host1",
"comment": "this is hostname1",
"created_at": "2017-03-08T15:27:11Z",
"disk": "10gb",
"domain_id": 5,
"foreman_facts": {
  "boardmanufacturer": "Intel Corporation",
  "boardproductname": "440BX Desktop Reference Platform",
  "ipaddress": "1.1.1.1",
  "ipaddress_eth0": "1.1.1.2",
  "ipaddress_lo": "127.0.0.1"
}
```

#=========================== Filebeat Configuration =============================

```yaml
filebeat.prospectors:

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /opt/uploaddata/*.json
    #- c:\programdata\elasticsearch\logs\*

  ### JSON configuration
  document_type: json
  json.message_key: log
  json.keys_under_root: true
  json.overwrite_keys: true
  #json.add_error_key: false

multiline.pattern: '^{'
multiline.negate: true
multiline.match: after

output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
```


#=========================== Logstash =============================

```
input {
  beats {
    port => "5044"
  }
}

filter {
  json {
    source => "parameter"
    target => "parameterData"
    remove_field => "parameter"
  }
}

output {
  elasticsearch {
    hosts => [ "10.138.7.51:9200" ]
    index => "inventory-%{+YYYY-MM-dd}"
  }
  stdout {
    codec => rubydebug
  }
}
```

#=========================== Filebeat Errors =============================

```
2017/11/24 16:45:14.226665 json.go:32: ERR Error decoding JSON: json: cannot unmarshal string into Go value of type map[string]interface {}
2017/11/24 16:45:14.226757 processor.go:262: DBG Publish event: {
  "@timestamp": "2017-11-24T16:45:14.226Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.0.0"
  },
  "json": {},
  "message": "            \"host4\",",
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "filebeat",
    "hostname": "filebeat",
    "version": "6.0.0"
  },
  "source": "/opt/uploaddata/data.json",
  "offset": 1710
}
2017/11/24 16:45:14.226800 json.go:32: ERR Error decoding JSON: EOF
2017/11/24 16:45:14.226889 processor.go:262: DBG Publish event: {
  "@timestamp": "2017-11-24T16:45:14.226Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.0.0"
  },
  "json": {},
  "message": "",
  "source": "/opt/uploaddata/data.json",
  "offset": 1712,
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "filebeat",
    "hostname": "filebeat",
    "version": "6.0.0"
  }
}
```

#=========================== Logstash Logs =============================

```
{
    "@timestamp" => 2017-11-24T16:45:14.226Z,
        "offset" => 1638,
      "@version" => "1",
          "beat" => {
            "name" => "filebeat",
        "hostname" => "filebeat",
         "version" => "6.0.0"
    },
          "host" => "filebeat",
    "prospector" => {
        "type" => "log"
    },
          "json" => {},
        "source" => "/opt/uploaddata/data.json",
       "message" => "         },",
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}
{
    "@timestamp" => 2017-11-24T16:45:14.226Z,
        "offset" => 1666,
      "@version" => "1",
          "beat" => {
            "name" => "filebeat",
        "hostname" => "filebeat",
         "version" => "6.0.0"
    },
          "host" => "filebeat",
          "json" => {},
    "prospector" => {
        "type" => "log"
    },
        "source" => "/opt/uploaddata/data.json",
       "message" => "         \"foreman_all\":[  ",
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}
```

Hi Ruflin,

I was wondering whether you have had a chance to look at the logs.

Regards

Mussa Shirazi

In the config you shared above, you use the json.* config options together with multiline. Besides the multiline indentation being wrong, you should either use multiline with the decode_json_fields processor and nothing on the LS side, or use multiline with the json filter in LS.

The most important part is fixing the indentation of your multiline settings first. I would advise you to turn off the json parts initially and make sure you get one JSON "line" per event after fixing the multiline. Then you can take the next step and decode the JSON.
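Roughly, the multiline part could look like this (untested sketch; the paths and hosts are taken from your config above). Note that the multiline options sit at the same indentation level as `paths`, under the prospector, and the json.* options are removed:

```yaml
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /opt/uploaddata/*.json
  # multiline belongs under the prospector, not at the top level
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after

output.logstash:
  hosts: ["localhost:5044"]
```

For the json-filter-in-LS variant, the consolidated event arrives in the `message` field, so the filter would be along the lines of `json { source => "message" }`; only once that produces one JSON document per event would I add field dropping and renaming.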

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.