Filebeat 7.9.1 parsing http_json data

Hi, I am having an issue parsing http_json data with Filebeat.

In the logs I can see the events come through fine.

2020-09-29T18:16:19.606+1000	DEBUG	[processors]	processing/processors.go:187	Publish event: {
  "@timestamp": "2020-09-29T08:16:19.599Z",
  "@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.9.1"
  },
  "message": {
"data": {
  "Event1019": {
    "Time": "2020-09-29 09:10:46",
    "Username": "<username>",
    "Action": "Failed Login Attempt",
    "Data": "",
    "IP_Address": "<ip_address>"
  },

But later on in the logs I get an error message:

> 2020-09-29T18:16:21.242+1000	WARN	[elasticsearch]	elasticsearch/client.go:407	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x23bd1e38, ext:63736964179, loc:(*time.Location)(nil)}, Meta:null, Fields:{"agen
t":{"ephemeral_id":"f6d2cdda-3295-49c6-ba0f-b96e7eadc9b7"

<Wall of text which is my json request>

status=OK}'","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 1:99"}}

My json input looks something like

- type: httpjson
  url: redacted
  interval: 60m
  http_method: POST
  http_request_body:
    {
      "cid": redacted,
      "provhash": "redacted",
      "cmd": "reporting",
      "format": "siem"
    }
  fields_under_root: true

processors:
  - decode_json_fields:
      fields: ["message"]
      overwrite_keys: true
      document_id: message

So I know the connection is working and I know it can see the data properly, but why can't it parse the data into Elasticsearch correctly?

Thanks

Hey @cdroberts, welcome to discuss :slightly_smiling_face:

The error seems to indicate that Elasticsearch is trying to parse something as a type it cannot be converted to. There should be more information in the error message: before the caused_by key there should be an error starting with something like "failed to parse field", which should point to the field causing the problem.
These errors tend to happen when you try to store a value in a field of an incompatible type, for example a string in a numeric field, or an object in a keyword field.

One other, probably unrelated, thing: in decode_json_fields I see you are using the document_id option. What are you using it for? It takes a field of the decoded JSON object and uses it as the id of the document, but in your case you are setting it to message. Do you want to use a "message" field in the JSON document as the id of the document?
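For reference, document_id is intended for a case like this, where the decoded JSON contains a field holding a unique identifier (event_uuid is just an invented name here):

```yaml
processors:
  - decode_json_fields:
      fields: ["message"]
      # "event_uuid" is a hypothetical field inside the decoded JSON;
      # its value is moved to @metadata._id and used as the document id
      document_id: "event_uuid"
```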

Hi Jason,

Thanks for getting back to me.

So the other error is

Private:interface {}(nil), TimeSeries:false}, Flags:0x1, 
Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field [message] of type [text] in document with id '<redacted>'. Preview of field's value: 
'{next=null, data={Event1315={Action=Password Changed, Username=<some_user>, Time=2020-09-29 06:20:46, 

The reason for using the document_id field: the documentation says

> If configured, the field will be removed from the original json document and stored in `@metadata._id`

So I was kind of hoping that this would break out everything in the [message] field. It was just something I was trying in order to fix the issue. I don't need what is in the [message] field after the JSON objects have been correctly parsed into Elasticsearch.

OK, made a couple of changes and still have the same error

Input
- type: httpjson
  url: <redacted>
  interval: 60m
  http_method: POST
  tags: "<redacted>"
  http_request_body:
    {
      "cid": <redacted>,
      "provhash": "<redacted>",
      "cmd": "reporting",
      "format": "siem"
    }
  fields_under_root: true

Processors

  - decode_json_fields:
      fields: ["message", "data"]
      max_depth: 1

So I can see Filebeat is still parsing the logs correctly:

"@timestamp": "2020-09-29T23:54:51.050Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.9.1"
  },
  "message": {
    "data": {
      "Event694": {
        "Action": "Log in",
        "Data": "<some_data",
        "IP_Address": "127.0.0.16",
        "Time": "2020-09-29 11:54:27",
        "Username": "<user1>"
      },
      "Event1226": {
        "Time": "2020-09-29 08:56:11",
        "Username": "<user2>",
        "Action": "Log in",
        "Data": "<some_data>",
        "IP_Address": "127.0.0.1"
      },
      "Event1356": {
        "Time": "2020-09-29 01:43:27",
        "Username": "<user3>",
        "Action": "Creating User",
        "Data": "<some_user>",
        "IP_Address": "127.0.0.1"
      },

Still getting a "failed to parse field" message:

  "next":null,"status":"OK"},"tags":["[<tag>}"]}, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, 
    Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field [message] of type [text] in document with id <text>'. Preview of field's value: 
    '{next=null, data={Event1315={Action=Password Changed, Username=<user4>, Time=2020-09-29 06:20:46, 

So by the looks of it, the Data field within the JSON object can contain a lot of information:

  "Event1233": {
    "Action": "Password Changed",
    "Data": "login.microsoftonline.com/3ff6cfa4-e715-48db-b8e1-0867b9f9fba3/oauth2/authorize?response_type=code&client_id=<client_id>&scope=openid%20profile%20email&nonce=N5f5ec2ad415fd&response_mode=form_post
&resource=https%3A%2F%2Fgraph.microsoft.com&state=8qRV1goSiUzIQa8&redirect_uri=https%3A%2F%2F<redirect_site>%2Fauth%2Foidc%2F&sso_reload=true",
    "IP_Address": "127.0.0.1",
    "Time": "2020-09-29 08:51:29",
    "Username": "<a_user>"

Is this too much for Filebeat to handle, or is the problem that I have a data field outside the JSON and a Data field inside?

OK, I think the problem is that you are trying to store structured data in the message field, which is intended to store only text. Try adding a target to decode_json_fields, so parsed values are stored in a different namespace, something like this:

  - decode_json_fields:
      fields: ["message", "data"]
      max_depth: 1
      target: "reporting_data"

To get an idea of which fields are good to use for each value, take a look at the Elastic Common Schema; there you can find a collection of fields that are reserved for specific uses. These fields have a pre-defined mapping in Filebeat and shouldn't be used to store values of different types. There you can see, for example, that message is reserved for storing log messages as text.

If you need it, you can provide additional mappings for your custom fields with the setting setup.template.append_fields. If you use this, you will need to run filebeat setup again.
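As a sketch of what that setting could look like (the field names assume you decode into a "reporting_data" namespace, and the types are only my guesses for your data):

```yaml
# hypothetical mappings for the decoded event fields;
# adjust names/types to match your actual data
setup.template.append_fields:
  - name: reporting_data.Action
    type: keyword
  - name: reporting_data.Username
    type: keyword
  - name: reporting_data.IP_Address
    type: ip
  - name: reporting_data.Time
    type: date
```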

Hi Jaime,

OK looks like we are close

After adding `target: "reporting_data"` to the filebeat.yml file I get a new error.

Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): 
{"type":"illegal_argument_exception","reason":"Limit of total fields [10000] in index [filebeat-7.9.1-2020.09.30-000001] has been exceeded"}

So I am guessing that the parsing is now working, but it doesn't know how to put the data into the correct fields. There are only 5 fields that I need: Action, Data, IP_Address, Time and Username. So the big question is how we get just these out of that "message"/"reporting_data" field.

As for the setup.template change, I am not sure this is something we need to do. If you think we do, I will need some support with it.
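In case it helps show what I am after, this is roughly what I imagined the processors doing (the drop_fields part is just my guess at how to discard the raw message once it has been decoded):

```yaml
processors:
  - decode_json_fields:
      fields: ["message"]
      target: "reporting_data"
      max_depth: 1
  # my guess: drop the original message once its JSON has been decoded
  - drop_fields:
      fields: ["message"]
```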

Hi @cdroberts,

Sorry, but I had overlooked something important. Each one of your documents contains information about multiple events, right?
The multiple events are sent in the JSON response as a hash, whose keys are the identifiers of the events. Elasticsearch stores them under completely different keys, one per field of each object, which leads to a fields explosion.
If the objects were stored in an array, you could use the json_objects_array or split_events_by options of the httpjson input to send one document per event. But given that they are listed in a hash, I don't think these options will work.

What application is generating these logs? Is it in your hands to change it so that events are sent in a list instead of a hash?

Hi,
Each piece of text between the {} is one event that I am trying to separate into its own log entry, hopefully with the fields Time, Action, Data, IP_Address and Username, instead of everything going into a single message field.

The application is not something I have any control of. This is how Filebeat downloads the data from the API.

I can use PowerShell to download the data from the API and output it to a txt file that has one event per line:

Event1    : @{Time=2020-10-01 23:50:19; Username=user1; IP_Address=127.0.0.1; Action=Creating User; Data=user4@domain.com.au}
Event2    : @{Time=2020-10-01 23:50:18; Username=user2; IP_Address=127.0.0.1; Action=Creating User; Data=user5@domain.com.au}
Event3    : @{Time=2020-10-01 23:50:18; Username=user3; IP_Address=127.0.0.1; Action=Creating User; Data=user6@domain.com.au}

But Filebeat is also having issues with this at the moment

Yep, I see now. But I am afraid this is not something supported by Filebeat.

Filebeat should be able to collect and parse a file like this one. Take a look at dissect as an option to parse these logs.
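Something along these lines could work for the PowerShell output above. The field names are only illustrative, and note that dissect delimiters are literal, so the variable padding between the event name and the colon may need normalizing first:

```yaml
processors:
  - dissect:
      # illustrative tokenizer for lines like:
      # Event1 : @{Time=...; Username=...; IP_Address=...; Action=...; Data=...}
      tokenizer: "%{event_name} : @{Time=%{event_time}; Username=%{user_name}; IP_Address=%{ip_address}; Action=%{action}; Data=%{data}}"
      field: "message"
      target_prefix: "reporting_data"
```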

What problems is filebeat having with this file?

Btw, I have created an enhancement request to support maps in the httpjson input: https://github.com/elastic/beats/issues/21465