Help on: Can't get text on a START_OBJECT

Hello,

I know it is a widely discussed subject, but I still don't understand how to solve it.
From what I understand, it is a conflict between the Filebeat mapping and my template,
but I can't figure out how to rename the conflicting key.

The error message is:

2019-03-13T17:07:21.648Z	WARN	elasticsearch/client.go:523	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xbf1a6c121c7f546a, ext:18203913320, loc:(*time.Location)(0x21bf500)}, Meta:common.MapStr(nil), Fields:common.MapStr{"fields":common.MapStr{"document_type":"doc"}, "instance":"unconfigured", "pid":31737, "error":false, "host":common.MapStr{"name":"s3-ssl-conn-0.localdomain"}, "service":"sfused", "cancelled":false, "end_time":1.552496827572683e+12, "offset":47055618, "duration_ms":0.001221, "tid":694, "trace_id":3639506163531909, "trace_type":"op", "beat":common.MapStr{"name":"s3-ssl-conn-0.localdomain", "hostname":"s3-ssl-conn-0.localdomain", "version":"6.6.2"}, "parent_span_id":1102887333752238, "log":common.MapStr{"file":common.MapStr{"path":"/var/log/scality-traces/trace-sfused-2019-03-13_17h00-31737.log"}}, "layer":"cache", "start_time":1.552496827572682e+12, "source":"/var/log/scality-traces/trace-sfused-2019-03-13_17h00-31737.log", "op":"open", "span_id":1898903366422169}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc4200dad00), Source:"/var/log/scality-traces/trace-sfused-2019-03-13_17h00-31737.log", Offset:47056019, Timestamp:time.Time{wall:0xbf1a6c1012446f71, ext:10032281361, loc:(*time.Location)(0x21bf500)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x14c853c, Device:0xfd01}}}, Flags:0x1} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field [host] of type [keyword]","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 1:63"}}

My mapping contains:

    "host": {
        "type": "keyword"
    },

And indeed, Filebeat sends an object version of "host":
"host":common.MapStr{"name":"s3-ssl-conn-0.localdomain"},

What can I do to isolate my mapping from the Filebeat generic mapping (if that is indeed the kind of error I'm hitting)?

Here is the filebeat.yml:

    filebeat.prospectors:
    - fields:
        document_type: doc
      json.keys_under_root: true
      input_type: log
      paths:
      - /var/log/app-traces/trace-*.log
    output.elasticsearch:
      hosts:
      - 10.200.3.221
      index: app-traces-%{+yyyy.MM.dd}
    setup.template.enabled: true
    setup.template.json.enabled: true
    setup.template.json.name: app-traces
    setup.template.json.path: /usr/share/app-tracer-tools/traces_mapping_template.json
    setup.template.overwrite: false
    setup.template.name: app-traces
    setup.template.pattern: app-traces*

PS: I use Filebeat 6.6.

Can you also check your Elasticsearch logs to see if there are more details?

Where does the mapping for host come from? Your template, or do you have an older Beats template installed?

Beats creates additional fields for host and beat. By using json.keys_under_root: true, you are actually running into a typing conflict between Beats and your JSON file. Unfortunately, Beats will replace the host field from your JSON document with its new host object.
Using local or global processors you can move the host field out of the way, or defer the JSON parsing. In Filebeat the execution order is: local processors -> add host object -> global processors. E.g.:

filebeat.prospectors:
- fields:
    document_type: doc
  json.keys_under_root: true
  input_type: log
  paths:
    - /var/log/app-traces/trace-*.log
  processors:
    - rename:
        fields: [{from: host, to: _host}]

processors:
- rename:
    fields: [{from: _host, to: host}]

or defer json parsing:

filebeat.prospectors:
- fields:
    document_type: doc
  input_type: log
  paths:
    - /var/log/app-traces/trace-*.log

processors:
- decode_json_fields:
    fields: [message]
    target: ""
    overwrite_keys: true

...

Thank you for the help provided,
even if I still have the problem (it is driving me crazy) :wink:
I guess there is a layer of understanding missing on my side.

The mapping comes from the template:

{
    "mappings": {
        "doc": {
            "properties": {
                "layer": {
                    "type": "keyword"
                }, 
                "ip_addr": {
                    "type": "ip"
                }, 
                "string": {
                    "type": "text"
                }, 
                "service": {
                    "type": "keyword"
                }, 
                "@timestamp": {
                    "type": "date"
                }, 
                "parent_span_id": {
                    "index": "false", 
                    "type": "long"
                }, 
                "trace_type": {
                    "type": "keyword"
                }, 
                "trace_id": {
                    "type": "long"
                }, 
                "label": {
                    "type": "keyword"
                }, 
                "ip_port": {
                    "type": "long"
                }, 
                "instance": {
                    "type": "keyword"
                }, 
                "host": {
                    "type": "keyword"
                }, 
                "num": {
                    "type": "keyword"
                }, 
                "end_time": {
                    "type": "double"
                }, 
                "key": {
                    "type": "keyword"
                }, 
                "error": {
                    "type": "boolean"
                }, 
                "cancelled": {
                    "type": "boolean"
                }, 
                "path": {
                    "type": "text"
                }, 
                "span_id": {
                    "index": "false", 
                    "type": "long"
                }, 
                "start_time": {
                    "type": "double"
                }, 
                "op": {
                    "type": "keyword"
                }
            }
        }
    }, 
    "template": "app-traces-*", 
    "settings": {
        "index.refresh_interval": "30s"
    }
}

I also have Metricbeat running, in case that helps.

On the Elasticsearch side, the errors are available at:
https://pastebin.com/7xmEbMLx

I tried your two workarounds with no success.
Using per-prospector processors, I get this error at startup:
each processor needs to have exactly one action, but found 2 actions
even after playing with the indentation as described in other posts.

Using decode_json_fields, I get numerous errors like the ones below.
Complete lines here: https://pastebin.com/WsRQGjRC

2019-03-14T17:20:45.643Z	ERROR	jsontransform/jsonhelper.go:53	JSON: Won't overwrite @timestamp because of parsing error: parsing time "2019-03-14T17:20:37.590326+0000" as "2006-01-02T15:04:05Z07:00": cannot parse "+0000" as "Z07:00"
2019-03-14T17:20:45.676Z	WARN	elasticsearch/client.go:523	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xbf1ac13b3622c9b2, ext:20439332496, loc:(*time.Location)(0x21bf500)}, Meta:common.MapStr(nil), Fields:common.MapStr{"message":"{\"host\":\"s3-ssl-conn-0.localdomain\",\"service\":\"sfused\",\"instance\":\"unconfigured\",\"pid\":31737,\"trace_type\":\"op\",\"trace_id\":7334874514303313,\"span_id\":8606863545915609,\"parent_span_id\":5515993747783609,\"@timestamp\":\"2019-03-14T17:20:37.461066+0000\",\"
{...}
boolean]","caused_by":{"type":"json_parse_exception","reason":"Current token (START_OBJECT) not of boolean type\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@308bb76; line: 1, column: 322]"}}

To add more context:
I am trying to migrate an old setup from filebeat-5.4,
but I'm ready to rewrite every filter/mapping if a proper solution applies.

Additionally,
between tests I clean the platform, of course:

curl -s -XDELETE http://localhost:80/api/v0.1/es_proxy/_template/app-traces\*
curl -s -XDELETE http://localhost:80/api/v0.1/es_proxy/app-traces\*

An example of the data to parse from the log file:

{"host":"s3-ssl-conn-0.localdomain","service":"sfused","instance":"unconfigured","pid":31737,"trace_type":"ann_ipport","trace_id":1160454841680932,"span_id":3925821658678775,"parent_span_id":7611370955956023,"@timestamp":"2019-03-14T20:00:01.680800+0000","start_time":1552593601680.800,"label":"node","ip_addr":"10.200.1.76","ip_port":4251}
{"host":"s3-ssl-conn-0.localdomain","service":"sfused","instance":"unconfigured","pid":31737,"trace_type":"op","trace_id":1160454841680932,"span_id":7611370955956023,"parent_span_id":5895125288277863,"@timestamp":"2019-03-14T20:00:01.680786+0000","start_time":1552593601680.786,"end_time":1552593601688.502,"duration_ms":7.716064,"op":"conn_open","layer":"net","error":false,"cancelled":false,"tid":31929}
{"host":"s3-ssl-conn-0.localdomain","service":"sfused","instance":"unconfigured","pid":31737,"trace_type":"size_op","trace_id":1160454841680932,"span_id":8974535092911328,"parent_span_id":5895125288277863,"@timestamp":"2019-03-14T20:00:01.688526+0000","start_time":1552593601688.526,"end_time":1552593601697.217,"duration_ms":8.691162,"op":"read_http_reply","layer":"net","error":false,"cancelled":false,"size":324,"tid":31929}

Updated indentation:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app-traces/trace-*.log
  json.keys_under_root: true
  fields:
    document_type: doc
  processors:
    - rename:
        fields:
          - from: host
            to: _host

processors:
- rename:
    fields:
      - from: _host
        to: host

To add more context:
I am trying to migrate an old setup from filebeat-5.4,
but I'm ready to rewrite every filter/mapping if a proper solution applies.

Beats is moving towards the Elastic Common Schema (ECS). The host.name field was introduced with Beats 6.4, I think, for ECS compatibility. Having a common schema allows other products in and around the stack to provide additional functionality.
As you are in the process of changing things right now, I'd recommend having a look at it as well.
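For illustration (a hypothetical reshaping of one of your events, not something Beats emits verbatim), an ECS-style document nests the hostname under host.name instead of using a top-level host string:

{"host":{"name":"s3-ssl-conn-0.localdomain"},"service":"sfused","instance":"unconfigured","trace_type":"op","op":"open","layer":"cache"}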

The mapping comes from the template:

Right, this is your template, but the host field definition in it is not compatible with Beats' own event schema. Normally Beats uses fields.yml files to generate the template dynamically (with slight nuances per Elasticsearch version), as well as the index patterns in Kibana.
Your mapping also misses the beat namespace, for example. In that case Elasticsearch will try to automatically derive a mapping for the missing fields.
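For example (a sketch of the default dynamic mapping in Elasticsearch 6.x; the exact result depends on your cluster settings), a string field like beat.hostname that is missing from the template would be derived as text with a keyword sub-field:

"beat": {
    "properties": {
        "hostname": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        }
    }
}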

Some more details here: Configure Elasticsearch index template loading | Filebeat Reference [8.11] | Elastic

If you comment out your setup.template settings, you can run filebeat export template to print the default template Filebeat would have installed.
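E.g. (assuming the filebeat binary is on your PATH; the output file name is just an example):

filebeat export template > filebeat-default-template.json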

Using decode_json_fields, I get numerous errors like:
2019-03-14T17:20:45.643Z ERROR jsontransform/jsonhelper.go:53 JSON: Won't overwrite @timestamp because of parsing error: parsing time "2019-03-14T17:20:37.590326+0000" as "2006-01-02T15:04:05Z07:00": cannot parse "+0000" as "Z07:00"

Oh, I see. The @timestamp in your JSON is not compatible with the timestamp format in Beats :frowning: Beats parses against the Go reference layout 2006-01-02T15:04:05Z07:00, which expects a colon in the zone offset (e.g. +00:00 or Z), whereas your logs emit +0000.
Can you open a bug report for this on GitHub?

an example of the data to parse from the log file:

Hm..., can host in your JSON document differ from the host Filebeat runs on? If not, then we don't need the initial rename or decode_json_fields processors. Just migrate to host.name (remember, ECS), or use a global rename processor to move host.name to host.
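A sketch of that global rename (untested; ignore_missing and fail_on_error are optional safeguards so events without host.name still get published):

processors:
- rename:
    fields:
      - from: host.name
        to: host
    ignore_missing: true
    fail_on_error: false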

Thank you Steffens,

Using your "rename" solution seems to do the job.
At least, I guess it does, because now I have another issue concerning the invalid @timestamp format and the "error" field :wink:

Anyway,
I opened a GitHub issue for the @timestamp format,
and another discussion about the "error" field.

Hm..., can host in your JSON document differ from the host Filebeat runs on? If not, then we don't need the initial rename or decode_json_fields processors. Just migrate to host.name (remember, ECS), or use a global rename processor to move host.name to host.

I think it differs.
From what I see in Metricbeat, beat.host is supposed to be the short name; in my log file, host is the long name (more like beat.hostname).

I had a look at the default Filebeat template: 4556 lines, THAT is huge.

Thank you for everything.

Hey Steffens,

you pointed me in the right direction, and the solution is in fact quite simple.

In the JSON template, I had to replace:

    "host": {
        "type": "keyword"
    },

with:

                "host": {
                  "properties": {
                    "host": {
                      "ignore_above": 1024,
                      "type": "keyword"
                    }
                  }
                },
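For anyone landing here: you can verify the installed template via the same proxy endpoint used in the cleanup commands above (adapt the URL to your setup):

curl -s http://localhost:80/api/v0.1/es_proxy/_template/app-traces?pretty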

And the "Can’t get text on a START_OBJECT" error disappeared.
I still have the @timestamp issue to solve but it is another journey :wink:

Thank you so much!


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.