Data Import Fail: Can't merge a non object mapping

First, apologies if this is a simple question, kinda new to this still. Help is greatly appreciated.

I am attempting to import Zeek/Bro logs via the Data Visualizer, and for the most part things seem to be going okay. However, I just ran into a problem trying to import conn.log, which throws the error below.

I thought this type of input was okay to send, am I wrong in that?
What can I do to get the data to import?

SAMPLE DATA

{"_path":"conn","_system_name":"sensor","_write_ts":"2019-07-02T15:53:03.511364Z","ts":"2019-07-02T15:52:46.377889Z","uid":"CByC0qkmPzrL8w4Akj","id.orig_h":"12.34.56.78","id.orig_p":64069,"id.resp_h":"98.76.54.32","id.resp_p":135,"proto":"tcp","service":"dce_rpc","duration":12.133461,"orig_bytes":2355,"resp_bytes":395,"conn_state":"SF","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ShADadFf","orig_pkts":9,"orig_ip_bytes":2727,"resp_pkts":7,"resp_ip_bytes":687,"tunnel_parents":[],"orig_l2_addr":"00:11:22:33:44:55","resp_l2_addr":"aa:bb:cc:dd:ee:ff"}

ERROR MESSAGE

Error creating index
[mapper_parsing_exception] Failed to parse mapping [_doc]: Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]

More
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]","caused_by":{"type":"illegal_argument_exception","reason":"Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]"}},"status":400}

An update on this issue: I'm confused about why this is happening, because it seems like it doesn't like the field names for some reason, but I'm looking at a record imported right before I tried this and can see that record in the system:

[screenshot of the previously imported record]

Can anyone help point me in the direction of what I'm missing here?

Why is one record going in fine with no errors but now this one is erroring on a field that the previous record also had?

What is the type of that field? It looks like a field type mismatch.

@elasticforme

The 12.34.56.78 is an obfuscated IP address.

id.orig_h means the IP address of the orig, or originator, of the network traffic, with _h being the designation for host. This field is present in every JSON entry where network communication was involved, which is every line in the current file I'm working with.

To come full circle, id.resp_h is the entry for the host on the receiving end of the traffic that responded to the originating system.

Does this help clarify?

I'm still struggling with this problem so any additional help would be greatly appreciated.

I just re-tried using elasticsearch_loader to see if maybe it would give me better/different error messages or some indication of what I needed to do to resolve this. I received this:

JSON file stats:
lines: 160,000
lines that went to HELK stack: 183 (shown in index)

elasticsearch.helpers.errors.BulkIndexError: ('317 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'data', u'_index': u'wthisgoingon', u'error': {u'reason': u'Could not dynamically add mapping for field [id.resp_h.name.vals]. Existing mapping for [id.resp_h] must be of type object but found [text].', u'type': u'mapper_parsing_exception'}

The error messages are the same — I'll use the last one:

  • You have an existing field id.resp_h with the datatype text, so something like "id.resp_h": "foo" in one of the previously inserted documents.
  • Now you are trying to add a document with a value for id.resp_h.name.vals.

That leads to the collision of the datatype in id.resp_h. It can either be text or object but not both at the same time. It's kind of similar to adding a string into a long field, which would also blow up (both in relational databases and Elasticsearch).
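
To see this collision in isolation, here is a minimal reproduction against a throwaway index (the index name zeek-test is just an example):

  # First document: id.resp_h is dynamically mapped as text.
  POST zeek-test/_doc
  { "id.resp_h": "78.56.23.12" }

  # Second document: id.resp_h would now have to be an object, so this request
  # is rejected with the same kind of mapper_parsing_exception shown above.
  POST zeek-test/_doc
  { "id.resp_h.name.vals": ["subdomain.l.google.com", "subdomain.1e100.net"] }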

To fix it:

  • If you control the generated data: Either have a concrete value or a subdocument in a field, but don't mix them.
  • If you don't control the data: One way would be a dot expander, which you could run automatically on specific fields during the ingest process (sometimes this is also called dedot in other components).
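
As a rough sketch of that dot expander approach (the pipeline name and the choice of fields to expand are assumptions, not a drop-in fix):

  # Sketch: expand literal dotted keys into nested objects during ingest,
  # then index with ?pipeline=zeek_dedot (or set it as the index default pipeline).
  PUT _ingest/pipeline/zeek_dedot
  {
    "description": "Expand dotted Zeek field names into nested objects",
    "processors": [
      { "dot_expander": { "field": "id.orig_h" } },
      { "dot_expander": { "field": "id.resp_h" } }
    ]
  }

Note that expanding both id.resp_h and id.resp_h.name.vals still leaves the text-versus-object conflict at resp_h, so in practice this tends to get combined with renaming one of them.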

I believe that explanation is extremely helpful; I will chase that down and report back.

Preemptively thank you a ton. :smiling_face_with_three_hearts:

@xeraa

I just want to make 100% certain that I understand. Based on the record below:

{"_path":"conn","_system_name":"sensor","_write_ts":"2019-07-02T15:53:11.012579Z","ts":"2019-07-02T15:52:40.303594Z","uid":"CXwfbc43Jk2aqvv2AbVg","id.orig_h":"12.34.56.78","id.orig_p":54431,"id.resp_h":"78.56.23.12","id.resp_p":443,"proto":"tcp","duration":0.019613,"orig_bytes":1,"resp_bytes":0,"conn_state":"OTH","local_orig":true,"local_resp":false,"missed_bytes":0,"history":"Aa","orig_pkts":1,"orig_ip_bytes":52,"resp_pkts":1,"resp_ip_bytes":52,"tunnel_parents":[],"resp_cc":"US","orig_l2_addr":"00:11:22:33:44:55","resp_l2_addr":"aa:bb:cc:dd:ee:ff","id.resp_h.name.src":"DNS_PTR","id.resp_h.name.vals":["subdomain.l.google.com","subdomain.1e100.net"]}

The id.resp_h record with the IP is a text-type entry, which is the top-level object as far as the problem data is concerned, and then each of these is an entry that would fall underneath id.resp_h:

  • "id.resp_h":"78.56.23.12" as a text record (top-level)
  • "id.resp_h.name.src":"DNS_PTR" as a text record (sub-level beneath id.resp_h)
  • "id.resp_h.name.vals":["subdomain.l.google.com","subdomain.1e100.net"] is an object since it's a set of text records under the vals section of the id.resp_h.name field/record.

And based on these results I would need to figure out how to address .vals as a text-type entry vs. the object type that it currently is?

Not quite:

  1. You need to see this slightly more hierarchically: id.resp_h.name.src actually means that id is of the type object, resp_h is object, name is object, and only src is then text. Trying to make resp_h also text at the same time isn't possible.
  2. id.resp_h.name.vals with an array of strings is actually of the datatype text. There is no array datatype in Elasticsearch, and a single value or an array have the same mapping. Subdocuments are the ones that are different, though, since they use object.
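
To make point 1 concrete, a field name like id.resp_h.name.src implies a mapping shaped roughly like this (a simplified sketch; dynamic mapping would normally also add keyword sub-fields):

  {
    "mappings": {
      "properties": {
        "id": {
          "properties": {
            "resp_h": {
              "properties": {
                "name": {
                  "properties": {
                    "src":  { "type": "text" },
                    "vals": { "type": "text" }
                  }
                }
              }
            }
          }
        }
      }
    }
  }

Here resp_h can only hold properties, which is why a flat "id.resp_h": "78.56.23.12" value can't coexist with it.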

PS: Depending on your use case I'd store IPs as an ip field, which would allow querying IP ranges for example (which won't work with text). But that's another problem.
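
For example, if resp_h were mapped as ip, a term query with CIDR notation becomes possible, which a text field can't do (the index name zeek-conn below is just a placeholder):

  # Assumes id.resp_h is mapped as the ip datatype in a hypothetical zeek-conn index.
  GET zeek-conn/_search
  {
    "query": {
      "term": { "id.resp_h": "10.0.0.0/8" }
    }
  }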

Unfortunately I don't control the data output or the field names, since these come from a Bro network sensor that outputs them as individual JSON lines into the syslog. Less than ideal, but it's all I have to work with right now. =\

That certainly makes it difficult to get this data stored correctly.

It almost looks like I'll have to rename either id.resp_h or the sub-fields (more likely) to still keep the data.

So theoretically, if I were to say... rename id.resp_h to id.resp_h.name.ip and put the value in there, it would go in as text, because vals and src are already putting in text values at the same document level as what I'd be doing?

The record (I removed extra fields for simplicity) would look something like this:

{"id.resp_h.name.ip":"78.56.23.12","id.resp_h.name.src":"DNS_PTR","id.resp_h.name.vals":["subdomain.l.google.com","subdomain.1e100.net"]}

edit:
or I could rename id.resp_h.name.src to id.resp_h_name_src (and the same for the vals item) which would then remove the improper classing?
end-edit

or I would have to drop the vals and src fields because there's no way to handle a record in the hierarchy above the one with the most sub-elements?

Yes, id.resp_h.name.ip will work. Also id.resp_h_name_src would be ok. You don't have to drop any fields — you just need to fix the field names.

If you want to do it in Elasticsearch, you could use an ingest pipeline with a rename processor. Or you do it with whatever you are using to get the syslog into Elasticsearch (Logstash maybe?).
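
A minimal sketch of that rename approach, mirroring the id.resp_h_name_src idea above (the pipeline name and target field names are assumptions; depending on the version, the literal dotted source keys may also need dot_expander processors in front so they can be addressed):

  # Sketch: move the conflicting sub-fields out from under id.resp_h.
  PUT _ingest/pipeline/zeek_rename
  {
    "processors": [
      { "rename": { "field": "id.resp_h.name.src",  "target_field": "id.resp_h_name_src",  "ignore_missing": true } },
      { "rename": { "field": "id.resp_h.name.vals", "target_field": "id.resp_h_name_vals", "ignore_missing": true } }
    ]
  }

With that in place you would index with ?pipeline=zeek_rename, or set it as the default pipeline on the index.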

PS: I haven't looked at the details, but the Filebeat module for Zeek is doing something similar with a rename: https://github.com/elastic/beats/blob/b500d0d7f8bd74191f6ab7dfc095c0f646d4fff1/x-pack/filebeat/module/zeek/connection/config/connection.yml#L28-L29

Thank you. This information and understanding has been a huge help so far; I just successfully pulled in 160,000 of.... a lot more to go.

For now, because I'm in a pretty serious time crunch to analyze the data once it's in the system, I am using the web interface (via Kibana) or the jsonpyes tool, but I very much want to understand and document how to do this using the other tools.

I'm so new to this that it's still quite challenging, and I often find more questions than I have answers to with the platform. (learning curve)

I've started putting piecemeal things together in a github repository to try and capture my findings to help others (and myself to remember longer term).

I think the Filebeat module is a great way to go and I tried it early on; it was close, but it put a whole bunch of Filebeat information in front of the actual log data, which was a problem for now. I hope to find a way to work with it, since I think that might be one of the better approaches.

@xeraa To your point about using ip: astute observation, as these fixes did allow all the data to import, but I'm unable to get the needed context around it, because it didn't come in with the ability to use geo_point or other metadata to help with analysis.

Again thank you for the help with this. It was great.

Unfortunately for me, it's now back to the drawing board to figure out how to import the data with that context; like you mentioned, possibly Filebeat, so I'll try that. :thinking:

Currently dumping my findings, troubleshooting, and scripts to github here: https://github.com/torrycrass/HELKalator

Great that this worked now :slight_smile:

Depending on what you want to query, you'll need to fine-tune your mapping — the datatypes you'll need to look into will probably be:

  • ip
  • geo_point
  • keyword (for aggregations)

While it's not rocket science, it will just take some time to get up to speed with all those concepts and then apply them to your use case.
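
For what it's worth, here is a rough sketch of what that fine-tuning can look like, assuming the conflicting field names have already been sorted out as above (the pipeline name, index name, and resp_geo field are assumptions, not the exact Zeek/HELK layout):

  # Sketch: derive geo data from the responder IP at ingest time.
  PUT _ingest/pipeline/zeek_enrich
  {
    "processors": [
      { "geoip": { "field": "id.resp_h", "target_field": "resp_geo", "ignore_missing": true } }
    ]
  }

  # Sketch: explicit mapping using ip, geo_point, and keyword.
  PUT zeek-conn
  {
    "mappings": {
      "properties": {
        "proto":      { "type": "keyword" },
        "conn_state": { "type": "keyword" },
        "id": {
          "properties": {
            "orig_h": { "type": "ip" },
            "resp_h": { "type": "ip" }
          }
        },
        "resp_geo": {
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  }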
