Data Import Fail: Can't merge a non object mapping

First, apologies if this is a simple question, kinda new to this still. Help is greatly appreciated.

I am attempting to import Zeek/Bro logs via the Data Visualizer and, for the most part, things seem to be going okay. However, I just ran into a problem trying to import conn.log, which throws the error below.

I thought this type of input was okay to send, am I wrong in that?
What can I do to get the data to import?

SAMPLE DATA

{"_path":"conn","_system_name":"sensor","_write_ts":"2019-07-02T15:53:03.511364Z","ts":"2019-07-02T15:52:46.377889Z","uid":"CByC0qkmPzrL8w4Akj","id.orig_h":"12.34.56.78","id.orig_p":64069,"id.resp_h":"98.76.54.32","id.resp_p":135,"proto":"tcp","service":"dce_rpc","duration":12.133461,"orig_bytes":2355,"resp_bytes":395,"conn_state":"SF","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ShADadFf","orig_pkts":9,"orig_ip_bytes":2727,"resp_pkts":7,"resp_ip_bytes":687,"tunnel_parents":[],"orig_l2_addr":"00:11:22:33:44:55","resp_l2_addr":"aa:bb:cc:dd:ee:ff"}

ERROR MESSAGE

Error creating index
[mapper_parsing_exception] Failed to parse mapping [_doc]: Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]

More
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]","caused_by":{"type":"illegal_argument_exception","reason":"Can't merge a non object mapping [id.orig_h] with an object mapping [id.orig_h]"}},"status":400}

An update on this issue: I'm confused about why this is happening because it seems like it doesn't like the field names for some reason, yet I'm looking at a record imported right before I tried this and can see that record in the system:

[screenshot of the previously imported record]

Can anyone help point me in the direction of what I'm missing here?

Why is one record going in fine with no errors but now this one is erroring on a field that the previous record also had?

What is the type of that field? It looks like a field type mismatch.

@elasticforme

The 12.34.56.78 is an obfuscated IP address.

id.orig_h means the IP address of the orig, or origin of the network traffic, with _h being the designation for host. This field is present in each JSON entry where network communication was involved, which is every line in the current file I'm working with.

To come full circle, id.resp_h is the entry for the host on the receiving end of the traffic that responded to the originating system.

Does this help clarify?

I'm still struggling with this problem so any additional help would be greatly appreciated.

I just re-tried using elasticsearch_loader to see if maybe it would give me either better/different error messages or some indication of what I needed to resolve. I received this:

JSON file stats:
lines: 160,000
lines that went to HELK stack: 183 (shown in index)

elasticsearch.helpers.errors.BulkIndexError: ('317 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'data', u'_index': u'wthisgoingon', u'error': {u'reason': u'Could not dynamically add mapping for field [id.resp_h.name.vals]. Existing mapping for [id.resp_h] must be of type object but found [text].', u'type': u'mapper_parsing_exception'}

The error messages are the same; I'll use the last one:

  • You have an existing field id.resp_h with the datatype text, so something like "id.resp_h": "foo" in one of the previously inserted documents.
  • Now you are trying to add a document with a value for id.resp_h.name.vals.

That leads to the collision of the datatype in id.resp_h. It can either be text or object but not both at the same time. It's kind of similar to adding a string into a long field, which would also blow up (both in relational databases and Elasticsearch).
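
To make the collision concrete, here's a minimal sketch (index name and values are made up for illustration): first a document with a plain string in that field is indexed, then a document with a sub-document in the same field is rejected.

PUT example-index/_doc/1
{ "id.resp_h": "12.34.56.78" }

PUT example-index/_doc/2
{ "id.resp_h": { "name": { "vals": ["subdomain.example.com"] } } }

The first request dynamically maps id.resp_h as text; the second one tries to turn it into an object and fails with the same mapper_parsing_exception you are seeing.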

To fix it:

  • If you control the generated data: Either have a concrete value or a subdocument in a field, but don't mix them.
  • If you don't control the data: One way would be a dot expander, which you could run automatically on specific fields during the ingest process (sometimes this is also called dedot in other components).
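
As a sketch of the dot expander approach (the pipeline name and field list here are assumptions, not something from your setup), an ingest pipeline could look like this:

PUT _ingest/pipeline/zeek-dedot
{
  "description": "Example: expand dotted field names into nested objects",
  "processors": [
    { "dot_expander": { "field": "id.orig_h" } },
    { "dot_expander": { "field": "id.resp_h" } }
  ]
}

You would then index with ?pipeline=zeek-dedot or set it as the index's default_pipeline.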

I believe that explanation is extremely helpful; I will chase that down and report back.

Preemptively thank you a ton. :smiling_face_with_three_hearts:

@xeraa

I just want to make 100% certain that I understand. Based on the record below:

{"_path":"conn","_system_name":"sensor","_write_ts":"2019-07-02T15:53:11.012579Z","ts":"2019-07-02T15:52:40.303594Z","uid":"CXwfbc43Jk2aqvv2AbVg","id.orig_h":"12.34.56.78","id.orig_p":54431,"id.resp_h":"78.56.23.12","id.resp_p":443,"proto":"tcp","duration":0.019613,"orig_bytes":1,"resp_bytes":0,"conn_state":"OTH","local_orig":true,"local_resp":false,"missed_bytes":0,"history":"Aa","orig_pkts":1,"orig_ip_bytes":52,"resp_pkts":1,"resp_ip_bytes":52,"tunnel_parents":[],"resp_cc":"US","orig_l2_addr":"00:11:22:33:44:55","resp_l2_addr":"aa:bb:cc:dd:ee:ff","id.resp_h.name.src":"DNS_PTR","id.resp_h.name.vals":["subdomain.l.google.com","subdomain.1e100.net"]}

The record's id.resp_h with the IP is a text type, which is the top-level object as far as the problem data is concerned, and each of these entries would then fall underneath id.resp_h:

  • "id.resp_h":"78.56.23.12" as a text record (top-level)
  • "id.resp_h.name.src":"DNS_PTR" as a text record (sub-level beneath id.resp_h)
  • "id.resp_h.name.vals":["subdomain.l.google.com","subdomain.1e100.net"] is an object since it's a set of text records under the vals section of the id.resp_h.name field/record.

And based on these results, I would need to figure out how to address .vals as a text-type entry vs. the object type that it currently is?

Not quite:

  1. You need to see this slightly more hierarchically: id.resp_h.name.src actually means that id is of the type object, resp_h is object, name is object, and only src is then text. Trying to make resp_h also text at the same time isn't possible.
  2. id.resp_h.name.vals with an array of strings is actually of the datatype text. There is no array datatype in Elasticsearch; a single value and an array have the same mapping. Subdocuments are the ones that are different, with the object type.
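
To illustrate point 1, a dotted field name like id.resp_h.name.src implies a mapping hierarchy roughly like the sketch below (index name is just a placeholder):

PUT example-index
{
  "mappings": {
    "properties": {
      "id": {
        "properties": {
          "resp_h": {
            "properties": {
              "name": {
                "properties": {
                  "src": { "type": "text" }
                }
              }
            }
          }
        }
      }
    }
  }
}

Here id, resp_h, and name are all object fields; only src is text. That's why resp_h can't also hold a plain text value at the same time.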

PS: Depending on your use case I'd store IPs as an ip field, which would allow querying IP ranges for example (which won't work with text). But that's another problem.

Unfortunately, I don't control the data output or the field names, since these come from a Bro network sensor which outputs them as individual JSON lines into the syslog. Less than ideal, but it is unfortunately all I have to work with right now. =\

That certainly makes it difficult to get this data stored correctly.

It almost looks like I'll either have to rename the id.resp_h or the sub fields (more likely) to still keep the data.

So theoretically, if I were to, say, rename id.resp_h to id.resp_h.name.ip and put the value in there, it would go in as text, because vals and src are already putting text values at the same document level as what I'd be doing?

The record (I removed extra fields for simplicity) would look something like this:

{"id.resp_h.name.ip":"78.56.23.12","id.resp_h.name.src":"DNS_PTR","id.resp_h.name.vals":["subdomain.l.google.com","subdomain.1e100.net"]}

edit:
or I could rename id.resp_h.name.src to id.resp_h_name_src (and the same for the vals item) which would then remove the improper classing?
end-edit

or I would have to drop the vals and src fields because there's no way to handle a record in the hierarchy above the one with the most sub-elements?

Yes, id.resp_h.name.ip will work. Also id.resp_h_name_src would be ok. You don't have to drop any fields; you just need to fix the field names.

If you want to do it in Elasticsearch, you could use an ingest pipeline with a rename processor. Or you do it with whatever you are using to get the syslog into Elasticsearch (Logstash maybe?).
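
As a rough sketch of the rename approach (the pipeline name and target field names are only examples; if the keys arrive as literal dotted names you may need the dot expander mentioned earlier before other processors can access them), it could look like this:

PUT _ingest/pipeline/zeek-rename
{
  "description": "Example: flatten the conflicting sub-fields",
  "processors": [
    { "rename": { "field": "id.resp_h.name.src", "target_field": "id.resp_h_name_src", "ignore_missing": true } },
    { "rename": { "field": "id.resp_h.name.vals", "target_field": "id.resp_h_name_vals", "ignore_missing": true } }
  ]
}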

PS: I haven't looked at the details, but the Filebeat module for Zeek is doing something similar with a rename: https://github.com/elastic/beats/blob/b500d0d7f8bd74191f6ab7dfc095c0f646d4fff1/x-pack/filebeat/module/zeek/connection/config/connection.yml#L28-L29

Thank you. This information and understanding was a huge help so far, I just successfully pulled in 160,000 of.... a lot more to go.

For now, because I'm in a pretty serious time crunch to analyze the data once it's in the system, I am using the web interface (via Kibana) or the jsonpyes tool, but I very much want to understand and document how to do this using the other tools.

I'm so new to this it's still quite challenging and I am often finding more questions than I have answers to with the platform. (learning curve)

I've started putting piecemeal things together in a github repository to try and capture my findings to help others (and myself to remember longer term).

I think the Filebeat module is a great way to go, and I tried it early on. It was close, but it put a whole bunch of Filebeat information in front of the actual log data, which was a problem for now. I hope to find a way to work with it since I think that might be one of the better approaches.

@xeraa To your point about using ip: astute observation. These fixes did allow all the data to import, but I'm unable to get the needed context around it because it didn't come in with the ability to use geo_point or other metadata to help with analysis.

Again thank you for the help with this. It was great.

Unfortunately for me, it's now back to the drawing board to figure out how to import the data with that context, possibly with Filebeat like you mentioned, so I'll try. :thinking:

Currently dumping my findings, troubleshooting, and scripts to github here: https://github.com/torrycrass/HELKalator

Great that this worked now :slight_smile:

Depending on what you want to query you'll need to fine tune your mapping; the datatypes you'll need to look into will probably be:

  • ip
  • geo_point
  • keyword (for aggregations)
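
For example, an explicit mapping using those datatypes could look like the sketch below (index and field names are placeholders, not the actual Zeek schema):

PUT zeek-conn-example
{
  "mappings": {
    "properties": {
      "orig_ip":    { "type": "ip" },
      "resp_ip":    { "type": "ip" },
      "resp_geo":   { "type": "geo_point" },
      "conn_state": { "type": "keyword" }
    }
  }
}

ip lets you query ranges (for example CIDR blocks), geo_point enables map visualizations, and keyword fields are what you aggregate on.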

While not rocket science it will just need some time to get up to speed with all those concepts and then apply them for your use case.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.