Transform Mapping Errors

Hello,

I've set up a transform that aggregates on process.name; however, it repeatedly fails at exactly the same date with the following error:

Failed to index documents into destination index due to permanent error: [org.elasticsearch.xpack.transform.transforms.BulkIndexingException: Bulk index experienced [1] failures and at least 1 irrecoverable [org.elasticsearch.xpack.transform.transforms.TransformException: Destination index mappings are incompatible with the transform configuration.; org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [process.name.terms] of type [flattened] in document with id 'AEd3S3VLUMATGYPzlKmOr_-PAAAAAAAA'. Preview of field's value: 'null'; java.lang.IllegalArgumentException: field name cannot be an empty string].; org.elasticsearch.xpack.transform.transforms.TransformException: Destination index mappings are incompatible with the transform configuration.; org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [process.name.terms] of type [flattened] in document with id 'AEd3S3VLUMATGYPzlKmOr_-PAAAAAAAA'. Preview of field's value: 'null'; java.lang.IllegalArgumentException: field name cannot be an empty string]

The destination mappings were created by the transform itself. Should I adjust the destination mappings? Should I change the query? The data is all coming from the endpoint process index, so it doesn't seem like there would be any null process names.

The JSON for my transform should ensure that organization.id and process.name always exist, but I'm still getting the null error.

How can I best handle these errors so they don't kill my transform? Also, how would I change the mapping so it's not a flattened field and I can aggregate on it in visualizations?

{
  "id": "brent-process6",
  "version": "8.3.3",
  "create_time": 1661870704986,
  "source": {
    "index": [
      "logs-endpoint.events.process*"
    ],
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "exists": {
                    "field": "organization.id"
                  }
                }
              ],
              "minimum_should_match": 1
            }
          },
          {
            "bool": {
              "should": [
                {
                  "exists": {
                    "field": "process.name"
                  }
                }
              ],
              "minimum_should_match": 1
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "brent-process2"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": {
      "organization.id": {
        "terms": {
          "field": "organization.id"
        }
      },
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h",
          "missing_bucket": true
        }
      }
    },
    "aggregations": {
      "process.name.terms": {
        "terms": {
          "field": "process.name"
        }
      }
    }
  },
  "settings": {},
  "retention_policy": {
    "time": {
      "field": "@timestamp",
      "max_age": "3d"
    }
  }
}

Output of GET _transform/brent-process6/_stats

{
  "count": 1,
  "transforms": [
    {
      "id": "brent-process6",
      "state": "failed",
      "reason": "Failed to index documents into destination index due to permanent error: [org.elasticsearch.xpack.transform.transforms.BulkIndexingException: Bulk index experienced [1] failures and at least 1 irrecoverable [org.elasticsearch.xpack.transform.transforms.TransformException: Destination index mappings are incompatible with the transform configuration.; org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [process.name.terms] of type [flattened] in document with id 'AEd3S3VLUMATGYPzlKmOr_-PAAAAAAAA'. Preview of field's value: 'null'; java.lang.IllegalArgumentException: field name cannot be an empty string].; org.elasticsearch.xpack.transform.transforms.TransformException: Destination index mappings are incompatible with the transform configuration.; org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [process.name.terms] of type [flattened] in document with id 'AEd3S3VLUMATGYPzlKmOr_-PAAAAAAAA'. Preview of field's value: 'null'; java.lang.IllegalArgumentException: field name cannot be an empty string]",
      "node": {
        "id": "KFZRFzWxTpOJ8ikSDhIJyQ",
        "name": "06-prd-iad-elasticsearch",
        "ephemeral_id": "gUJjyJx0RimMRHI2T5GPQw",
        "transport_address": "x.x.x.x:9300",
        "attributes": {}
      },
      "stats": {
        "pages_processed": 21,
        "documents_processed": 177835953,
        "documents_indexed": 10000,
        "documents_deleted": 0,
        "trigger_count": 1,
        "index_time_in_ms": 2825,
        "index_total": 20,
        "index_failures": 1,
        "search_time_in_ms": 2919750,
        "search_total": 21,
        "search_failures": 0,
        "processing_time_in_ms": 70,
        "processing_total": 21,
        "delete_time_in_ms": 0,
        "exponential_avg_checkpoint_duration_ms": 0,
        "exponential_avg_documents_indexed": 0,
        "exponential_avg_documents_processed": 0
      },
      "checkpointing": {
        "last": {
          "checkpoint": 0
        },
        "next": {
          "checkpoint": 1,
          "position": {
            "indexer_position": {
              "@timestamp": 1651143600000,
              "organization.id": "xxxxxx"
            }
          },
          "checkpoint_progress": {
            "docs_remaining": 4519755623,
            "total_docs": 4697591576,
            "percent_complete": 3.785683581104923,
            "docs_indexed": 10500,
            "docs_processed": 177835953
          },
          "timestamp_millis": 1661961740994,
          "time_upper_bound_millis": 1661961600000
        },
        "changes_last_detected_at": 1661961740988,
        "last_search_time": 1661961740988
      }
    }
  ]
}

@Hendrik_Muhs I know you are the expert :slight_smile:

UPDATE: I tried adding a runtime field to the transform to account for the null errors, but it is still failing.

{
	"process_name": {
		"type": "keyword",
		"script": {
			"source": "if (doc.containsKey('process.name') && !doc['process.name'].empty() && doc['process.name'].value == '') { emit('emptystring'); }"
		}
	}
}

I also tried setting the mapping to keyword instead of flattened before starting the transform, and that ended up failing as well:

Failed to index documents into destination index due to permanent error: [org.elasticsearch.xpack.transform.transforms.BulkIndexingException: Bulk index experienced [500] failures and at least 1 irrecoverable [org.elasticsearch.xpack.transform.transforms.TransformException: Destination index mappings are incompatible with the transform configuration.; org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [process.name.terms] of type [keyword] in document with id 'ADBoIkmrtMuFkDFCSy1p6Pt1AAAAAAAA'. Preview of field's value: '{dllhost={exe=2}, GoogleUpdate={exe=2}}'; java.lang.IllegalStateException: Can't get text on a START_OBJECT at 1:29].; org.elasticsearch.xpack.transform.transforms.TransformException: Destination index mappings are incompatible with the transform configuration.; org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [process.name.terms] of type [keyword] in document with id 'ADBoIkmrtMuFkDFCSy1p6Pt1AAAAAAAA'. Preview of field's value: '{dllhost={exe=2}, GoogleUpdate={exe=2}}'; java.lang.IllegalStateException: Can't get text on a START_OBJECT at 1:29]

Can you try filtering in the terms agg:

"aggregations": {
      "process.name.terms": {
        "terms": {
          "field": "process.name",
          "exclude": ""
        }
      }

The root cause might be an empty key, e.g. something like this:

        {
          "key": "",
          "doc_count": 2
        },

Transform creates a flattened field from the agg output; however, empty strings aren't allowed as keys in a flattened field. A similar issue: Transform jobs can fail if there's a \0 in fields where we perform terms aggregation (stored in flattened fields) · Issue #75875 · elastic/elasticsearch · GitHub
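You can reproduce the restriction outside of transform with a minimal example (the index name test-flattened is hypothetical):

```
PUT test-flattened
{
  "mappings": {
    "properties": {
      "process.name.terms": { "type": "flattened" }
    }
  }
}

POST test-flattened/_doc
{
  "process.name.terms": { "": 2 }
}
```

The second request should be rejected with an error along the lines of "field name cannot be an empty string", which is exactly what happens when the terms agg produces an empty-string bucket key and transform writes it into the flattened field.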

Let me know if it works, so I can follow up with an issue.

@Hendrik_Muhs I added the exclude to the transform and it is still running without failing thus far, so it seems like that is working, thanks!

The flattened field type is still showing as Unknown within Kibana, even though it's mapped and listed as searchable and aggregatable. I'm unable to visualize on an unknown field type. Is there any way I can re-map it, or should I be able to visualize on a flattened field as well? I just want to show a graph of the top process names and counts from the aggregated transform.

I've added "exclude": [ "", "." ] to my transform and it is no longer failing.

The flattened mapping is still showing as Unknown within Kibana, so there is no way for me to create a visualization. An example document is below. Is there something else I would need to do to split up the data so I can visualize it @Hendrik_Muhs?

[
  {
    "_index": "brent-process21",
    "_id": "ADAO3_01CiX9cU5wsMOIdRQ5AAAAAAAA",
    "_version": 1,
    "_score": 0,
    "_source": {
      "process": {
        "name": {
          "terms": {
            "SearchFilterHost.exe": 77,
            "smartscreen.exe": 41,
            "SearchProtocolHost.exe": 88,
            "PING.EXE": 1040,
            "backgroundTaskHost.exe": 123,
            "conhost.exe": 54,
            "svchost.exe": 257,
            "SenseCncProxy.exe": 88,
            "cmd.exe": 120,
            "identity_helper.exe": 48
          }
        }
      },
      "@timestamp": "2022-09-06T20:00:00.000Z",
      "x": {
        "organization": {
          "id": "xxxxxx"
        }
      }
    },
    "fields": {
      "organization.id": [
        "xxxx"
      ],
      "@timestamp": [
        "2022-09-06T20:00:00.000Z"
      ],
      "process.name.terms": [
        {
          "SearchFilterHost.exe": 77,
          "smartscreen.exe": 41,
          "SearchProtocolHost.exe": 88,
          "PING.EXE": 1040,
          "backgroundTaskHost.exe": 123,
          "conhost.exe": 54,
          "svchost.exe": 257,
          "SenseCncProxy.exe": 88,
          "cmd.exe": 120,
          "identity_helper.exe": 48
        }
      ]
    }
  }
]

Sorry for the late reply, I was on vacation.

I guess you want to visualize e.g. process.name.terms.cmd.exe? For that, the flattened data type does not provide the necessary granularity. If mappings are not provided upfront, transform creates them on a best-effort basis; e.g. a terms aggregation gets mapped to flattened, because we don't know how many different field names will appear in the data, and the number of fields in an index is limited. However, you can customize/override transform's choices by creating the destination index upfront, or by creating an index template and disabling mapping deduction (see deduce_mappings).
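For example, after pre-creating the destination index (or an index template) with your own mappings, you can tell transform not to deduce any mappings via its settings object. A sketch showing only the relevant part (the transform id is hypothetical):

```
PUT _transform/my-transform
{
  ...
  "settings": {
    "deduce_mappings": false
  }
}
```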

If you want to visualize the individual process names, every possible field should be mapped to a numeric type, e.g. long. One way to achieve this is a dynamic template:

PUT brent-process21
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_name": {
          "path_match": "process.name.terms.*",
          "mapping": {
            "type": "long"
          }
        }
      }
    ]
  }
}

With this template, a new field mapping is created whenever a new field name appears under process.name.terms, and each one is mapped to long.

Caveat 1:

If you choose to create the mappings yourself, you must do this for all fields, not just the ones you overwrite; e.g. you need to map @timestamp to date and organization.id to keyword. If you are unsure about the mappings, you can use the transform preview API to see the choices transform would make when creating the destination index.

Caveat 2:

Due to the creation of many sub-fields below process.name.terms instead of one flattened field, you not only increase memory and storage requirements, but you might also run into a mapping limit (a so-called "mapping explosion"). The default limit is 1000 fields; you can increase it:

PUT brent-process21
{
  "mappings": {
  ...
  },
  "settings": {
    "index.mapping.total_fields.limit": 20000
  }
}

Still, at some point you might run into the limit if your keys are arbitrary. If you only care about certain keys, you could create mappings only for those and let Elasticsearch ignore the others. For your use case it might also work to create daily indices; that way you get x mappings per day instead of a total number of mappings.
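One way to map only the keys you care about and let Elasticsearch ignore the rest is "dynamic": false on the containing object. A sketch (the two process names are just examples; unmapped keys remain in _source but are not indexed or aggregatable):

```
PUT brent-process21
{
  "mappings": {
    "properties": {
      "process.name.terms": {
        "type": "object",
        "dynamic": false,
        "properties": {
          "cmd.exe": { "type": "long" },
          "svchost.exe": { "type": "long" }
        }
      }
    }
  }
}
```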

Thanks @Hendrik_Muhs, that's very helpful.

Are Transforms the best solution for aggregating on a single field and then visualizing it, or is there a better option?

We decided not to go with Rollups because they have no retention policy and no ability to roll off the data.

The main use case is to speed up visualization load times by querying only the aggregated data instead of all the documents in the endpoint index.

It still seems like a good solution to me. Maybe you can elaborate a bit more on the use case and why you chose the terms aggregation. Is the count important, or do you only need to know whether something appears or not?

@Hendrik_Muhs the count is important. We are basically trying to get our visualization load times faster.

For a partner, we may want to showcase what their top process names were, or their top registry values, etc.

I basically want a count of the top values, per field I choose. The fields I am interested in are the terms fields unless there's another way.

Thanks for the details. I have 2 more ideas:

Vega

Kibana lets you write custom visualizations using Vega. This might be useful for visualizing the flattened data; however, I am not an expert in this. The challenge seems to be getting the data into the right shape, which is why I looked into another option:

Scripted metric

I think the main problem is the representation of the data. A terms agg writes the result as

  "SearchFilterHost.exe": 77,
  "smartscreen.exe": 41,
...

I wrote a scripted metric to instead output this as:

[
  {"key":"SearchFilterHost.exe", "value": 77},
  {"key":"smartscreen.exe", "value": 41},
...
]

You can use the following aggregation instead of your terms agg.

      "process.name.terms": {
        "scripted_metric": {
          "init_script": "state.map=new HashMap()",
          "map_script": """def key = doc['process.name'].value;
                           if (state.map.containsKey(key)) {
                             state.map.put(key, state.map.get(key) + 1); 
                           } else {
                             state.map.put(key, 1)
                           }""",
          "combine_script": "return state",
          "reduce_script": """def list = new ArrayList();
                             def joinedMap = states.stream().flatMap(s -> s.map.entrySet().stream())
       .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a,b)-> a+b));
       
                              for (s in joinedMap.entrySet()) {
                                def e = new HashMap(); 
                                e.put('key', s.getKey()); 
                                e.put('value', s.getValue()); 
                                list.add(e);
                              } 
                              return list;"""
        }
      }

Note: transform does not create mappings for scripted_metric output, which means the data gets dynamically mapped. You might want to write your own mappings instead.
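For example, if you want each key/value pair to stay associated when you aggregate (so one key's count cannot be mixed up with another's), a nested mapping for the destination index is one option; this is a sketch, not the only possible choice:

```
PUT brent-process21
{
  "mappings": {
    "properties": {
      "process.name.terms": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "value": { "type": "long" }
        }
      }
    }
  }
}
```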

Thanks @Hendrik_Muhs, that is working for me! Is there a way to add up the value count associated with the field? For example, if I now group by org id, it will not associate the value.

I hope I understand your question correctly.

You can run aggregations on the transform destination index[*], so although you grouped by org id, you can summarize the counts the same way you can summarize over time buckets.
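For example, assuming process.name.terms is mapped as nested objects with key as keyword and value as long (an assumption about your destination mapping), a query like this sums the counts per process name across all time buckets for one org:

```
GET brent-process21/_search
{
  "size": 0,
  "query": { "term": { "organization.id": "xxxxxx" } },
  "aggs": {
    "procs": {
      "nested": { "path": "process.name.terms" },
      "aggs": {
        "by_name": {
          "terms": { "field": "process.name.terms.key", "size": 10 },
          "aggs": {
            "total": { "sum": { "field": "process.name.terms.value" } }
          }
        }
      }
    }
  }
}
```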

If that doesn't help: can you post your current transform config[**]?

[*] That's the main difference to what we started with, the terms/flattened field combination wasn't aggregate-able and hence could not be visualized.
[**] We can also go over the support channel, support can help better than me when it comes to visualizations