Error building transform on the destination index of previous transform

We are running Elasticsearch 7.6.1 on Linux nodes; the cluster is about 50 nodes.

I have three indexes that hold the results of data frame transforms aggregating details on IP addresses across two different source indexes. Two transforms run against the same index, looking at two different IP fields, and one transform runs against another index of IP data. The simplest version of the output is IP address, first seen, and last seen (generated by min and max aggregations on a timestamp field).
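The shape of one of these first-stage transforms can be sketched like this (a sketch only, with hypothetical index, transform, and field names; the real configs are more involved):

PUT _transform/ip_summary_1
{
  "source": {
    "index": "flow_data"
  },
  "dest": {
    "index": "ip_summary_1_v1"
  },
  "pivot": {
    "group_by": {
      "ip.addr": {
        "terms": {
          "field": "source_ip"
        }
      }
    },
    "aggregations": {
      "first_seen": {
        "min": {
          "field": "@timestamp"
        }
      },
      "last_seen": {
        "max": {
          "field": "@timestamp"
        }
      }
    }
  }
}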

The destination indexes for these three transforms share the same mapping and are grouped under an alias.
I would like to create a final transform that combines the results of the first three by again grouping on IP in the pivot. I have not been able to create any working version of this second-stage transform. Even the simplest transform that groups on the IP and does a count fails with the following error:

{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "An internal server error occurred",
  "cause": [
    "Index 1 out of bounds for length 1",
    "Index 1 out of bounds for length 1"
  ]
}

It isn't at all clear to me what this means. Is there some limitation to using the output of one transform as the input for another?

When I try to preview the transform from the developer console in Kibana I get more detail in the error, but it's not any more helpful:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      },
      {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      },
      {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "<index_1_v1>",
        "node" : "uQK84GT3TN-Am4mzRUWzOw",
        "reason" : {
          "type" : "array_index_out_of_bounds_exception",
          "reason" : "Index 1 out of bounds for length 1"
        }
      },
      {
        "shard" : 0,
        "index" : "<index_2_v1>",
        "node" : "qVFvxnRJTkSmvhTCEFOWTw",
        "reason" : {
          "type" : "array_index_out_of_bounds_exception",
          "reason" : "Index 1 out of bounds for length 1"
        }
      },
      {
        "shard" : 0,
        "index" : "<index_3_v1>",
        "node" : "-K5Vg16BRZmBTsQG8v7T2Q",
        "reason" : {
          "type" : "array_index_out_of_bounds_exception",
          "reason" : "Index 1 out of bounds for length 1"
        }
      }
    ],
    "caused_by" : {
      "type" : "array_index_out_of_bounds_exception",
      "reason" : "Index 1 out of bounds for length 1",
      "caused_by" : {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      }
    }
  },
  "status" : 500
}

I have replaced the actual index names in the output.

The final consolidation step is not terribly difficult, but it turns our application code from a simple index query into another aggregation. (Aggregating over the alias works just fine.) The following query:

GET <transform_index_alias>/_search?size=0
{
  "aggs": {
    "ip": {
      "terms": {
        "field": "ip.addr",
        "size": 10
      },
      "aggs": {
        "count": {
          "value_count": {
            "field": "ip.addr"
          }
        }
      }
    }
  }
}

Produces the expected results:

{
  "took" : 5369,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ip" : {
      "doc_count_error_upper_bound" : 15,
      "sum_other_doc_count" : 6217815,
      "buckets" : [
        {
          "key" : "0.0.0.0",
          "doc_count" : 2,
          "count" : {
            "value" : 2
          }
        },
        {
          "key" : "0.0.0.1",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        },
        {
          "key" : "0.0.0.2",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        },
        {
          "key" : "0.0.0.3",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        },
        {
          "key" : "0.0.0.4",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        }
      ]
    }
  }
}

Hi,

This is a complicated use case; it will probably take some time to get to the root cause.

No, a transform writes into a destination index that is no different from any other index. If you push documents with the same schema and data into a new index, it should fail the same way. I suspect the problem is in the new combining transform.

This indicates a problem with the query that the transform creates. Are you using a script, e.g. a scripted_metric?

For those failures you might find more information in the logs.

Can you share the config of the combining transform and maybe some sample data?

@Hendrik_Muhs Thank you for the reply. I can also see that it must be something about the source index, but it's unclear to me what the issue is, given that error message.

Yes, the previous transforms use a scripted_metric to get around the lack of a terms aggregation in 7.6. There is also a fair amount of processing done in an ingest pipeline to clean up the data before it's written.
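For reference, the scripted_metric workaround looks roughly like this (a sketch with a hypothetical field name, not the actual config): it collects the distinct values of a field across shards, standing in for a terms aggregation.

"aggregations": {
  "distinct_tags": {
    "scripted_metric": {
      "init_script": "state.values = new HashSet()",
      "map_script": "if (doc.containsKey('tag')) { state.values.add(doc['tag'].value) }",
      "combine_script": "return state.values",
      "reduce_script": "def all = new HashSet(); for (s in states) { all.addAll(s) } return all"
    }
  }
}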

I was just able to reproduce the error in a mock index with very simple input data, so it is definitely something in the existing index settings or mapping. Once I removed all of the fields in the index mapping other than those in my simplistic mock data, the transform worked. Now I need to start the slow process of adding fields back in and seeing when it breaks.

OK, I think I have a test case for a bug, in 7.6.1 at least.
If I remove the index sorting on the source index, my test case works.
This is the failing case in my tests:

PUT dest
PUT source7
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0",
      "sort" : {
        "field" : [
          "ip.addr",
          "first_seen"
        ],
        "order" : [
          "asc",
          "desc"
        ]
      }
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "ip": {
        "dynamic": false,
        "type": "object",
        "properties": {
          "addr": {
            "ignore_malformed": true,
            "type": "ip"
          }
        }
      },
      "first_seen": {
        "type": "date"
      }
    }
  }
}

PUT source7/_doc/1
{
  "ip.addr": "204.14.32.12",
  "first_seen" : "2020-12-15T10:13:45.123Z"
}


PUT source7/_doc/2
{
  "ip.addr": "201.14.32.12",
  "first_seen" : "2020-12-10T10:13:45.123Z"
}

POST _transform/_preview
{
  "source": {
    "index": [
      "source7"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest"
  },
  "pivot": {
    "group_by": {
      "agg_ip": {
        "terms": {
          "field": "ip.addr"
        }
      }
    },
    "aggregations": {
      "first_seen": {
        "min": {
          "field": "first_seen"
        }
      }
    }
  }
}

When running that series of commands, I get the following output from the transform preview:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "source7",
        "node" : "EOxgQEUBR92NO6ToC_JTOA",
        "reason" : {
          "type" : "array_index_out_of_bounds_exception",
          "reason" : "Index 1 out of bounds for length 1"
        }
      }
    ],
    "caused_by" : {
      "type" : "array_index_out_of_bounds_exception",
      "reason" : "Index 1 out of bounds for length 1",
      "caused_by" : {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      }
    }
  },
  "status" : 500
}

The working case:

PUT source8
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "ip": {
        "dynamic": false,
        "type": "object",
        "properties": {
          "addr": {
            "ignore_malformed": true,
            "type": "ip"
          }
        }
      },
      "first_seen": {
        "type": "date"
      }
    }
  }
}

PUT source8/_doc/1
{
  "ip.addr": "204.14.32.12",
  "first_seen" : "2020-12-15T10:13:45.123Z"
}


PUT source8/_doc/2
{
  "ip.addr": "201.14.32.12",
  "first_seen" : "2020-12-10T10:13:45.123Z"
}

POST _transform/_preview
{
  "source": {
    "index": [
      "source8"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest"
  },
  "pivot": {
    "group_by": {
      "ip": {
        "terms": {
          "field": "ip.addr"
        }
      }
    },
    "aggregations": {
      "first_seen": {
        "min": {
          "field": "first_seen"
        }
      }
    }
  }
}

For the transform preview here, I get the expected output:

{
  "preview" : [
    {
      "first_seen" : "2020-12-10T10:13:45.123Z",
      "ip" : "201.14.32.12"
    },
    {
      "first_seen" : "2020-12-15T10:13:45.123Z",
      "ip" : "204.14.32.12"
    }
  ],
  "mappings" : {
    "properties" : {
      "first_seen" : {
        "type" : "date"
      },
      "ip" : {
        "type" : "ip"
      }
    }
  }
}

Here is the Docker Compose file I used to create the test environment:

version: "3"

services:

  elasticsearch:
    image: elasticsearch:7.6.1
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node

  kibana:
    image: kibana:7.6.1
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200

I just realized I had some useful stack trace data in the Docker logs. Unfortunately, the stack trace made my post too long, so I'll hold on to the logs; let me know if you have trouble getting them yourself. The trace points pretty clearly at the index sorter as the source of the problem.

For now it looks like I just need to update my intermediate indexes to remove the sorting, and I should be able to move forward.
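One wrinkle: the index.sort settings can only be applied at index creation time, so removing the sorting means creating a new index without them and copying the data over, roughly like this (hypothetical index names; mappings elided):

PUT ip_summary_1_v2
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    // ... same mappings as the sorted index ...
  }
}

POST _reindex
{
  "source": {
    "index": "ip_summary_1_v1"
  },
  "dest": {
    "index": "ip_summary_1_v2"
  }
}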

EDIT:
I just updated my intermediate indexes, removing the sorting, and the final transform now works as expected.

As a final follow up, this issue does not appear in 7.11.0. Each case produces the expected aggregation preview.

Thanks!

I also ran it in my test environment, and it seems to have been fixed in a later version. My best guess is: Composite aggregation must check live docs when the index is sorted by jimczi · Pull Request #63864 · elastic/elasticsearch · GitHub

@Hendrik_Muhs Thanks for the help on this. I removed the index sorting on my intermediate indexes and things are running smoothly now. Now we just need to upgrade to 7.11 :slight_smile: