We are running Elasticsearch 7.6.1 on Linux nodes; the cluster has about 50 nodes.
I have 3 indexes which hold the results of data frame transforms that aggregate details about IP addresses across 2 different source indexes. 2 transforms run on the same index, looking at 2 different IP fields, and 1 transform runs on another index of IP data. The simplest version of the output is IP address, first seen, and last seen (generated by min and max aggregations on a timestamp field).
Each of the 3 destination indexes has the same mapping, and all 3 are grouped under an alias.
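For reference, the shared mapping on the 3 destination indexes looks roughly like this (a minimal sketch; `ip.addr` is the field queried below, while the `first_seen`/`last_seen` field names are assumptions on my part):

```console
PUT <index_1_v1>
{
  "mappings": {
    "properties": {
      "ip": {
        "properties": {
          "addr": { "type": "ip" }
        }
      },
      "first_seen": { "type": "date" },
      "last_seen": { "type": "date" }
    }
  }
}
```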
I would like to create a final transform that combines the results of the first 3 transforms by again grouping by IP in the pivot, but I have not been able to create any working version of this second-stage transform. Even the simplest transform, which groups on the IP and does a count, fails with the following error:
{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "An internal server error occurred",
  "cause": [
    "Index 1 out of bounds for length 1",
    "Index 1 out of bounds for length 1"
  ]
}
It isn't at all clear to me what this means. Is there some limitation to using the output of one transform as the input for another?
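For concreteness, the simplest failing version looks roughly like this (a sketch; the transform id and destination index name are placeholders, and the source is the alias over the 3 transform destination indexes):

```console
PUT _transform/<combined_ip_transform>
{
  "source": {
    "index": "<transform_index_alias>"
  },
  "dest": {
    "index": "<combined_index_v1>"
  },
  "pivot": {
    "group_by": {
      "ip.addr": {
        "terms": { "field": "ip.addr" }
      }
    },
    "aggregations": {
      "count": {
        "value_count": { "field": "ip.addr" }
      }
    }
  }
}
```

The same body fails in the same way through `POST _transform/_preview`.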
When I try to preview the transform from the Dev Tools console in Kibana I get a more detailed error, but it's not any more helpful:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      },
      {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      },
      {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "<index_1_v1>",
        "node" : "uQK84GT3TN-Am4mzRUWzOw",
        "reason" : {
          "type" : "array_index_out_of_bounds_exception",
          "reason" : "Index 1 out of bounds for length 1"
        }
      },
      {
        "shard" : 0,
        "index" : "<index_2_v1>",
        "node" : "qVFvxnRJTkSmvhTCEFOWTw",
        "reason" : {
          "type" : "array_index_out_of_bounds_exception",
          "reason" : "Index 1 out of bounds for length 1"
        }
      },
      {
        "shard" : 0,
        "index" : "<index_3_v1>",
        "node" : "-K5Vg16BRZmBTsQG8v7T2Q",
        "reason" : {
          "type" : "array_index_out_of_bounds_exception",
          "reason" : "Index 1 out of bounds for length 1"
        }
      }
    ],
    "caused_by" : {
      "type" : "array_index_out_of_bounds_exception",
      "reason" : "Index 1 out of bounds for length 1",
      "caused_by" : {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "Index 1 out of bounds for length 1"
      }
    }
  },
  "status" : 500
}
I have replaced the actual index names in the output.
The final consolidation step is not terribly difficult, but it turns a simple index query in our application code into another aggregation. (An aggregation over the alias works just fine.)
GET <transform_index_alias>/_search?size=0
{
  "aggs": {
    "ip": {
      "terms": {
        "field": "ip.addr",
        "size": 10
      },
      "aggs": {
        "count": {
          "value_count": {
            "field": "ip.addr"
          }
        }
      }
    }
  }
}
This produces the expected results:
{
  "took" : 5369,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ip" : {
      "doc_count_error_upper_bound" : 15,
      "sum_other_doc_count" : 6217815,
      "buckets" : [
        {
          "key" : "0.0.0.0",
          "doc_count" : 2,
          "count" : {
            "value" : 2
          }
        },
        {
          "key" : "0.0.0.1",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        },
        {
          "key" : "0.0.0.2",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        },
        {
          "key" : "0.0.0.3",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        },
        {
          "key" : "0.0.0.4",
          "doc_count" : 1,
          "count" : {
            "value" : 1
          }
        }
      ]
    }
  }
}