Transforms from first principles

Hi,

I've got an 8.8.1 cluster which is functioning well and ingesting data into a hot/warm architecture.

I have a number of users with some pretty heavy dashboards and, to lighten the load on the cluster, I thought it might be a good idea to use a transform to summarise some of the data on a schedule and use that summary data as the basis for the dashboards. To minimise impact on the cluster, I have added a dedicated transform-only node.

The data in the index is structured log data, broken out into mapped fields within each document, with associated timestamps. It is stored in an ILM-managed data stream. There are around 4bn documents in the data stream, which is on a 14-day lifecycle policy.

I have created a continuous transform which filters out documents with certain field values, groups by timestamp (calendar interval 1 minute) and three user fields, and then aggregates certain fields to produce, e.g., a count of log messages received and the cardinality of a couple of other fields.

I chose a calendar_interval of 1 minute because I want the transform to produce minute-by-minute data.
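To make the grouping concrete, here is a minimal local sketch (plain Python, not Elasticsearch code) of what the pivot computes: each document's `@timestamp` is floored to the start of its minute, documents are grouped by that minute plus `user.name`, `user.category` and `user.vip`, and each group gets a count of `field_which_is_always_present` and distinct counts of `categories` and `thing`. The sample documents are hypothetical, and note that Elasticsearch's `cardinality` aggregation is approximate, whereas this sketch counts exactly.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample documents shaped like the source data (flattened field names).
docs = [
    {"@timestamp": "2023-08-13T10:00:05Z", "user.name": "alice", "user.category": "staff",
     "user.vip": False, "field_which_is_always_present": "x", "categories": "a", "thing": "t1"},
    {"@timestamp": "2023-08-13T10:00:40Z", "user.name": "alice", "user.category": "staff",
     "user.vip": False, "field_which_is_always_present": "x", "categories": "b", "thing": "t1"},
    {"@timestamp": "2023-08-13T10:01:10Z", "user.name": "alice", "user.category": "staff",
     "user.vip": False, "field_which_is_always_present": "x", "categories": "a", "thing": "t2"},
    {"@timestamp": "2023-08-13T10:00:59Z", "user.name": "bob", "user.category": "guest",
     "user.vip": True, "field_which_is_always_present": "x", "categories": "c", "thing": "t3"},
]


def minute_bucket(ts: str) -> str:
    """Floor an ISO-8601 UTC timestamp to the start of its minute (a 1m calendar_interval in UTC)."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return dt.replace(second=0, microsecond=0).strftime("%Y-%m-%dT%H:%M:%SZ")


def pivot(docs):
    """Group by (minute, user.name, user.category, user.vip) and aggregate each group."""
    groups = defaultdict(lambda: {"total_requests": 0, "categories": set(), "things": set()})
    for d in docs:
        key = (minute_bucket(d["@timestamp"]), d["user.name"], d["user.category"], d["user.vip"])
        g = groups[key]
        if "field_which_is_always_present" in d:  # value_count: one value per document here
            g["total_requests"] += 1
        g["categories"].add(d["categories"])
        g["things"].add(d["thing"])
    # One output document per composite key, as the transform writes to its destination.
    return {
        key: {
            "total_requests": g["total_requests"],
            "distinct_categories": len(g["categories"]),  # exact here; approximate in ES
            "distinct_thing": len(g["things"]),
        }
        for key, g in groups.items()
    }


for key, values in sorted(pivot(docs).items()):
    print(key, values)
```

Each distinct combination of minute and user fields becomes one destination document, so the output size grows with the number of active users per minute, not with the number of log lines.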

The data can be slightly delayed between generation and indexing, so I have added a sync delay of 600s and set the frequency to 60s.
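Roughly, on each check (every `frequency`), a continuous transform only looks at source data with a sync timestamp older than `now - delay`. The sketch below models that arithmetic, assuming the upper bound is also rounded down to a whole 1-minute bucket so no minute is summarised while still filling up. The function name and the rounding detail are assumptions for illustration, not a description of Elasticsearch internals.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def checkpoint_upper_bound(now: datetime, delay: timedelta, bucket: timedelta) -> datetime:
    """Latest timestamp a check would cover: now minus the sync delay,
    floored to the date_histogram bucket size (the flooring is an assumption)."""
    cutoff = now - delay
    whole_buckets = (cutoff - EPOCH) // bucket  # timedelta // timedelta -> int
    return EPOCH + whole_buckets * bucket


frequency = timedelta(seconds=60)  # "frequency": "60s"
delay = timedelta(seconds=600)     # "sync.time.delay": "600s"
minute = timedelta(minutes=1)      # "calendar_interval": "1m"
start = datetime(2023, 8, 13, 12, 0, 30, tzinfo=timezone.utc)  # hypothetical wall-clock time

# Simulate three consecutive checks, one per frequency interval.
for i in range(3):
    now = start + i * frequency
    print(now.time(), "->", checkpoint_upper_bound(now, delay, minute).time())
```

Under this model each check advances the summarised window by about one minute and always trails wall-clock time by ten minutes or more; documents that arrive later than the delay would fall behind an already-processed window.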

I have created an index template with mappings for the fields which will be indexed into the transform's destination, and have created the index.
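For reference, destination mappings like these can be written as a composable index template body. This is a minimal sketch: the field names are the `group_by` keys and aggregation names from the transform JSON, while the template name and the field types (in particular `user_vip` as `boolean`) are assumptions about the source data.

```python
import json

# Hypothetical composable index template for the transform's destination index.
# Field names match the pivot's group_by keys and aggregation names; types are assumptions.
template_body = {
    "index_patterns": ["transform-summary-data"],
    "template": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "user_name": {"type": "keyword"},
                "user_category": {"type": "keyword"},
                "user_vip": {"type": "boolean"},          # assumption: user.vip is boolean
                "total_requests": {"type": "long"},       # value_count result
                "distinct_categories": {"type": "long"},  # cardinality result
                "distinct_thing": {"type": "long"},       # cardinality result
            }
        }
    },
}

# Body for: PUT _index_template/transform-summary-template  (template name is hypothetical)
print(json.dumps(template_body, indent=2))
```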

In the preview, the data looks exactly as I want it, so I created and started the transform.

A few minutes after starting the transform, I begin to see documents in the destination index, albeit at a substantially slower rate than I expected (~1000 per minute). Shortly afterwards, CPU utilisation on the warm nodes goes to 100%, search query latency climbs into the hundreds of thousands of ms, ingest into the hot nodes stops (!!), and the dedicated transform node drops out of the cluster.

Things do not recover until I restart the ES process on the transform node, and stop the transform.
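Some back-of-the-envelope arithmetic helps put the observed rate in context. When a continuous transform starts, its first checkpoint summarises everything already in the source (here up to 14 days of data), fetching composite-aggregation results in pages of up to `max_page_search_size` (500) buckets. The post does not say how many distinct user combinations appear per minute, so `ACTIVE_COMBOS_PER_MINUTE` below is a hypothetical placeholder to be replaced with a real figure.

```python
import math

RETENTION_DAYS = 14              # ILM lifecycle from the post
SOURCE_DOCS = 4_000_000_000      # "around 4bn documents"
OBSERVED_DOCS_PER_MIN = 1_000    # destination write rate observed in the post
PAGE_SIZE = 500                  # settings.max_page_search_size
ACTIVE_COMBOS_PER_MINUTE = 500   # HYPOTHETICAL: distinct (user.name, user.category, user.vip) per minute

minutes = RETENTION_DAYS * 24 * 60               # number of 1-minute date_histogram buckets
dest_docs = minutes * ACTIVE_COMBOS_PER_MINUTE   # one output document per composite bucket
pages = math.ceil(dest_docs / PAGE_SIZE)         # composite-aggregation pages to fetch
backfill_hours = dest_docs / OBSERVED_DOCS_PER_MIN / 60
source_docs_per_page = SOURCE_DOCS / pages       # average source docs aggregated per page

print(f"{minutes=} {dest_docs=} {pages=}")
print(f"backfill at observed rate: {backfill_hours:.0f} h")
print(f"avg source docs per page: {source_docs_per_page:,.0f}")
```

Under that placeholder, the initial checkpoint would produce about 10 million summary documents and, at ~1000 per minute, take roughly a week, with each page of results being a separate search against the source data that presumably sits largely on the warm tier.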

This is not what I had in mind, and I can only assume that I have messed something up. I've pasted the transform JSON below and would appreciate any and all advice.

Thanks!

{
  "id": "transform-summary",
  "authorization": {
    "roles": [
      "a role with the right permissions"
    ]
  },
  "version": "8.8.1",
  "create_time": 1691924966868,
  "source": {
    "index": [
      "lovelyindex*"
    ],
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "filter": [
                {
                  "bool": {
                    "must_not": {
                      "bool": {
                        "should": [
                          {
                            "term": {
                              "unwanted_field_1": {
                                "value": "not wanted"
                              }
                            }
                          }
                        ],
                        "minimum_should_match": 1
                      }
                    }
                  }
                },
                {
                  "bool": {
                    "must_not": {
                      "bool": {
                        "should": [
                          {
                            "term": {
                              "unwanted_field_2": {
                                "value": "definitely not wanted"
                              }
                            }
                          }
                        ],
                        "minimum_should_match": 1
                      }
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "transform-summary-data"
  },
  "frequency": "60s",
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "600s"
    }
  },
  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1m"
        }
      },
      "user_name": {
        "terms": {
          "field": "user.name"
        }
      },
      "user_category": {
        "terms": {
          "field": "user.category"
        }
      },
      "user_vip": {
        "terms": {
          "field": "user.vip"
        }
      }
    },
    "aggregations": {
      "total_requests": {
        "value_count": {
          "field": "field_which_is_always_present"
        }
      },
      "distinct_categories": {
        "cardinality": {
          "field": "categories"
        }
      },
      "distinct_thing": {
        "cardinality": {
          "field": "thing"
        }
      }
    }
  },
  "settings": {
    "max_page_search_size": 500
  }
}
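The source query above wraps each exclusion in `bool.filter`, then `bool.must_not`, then a `bool.should` with `minimum_should_match: 1` around a single `term`. Logically, that reduces to one `bool` with two `must_not` clauses. The sketch below is a toy evaluator for just this subset of the query DSL over flat documents (not Elasticsearch's query engine), used to check that the nested and flat forms agree on a few hypothetical documents.

```python
def as_list(clause):
    """Query DSL accepts a single clause or a list of clauses."""
    return clause if isinstance(clause, list) else [clause]


def matches(query: dict, doc: dict) -> bool:
    """Toy evaluator: bool (filter, must_not, should + minimum_should_match) and term.
    `must` is not modelled."""
    if "term" in query:
        [(field, spec)] = query["term"].items()
        return doc.get(field) == spec["value"]
    b = query["bool"]
    if not all(matches(q, doc) for q in as_list(b.get("filter", []))):
        return False
    if any(matches(q, doc) for q in as_list(b.get("must_not", []))):
        return False
    should = as_list(b.get("should", []))
    if should:
        # Default mirrors ES: 1 when there is no filter/must clause, else 0.
        msm = b.get("minimum_should_match", 0 if "filter" in b else 1)
        if sum(matches(q, doc) for q in should) < msm:
            return False
    return True


def exclude(field, value):
    """One nested exclusion exactly as written in the transform's source query."""
    return {"bool": {"must_not": {"bool": {
        "should": [{"term": {field: {"value": value}}}],
        "minimum_should_match": 1}}}}


original = {"bool": {"filter": [{"bool": {"filter": [
    exclude("unwanted_field_1", "not wanted"),
    exclude("unwanted_field_2", "definitely not wanted"),
]}}]}}

simplified = {"bool": {"must_not": [
    {"term": {"unwanted_field_1": {"value": "not wanted"}}},
    {"term": {"unwanted_field_2": {"value": "definitely not wanted"}}},
]}}

samples = [
    {},
    {"unwanted_field_1": "not wanted"},
    {"unwanted_field_2": "definitely not wanted"},
    {"unwanted_field_1": "fine", "unwanted_field_2": "also fine"},
    {"unwanted_field_1": "not wanted", "unwanted_field_2": "definitely not wanted"},
]

for doc in samples:
    assert matches(original, doc) == matches(simplified, doc)
    print(doc, matches(simplified, doc))
```

The flat form is easier to read and reason about; whether it changes query cost on the cluster would need to be measured rather than assumed.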

My .02: without logs it's hard to tell. I've looked at your query and there seem to be some optimization opportunities. However, I'm curious whether you have tried lengthening the transform frequency from 60s to several minutes, and whether that makes any difference.
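As a sketch of what that change means in practice: the frequency sets how often a continuous transform checks the source for changes, so going from 60s to, say, 5m (an example value for "several minutes") cuts the number of checks per hour from 60 to 12. The request body below is for the transform update API; `docs_per_second` is a separate transform setting that throttles input documents per second between search requests, and the value shown is purely illustrative, not a recommendation.

```python
import json
from datetime import timedelta


def checks_per_hour(frequency: timedelta) -> int:
    """How many change checks a continuous transform would start per hour at this frequency."""
    return int(timedelta(hours=1) / frequency)


current = timedelta(seconds=60)
proposed = timedelta(minutes=5)  # example value for "several minutes"
print(checks_per_hour(current), "->", checks_per_hour(proposed))  # 60 -> 12

# Body for: POST _transform/transform-summary/_update
update_body = {
    "frequency": "5m",
    # Optional throttle on input documents per second; the number is illustrative only.
    "settings": {"docs_per_second": 5000},
}
print(json.dumps(update_body, indent=2))
```

Lengthening the frequency reduces how often change detection runs once the transform has caught up; it does not by itself slow down the paging within a single checkpoint, which is what the throttle setting is for.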
