Latest transform fails to fetch preview after running at 100% CPU for more than an hour

Hello

I am trying to create a transform from a filebeat 7.15 index on a 7.17 cluster with 4 nodes and sufficient RAM and CPUs. During normal operations the cluster runs smoothly with low CPU usage. In fact the setup is the same on 3 clusters in 3 different environments. The error happens on every cluster.

The problem is that all nodes jump to 100% CPU load for more than an hour when the transform tries to create the preview, and the cluster becomes unresponsive for the same period. Sometimes the preview fails with the message "Fail to fetch the preview". A few times the preview is created, and I can configure the unique keys and sort field. However, when I create the transform without starting the job and inspect the query, it is wrong.

The query is pretty simple and works well as both KQL and JSON. As you can see, the fields are ECS. Labels are dynamic mappings and present in the transform index templates.

KQL:

event.dataset :("holodeck.ap.log" or "holodeckb2b.log" ) and labels.type :("usermessage_incoming" or "usermessage_outgoing" or "receipt_incoming" or "receipt_outgoing" )

json:

GET filebeat-*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "event.dataset": [
              "holodeck.ap.log",
              "holodeckb2b.log"
            ]
          }
        }
      ],
      "should": [
        {
          "terms": {
            "labels.type": [
              "usermessage_incoming",
              "usermessage_outgoing",
              "receipt_incoming",
              "receipt_outgoing"
            ]
          }
        }
      ]
    }
  }
}

The few times the transform has been created, the query had been changed to a match_all query, or only the event.dataset clause was present.

Most times the creation fails after running at 100% CPU.

I have read all the 8.5 documentation and searched this forum. All help is appreciated.

One more question, which the documentation does not answer: when creating the transform index template, do I have to keep all fields from the original filebeat template, or is it sufficient to create a template with the transformed fields, i.e. the fields returned in a document with the query above?

Best regards
Flemming

How many filebeat indices do you have?
We are currently looking into similar issues: due to a bug, the preview continues running in the background even though it has timed out for the client. This has been fixed for the upcoming 7.17.8, 8.5.2 and 8.6.

If you have a lot of historic data, you should consider limiting the source using a query, e.g. via a range query that adds an absolute start date.
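As a rough illustration of the range-query suggestion, the sketch below wraps an existing query in a bool filter together with an absolute start date. The helper name with_start_date, the start date and the use of @timestamp as the date field are assumptions for illustration, not values taken from this thread.

```python
import json


def with_start_date(query, start, field="@timestamp"):
    """Return a bool query that requires `query` AND `field` >= `start`."""
    return {
        "bool": {
            "filter": [
                query,
                {"range": {field: {"gte": start}}},
            ]
        }
    }


# Any existing query can be wrapped; this is the event.dataset clause
# from the original post.
dataset_query = {
    "terms": {"event.dataset": ["holodeck.ap.log", "holodeckb2b.log"]}
}

# Hypothetical start date; pick one that matches your retention needs.
limited = with_start_date(dataset_query, "2022-11-01T00:00:00Z")

# Print a body that can be pasted into a search or a transform source.
print(json.dumps({"query": limited}, indent=2))
```

Both clauses sit in filter context, so they are not scored, which is usually sufficient for a transform source.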

Can you post your config for your latest transform?

Can you elaborate: are you using the transform UI or the dev console? Does the PUT time out, or the preview call?

A latest transform returns documents as they are in the source index. Therefore we advise using the same template. If you have a lot of unused fields, you can indeed remove them; however, it doesn't make a big difference from a technical perspective. If you are looking for ways to save storage, consider disabling _source on the destination index of your transform.
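To make the _source suggestion concrete, here is a minimal sketch that builds the body of a composable index template for the destination index with _source disabled. The template pattern holodeck-latest* and the three mapped fields are hypothetical; a real template would carry the fields you actually keep. Note that with _source disabled the original JSON of a document is no longer stored, so APIs that depend on it (update, update_by_query, reindex) will not work on that index.

```python
import json


def dest_template(index_pattern, properties):
    """Build an index template body that disables _source for the destination."""
    return {
        "index_patterns": [index_pattern],
        "template": {
            "mappings": {
                "_source": {"enabled": False},
                "properties": properties,
            }
        },
    }


# Hypothetical subset of ECS fields kept in the destination index.
properties = {
    "@timestamp": {"type": "date"},
    "event": {"properties": {"dataset": {"type": "keyword"}}},
    "labels": {"properties": {"type": {"type": "keyword"}}},
}

body = dest_template("holodeck-latest*", properties)

# The printed JSON is the request body for PUT _index_template/<name>.
print(json.dumps(body, indent=2))
```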

Hi @Hendrik_Muhs

Thanks for taking the time to help me answer these questions. I have continued to experiment with both pivot and latest transforms on this index.

In production there are 35 filebeat indices of various sizes; in the test environments considerably fewer. Only some of the indices contain documents to be transformed. I have also observed that the preview task keeps running in the background after a client timeout. I am planning to upgrade to version 8.x within a month. I can upgrade to 7.17.8 when it is released.

I have also considered doing this. However, neither the pivot nor the latest transform accepts queries from Kibana. I have tried both KQL and JSON queries. When the transform has been created, the queries end up as a match_all query. The queries work flawlessly in Discover and Dev Tools. When I select the edit JSON query switch I can write the word "query" and the Apply Changes button becomes active, but when I add the query it becomes inactive. I cannot find any examples of a JSON query that is accepted by Kibana.

Until now I have only tried to create the transform from Kibana.

Transform UI

I have found the answer to my last question. For production it is best to create an index template for the transformed index.

BR Flemming

Thanks for the answers.

If you edit the transform to add a query, you can only use JSON queries, because the transform runs on the backend, namely Elasticsearch. K(ibana) Q(uery) L(anguage) isn't supported if you edit it directly. You might be able to create a data view in Kibana and then create the transform from that data view. I will check that.

Another alternative: Start with the UI and go through the wizard. At the end you have the option to continue in dev console. There you can add your query. If that's not working for you, please share your config and somebody can help with the syntax.
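For the dev console route, a sketch of what a latest transform body with an explicit query could look like is shown below. It only builds and prints the JSON; nothing is sent to a cluster. The destination index name and the unique key field are hypothetical placeholders. One assumption is worth calling out: in the JSON posted earlier the labels.type clause sits under should, which is optional when a must clause is present, so this sketch puts both terms clauses under filter to match the "and" in the KQL.

```python
import json


def latest_transform_body(source_index, dest_index, query, unique_key, sort_field):
    """Build the request body for a latest transform (preview or PUT)."""
    return {
        "source": {"index": source_index, "query": query},
        "dest": {"index": dest_index},
        "latest": {"unique_key": unique_key, "sort": sort_field},
    }


# Both conditions are required, mirroring the KQL "and".
query = {
    "bool": {
        "filter": [
            {"terms": {"event.dataset": ["holodeck.ap.log", "holodeckb2b.log"]}},
            {"terms": {"labels.type": [
                "usermessage_incoming",
                "usermessage_outgoing",
                "receipt_incoming",
                "receipt_outgoing",
            ]}},
        ]
    }
}

body = latest_transform_body(
    source_index="filebeat-*",
    dest_index="holodeck-latest",        # hypothetical destination index
    query=query,
    unique_key=["labels.message_id"],    # hypothetical unique key field
    sort_field="@timestamp",
)

# Paste the output into the dev console to preview before creating the transform.
print("POST _transform/_preview")
print(json.dumps(body, indent=2))
```

The same body can then be used with PUT _transform/<transform_id> once the preview returns the expected documents.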

Thanks for the suggestions @Hendrik_Muhs

I'll try what works and post the results. It is not a problem to use the transform APIs as you suggest. However, I hope the documentation will be elaborated in upcoming releases.