Fields and Row template in Annotations [TSVB]

Oskr · May 7, 2021, 3:08pm

Hi.
I am using TSVB to visualize the data stored in the machine learning index. I have seen that I can use annotations to show the details of an anomaly, but when I try to add the causes.typical field, no annotations are generated in the visualization.
Data:

{
  "_source": {
    "job_id": "customers_fraud",
    "result_type": "record",
    "probability": 0.0017884393731519299,
    "record_score": 0.4412245161339784,
    "initial_record_score": 0.4412245161339784,
    "bucket_span": 3600,
    "detector_index": 0,
    "is_interim": true,
    "timestamp": 1620396000000,
    "function": "count",
    "function_description": "count",
    "over_field_name": "customer_id.keyword",
    "over_field_value": "123456789",
    "causes": [
      {
        "probability": 0.0017884393731519297,
        "function": "count",
        "function_description": "count",
        "typical": [
          4.239593819083309
        ],
        "actual": [
          19
        ],
        "over_field_name": "customer_id.keyword",
        "over_field_value": "123456789"
      }
    ],
    "influencers": [
      {
        "influencer_field_name": "customer_id.keyword",
        "influencer_field_values": [
          "123456789"
        ]
      },
      {
        "influencer_field_name": "payment_method.keyword",
        "influencer_field_values": [
          "CREDIT"
        ]
      },
      {
        "influencer_field_name": "currency.keyword",
        "influencer_field_values": [
          "USD"
        ]
      }
    ],
    "customer_id.keyword": [
      "123456789"
    ],
    "currency.keyword": [
      "USD"
    ],
    "payment_method.keyword": [
      "CREDIT"
    ]
  },
  "fields": {
    "timestamp": [
      "2021-05-07T14:00:00.000Z"
    ]
  },
  "sort": [
    1620396000000
  ]
}

Marius_Dragomir · May 10, 2021, 11:49am

Hi Oscar,

This is an array of causes, so I think that using causes.typical won't address any field in there. It would have to be something like causes[0].typical[0]. You can try with and without the [0] for typical. I do have doubts that this is a supported scenario, so if you come back and say that it doesn't work like that for you, I'll open an issue for it in the Kibana repo.

richcollier · May 10, 2021, 12:01pm

I have reproduced the situation for Oscar and I've tried all kinds of combinations: causes.actual, causes.actual._value, causes.0.actual, and now causes[0].typical[0], but none seem to work.

I'm wondering if this is related to this bug/enhancement: "TSVB Make mustache template field accessors consistent" (Issue #59435, elastic/kibana on GitHub)

Marius_Dragomir · May 10, 2021, 12:05pm

Yeah, I think we need to create a specific issue for it then, and it can be linked to a meta issue for further improvements.

richcollier · May 10, 2021, 2:52pm

I think the issue is primarily because causes is a nested object....

richcollier · May 10, 2021, 2:56pm

@Oskr - A possible workaround could be to use Transforms. In particular, you could use Transforms to re-format the .ml-anomalies-* index into a new, very small index that TSVB can use for reporting purposes. For example:

PUT _transform/my_ml_annotations
{
  "source": {
    "index": [
      ".ml-anomalies-*"
    ],
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "result_type": "record"
            }
          },
          {
            "term": {
              "job_id": "url_scanning"
            }
          },
          {
            "range": {
              "record_score": {
                "gte": "99"
              }
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "my_ml_annotations"
  },
  "pivot": {
    "group_by": {
      "timestamp": {
        "date_histogram": {
          "field": "timestamp",
          "fixed_interval": "15m"
        }
      },
      "clientip": {
        "terms": {
          "field": "clientip"
        }
      }
    },
    "aggregations": {
      "record_score": {
        "max": {
          "field": "record_score"
        }
      },
      "typical": {
        "scripted_metric": {
          "init_script": "state.typical = null",
          "map_script": "state.typical = params._source.causes.0.typical.0",
          "combine_script": "return state.typical",
          "reduce_script": "for (d in states) if (d != null) return d"
        }
      },
      "actual": {
        "scripted_metric": {
          "init_script": "state.actual = null",
          "map_script": "state.actual = params._source.causes.0.actual.0",
          "combine_script": "return state.actual",
          "reduce_script": "for (d in states) if (d != null) return d"
        }
      }
    }
  }
}
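
One step the request above leaves implicit: PUT _transform only registers the transform, and nothing is written to the destination index until the transform is started. A minimal sketch, reusing the my_ml_annotations name from the request and assuming the transform was created without errors:

```console
# Start the transform so it begins writing to the destination index.
POST _transform/my_ml_annotations/_start

# Optional: check whether any documents have been processed yet.
GET _transform/my_ml_annotations/_stats

# Look at one document in the destination index.
GET my_ml_annotations/_search
{
  "size": 1
}
```

The _stats call is only a convenience here; it helps confirm the transform has produced output before TSVB is pointed at the index.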

The above will create a new index called my_ml_annotations that is "flattened": each document carries timestamp, clientip, record_score, typical, and actual as top-level fields.

Then, I can use it in TSVB.

richcollier · May 10, 2021, 2:59pm

Of course, you'd need to run the transform "continuously" by defining the frequency and sync:
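
A minimal sketch of such a definition, with assumed values. The transform name my_ml_annotations_continuous is hypothetical, the sync is assumed to run on the timestamp field, and the frequency and delay are illustrative. Because anomaly records are stamped with the start time of their bucket, the delay probably needs to be longer than the job's bucket span; otherwise records written after a checkpoint has passed could be missed.

```console
# Hypothetical continuous variant of the transform; the name, frequency, and delay are assumptions.
PUT _transform/my_ml_annotations_continuous
{
  "source": {
    "index": [".ml-anomalies-*"],
    "query": {
      "bool": {
        "filter": [
          { "term": { "result_type": "record" } }
        ]
      }
    }
  },
  "dest": {
    "index": "my_ml_annotations_continuous"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "timestamp",
      "delay": "2h"
    }
  },
  "pivot": {
    "group_by": {
      "timestamp": {
        "date_histogram": { "field": "timestamp", "fixed_interval": "15m" }
      }
    },
    "aggregations": {
      "record_score": { "max": { "field": "record_score" } }
    }
  }
}
```

The scripted_metric aggregations for typical and actual from the earlier request would go into the aggregations section unchanged.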

flash1293 · May 10, 2021, 3:00pm

This is possible, but the correct syntax is indeed very hard to hit:

The fields list only has to mention causes, then in the row template you can get it using this syntax: {{causes.[0].actual.[0]}}

It's weird, I know, but it correctly picks the 19 value for the tooltip
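
As a sketch of how this extends to more than one value, the annotation settings might look like the following. The label text is illustrative, and this assumes the documents behave like the ones in the test described above:

```text
Fields:        causes
Row template:  Typical: {{causes.[0].typical.[0]}}, actual: {{causes.[0].actual.[0]}}
```

Only the top-level causes path goes in the fields list; the indexing into the arrays happens entirely inside the template.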

Oskr · May 10, 2021, 4:52pm

Hi @flash1293, thanks for your help. Can you confirm which version of Kibana you are working on?
I tried to build the visualization the way you describe and ran into the same problem.

Oskr · May 10, 2021, 4:56pm

Thanks @richcollier, I think it is a good solution, although I don't know whether it will put extra load on the cluster in the future. I will propose it to our team and test it.

richcollier · May 10, 2021, 4:59pm

@Oskr - I'm not sure @flash1293's solution works specifically with the .ml-anomalies-* index because of the way it is mapped, but we can wait for his clarification (I couldn't get his suggestion to work either). I suspect his test didn't actually use the true .ml-anomalies-* index, but rather a mock-up.

My workaround using Transforms will be incredibly lightweight. Transforms just use Elasticsearch aggregations under the hood, and the .ml-anomalies-* index they operate on shouldn't be that big in the first place!

Oskr · May 10, 2021, 4:59pm

Thanks @richcollier, I tried to solve the problem with your tips but they didn't work.

Oskr · May 10, 2021, 5:01pm

Hi @Marius_Dragomir, yes, the problem occurs when trying to generate annotations from arrays.
I tried to solve the problem with your tips but they didn't work.

flash1293 · May 11, 2021, 8:07am

I'm no ML expert, maybe something special is going on there. TSVB is simply reading the _source from the document, so if the source is available, it should work. Maybe ML anomalies are not storing this part of the source?

The part I'm sure about is that the mustache syntax for accessing the first value of an array is path.[0] (not path.0 or path[0], which would make more sense).

richcollier · May 11, 2021, 4:32pm

Yes, ML stores the causes array in _source with a mapping type of nested so there must be something else going on here.

richcollier · May 12, 2021, 12:47pm

It turns out this is exactly the situation. The causes array in _source with a mapping type of nested messes up TSVB - as TSVB executes an exists filter on the data:

        {
          "exists": {
            "field": "causes"
          }
        }

which returns nothing.
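
To see the difference outside of TSVB, the two queries can be compared directly. This is a sketch assuming causes is mapped as nested, as described above: a nested field is only reachable through a nested query, so the top-level exists filter matches nothing, while the wrapped version should match records that carry a typical value.

```console
# Top-level exists on the nested path: expected to return no hits.
GET .ml-anomalies-*/_search
{
  "size": 0,
  "query": {
    "exists": { "field": "causes" }
  }
}

# Same check wrapped in a nested query: expected to match records with causes.
GET .ml-anomalies-*/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "causes",
      "query": {
        "exists": { "field": "causes.typical" }
      }
    }
  }
}
```

Comparing hits.total between the two responses shows whether the nested mapping is what hides the documents from the annotation query.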

For now, stick with the transforms workaround. It might also be possible to use a runtime field for items buried in the causes array, but I have yet to test that (and perhaps TSVB doesn't support runtime fields until 7.13).
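
For reference, a sketch of what such a runtime field could look like when defined in a search request. The field name first_cause_typical is hypothetical, the script assumes causes and typical are arrays in _source as in the sample document, and whether TSVB annotations can read such a field is a separate question this does not answer.

```console
# Hypothetical runtime field exposing the first typical value of the first cause.
GET .ml-anomalies-*/_search
{
  "size": 1,
  "query": {
    "term": { "result_type": "record" }
  },
  "runtime_mappings": {
    "first_cause_typical": {
      "type": "double",
      "script": {
        "source": "def c = params._source.causes; if (c != null && c.size() > 0 && c[0].typical != null && c[0].typical.size() > 0) { emit(((Number) c[0].typical[0]).doubleValue()); }"
      }
    }
  },
  "fields": ["first_cause_typical"]
}
```

The null and size checks keep the script from failing on records that have no causes array; for those documents the field is simply absent.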

Oskr · May 13, 2021, 5:25pm

Thank you @richcollier for the explanation.
