Composite query and filter aggregation on a .ml-anomalies index

Incauto · September 10, 2020, 4:14pm

Hi, I'm trying to query the ML anomalies index with a composite aggregation and a filter aggregation, but the filter aggregation is not working. I want results for one specific job, but the response also includes other jobs in the index.

Any ideas what is wrong with my query?

GET .ml-anomalies-custom-myindex*/_search
{
  "size": 0,
  "aggs": {
    "table": {
      "composite": {
        "size": 10000,
        "sources": [
          {
            "job": {
              "terms": {
                "field": "job_id"
              }
            }
          },
          {
            "score": {
              "terms": {
                "field": "record_score"
              }
            }
          },
          {
            "date": {
              "date_histogram": {
                "field": "timestamp",
                "fixed_interval": "1d"
              }
            }
          }
        ]
      },
      "aggs": {
        "filtro": {
          "filter": {
            "term": {
              "job_id": "cpu-utilization"
            }
          }
        }
      }
    }
  }
}

richcollier · September 10, 2020, 7:28pm

Can you describe what it is that you want as the end result?

Incauto · September 10, 2020, 7:33pm

Hi, I want every bucket returned by the composite aggregation (the "job", "score" and "date" sources) to belong to the job "cpu-utilization".

richcollier · September 10, 2020, 7:57pm

Are you just trying to get a summary of anomaly scores per day per job?

I personally would do this with transforms, for example:

POST _transform/_preview
{
  "source": {
    "index": [
      ".ml-anomalies-*"
    ],
    "query": {
      "bool": {
        "filter": [
          { "term": { "job_id": "farequote_resp_by_airline" } }
        ]
      }
    }
  },
  "pivot": {
    "group_by": {
      "job_id": {
        "terms": {
          "field": "job_id"
        }
      },
      "timestamp": {
        "date_histogram": {
          "field": "timestamp",
          "calendar_interval": "1d"
        }
      }
    },
    "aggregations": {
      "record_score_max": {
        "max": {
          "field": "record_score"
        }
      }
    }
  }
}

which would yield:

{
  "preview" : [
    {
      "job_id" : "farequote_resp_by_airline",
      "record_score_max" : 1.891242,
      "timestamp" : 1486425600000
    },
    {
      "job_id" : "farequote_resp_by_airline",
      "record_score_max" : 6.176466,
      "timestamp" : 1486512000000
    },
    {
      "job_id" : "farequote_resp_by_airline",
      "record_score_max" : 98.5606570845504,
      "timestamp" : 1486598400000
    },
    {
      "job_id" : "farequote_resp_by_airline",
      "record_score_max" : 3.085973176835846,
      "timestamp" : 1486684800000
    },
    {
      "job_id" : "farequote_resp_by_airline",
      "record_score_max" : 10.01659340509018,
      "timestamp" : 1486771200000
    },
    ...

and you could remove the filter to see all jobs, if you wanted.

Incauto · September 11, 2020, 12:47am

Thanks Rich, I didn't know about transforms, but sadly I need the query for Vega visualizations. I could apply the filter in the GUI, but the programmer, who retrieves the visualizations via the Kibana API, needs the filter to be part of the query itself.
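
For reference, one way to put the filter into the query itself: a filter aggregation nested under a composite aggregation only counts the matching documents inside each bucket, it does not remove the buckets of other jobs. Restricting the documents with a top-level query does. The sketch below is a minimal illustration in Python that only builds and prints the request body. The function name build_filtered_composite_body is hypothetical, and the index pattern and field names are taken from the original request.

```python
import json


def build_filtered_composite_body(job_id, page_size=10000):
    """Build a search body whose composite buckets only cover one job.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "size": 0,
        # The query runs before the aggregations, so only documents of
        # this job ever reach the composite aggregation.
        "query": {"bool": {"filter": [{"term": {"job_id": job_id}}]}},
        "aggs": {
            "table": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        {"job": {"terms": {"field": "job_id"}}},
                        {"score": {"terms": {"field": "record_score"}}},
                        {
                            "date": {
                                "date_histogram": {
                                    "field": "timestamp",
                                    "fixed_interval": "1d",
                                }
                            }
                        },
                    ],
                }
            }
        },
    }


body = build_filtered_composite_body("cpu-utilization")

# Print the request in Console form so it can be pasted into Dev Tools.
print("GET .ml-anomalies-custom-myindex*/_search")
print(json.dumps(body, indent=2))
```

The printed JSON should also be usable as the request body of a Vega data source in Kibana, although that has not been tested here.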

richcollier · September 11, 2020, 1:27pm

Well, just so you know: a transform can write its results to a new index. You can then point your Vega visualization at the summary index that the transform creates.
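
As a rough sketch of that approach: the preview body from earlier in the thread becomes a persistent transform once a "dest" index is added and the transform is created and started. The Python below only builds and prints the two Console requests. The transform ID anomaly-daily-max and the destination index anomaly-daily-summary are made-up names for illustration, and the function name is hypothetical.

```python
import json


def build_transform_body(job_id, dest_index):
    """Assemble a pivot transform body: daily max record_score for one job."""
    return {
        "source": {
            "index": [".ml-anomalies-*"],
            "query": {"bool": {"filter": [{"term": {"job_id": job_id}}]}},
        },
        # The destination index is what a Vega visualization would read from.
        "dest": {"index": dest_index},
        "pivot": {
            "group_by": {
                "job_id": {"terms": {"field": "job_id"}},
                "timestamp": {
                    "date_histogram": {
                        "field": "timestamp",
                        "calendar_interval": "1d",
                    }
                },
            },
            "aggregations": {
                "record_score_max": {"max": {"field": "record_score"}}
            },
        },
    }


transform_id = "anomaly-daily-max"      # assumed name, not from the thread
dest_index = "anomaly-daily-summary"    # assumed name, not from the thread
transform_body = build_transform_body("farequote_resp_by_airline", dest_index)

# Create the transform, then start it.
print(f"PUT _transform/{transform_id}")
print(json.dumps(transform_body, indent=2))
print(f"POST _transform/{transform_id}/_start")
```

Dropping the "query" entry from "source" would summarize all jobs, as mentioned above for the preview.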
