Ingest pipeline, rename a field that has a dynamic part

Hi,
I have metrics with fields that look like this:

prometheus.xyz-dev.label.name
prometheus.xyz-dev.tomcat_global_error_total.value
prometheus.xyz-dev.tomcat_global_received_bytes_total.value
prometheus.xyz-dev.tomcat_global_request_max_seconds.value
prometheus.xyz-dev.tomcat_global_request_seconds.count

I would like to delete the first two parts, but the second one is the kubernetes.namespace value, so it is dynamic.
This is what I need:
prometheus.xyz-dev.label.name => label.name
prometheus.abc-prod.label.name => label.name

or alternatively:
prometheus.xyz-dev.label.name => prometheus.label.name

I need to create an ingest pipeline that can do that, but I have no idea how. Can you help?

OK, I partly resolved the problem: I created two pipelines. The first one checks metricset.module and runs a second pipeline, which renames the fields. But I have to define every field name explicitly (right now there are more than 60), so this is not a generic solution. Do you have any idea how I can make it generic?

First pipeline:

  {
    "set" : {
      "field" : "metricset_module",
      "value" : "{{metricset.module}}",
      "ignore_failure" : true
    }
  },
  {
    "pipeline" : {
      "name" : "prometheus_pipeline",
      "if" : "ctx.containsKey('metricset_module')&&ctx.metricset_module!=''"
    }
  },
  {
    "remove" : {
      "field" : [
        "metricset_module"
      ],
      "ignore_failure" : true
    }
  }

and the second one:

  "prometheus_pipeline" : {
     "description" : "",
     "processors" : [
       {
         "rename" : {
           "if" : "ctx.metricset.module=='prometheus'",
           "field" : "prometheus.{{kubernetes.namespace}}.http_server_requests_seconds.count",
           "target_field" : "prometheus.http_server_requests_seconds.count",
           "ignore_failure" : true
         }
       },
       {
         "rename" : {
           "if" : "ctx.metricset.module=='prometheus'",
           "field" : "prometheus.{{kubernetes.namespace}}.http_server_requests_seconds.sum",
           "target_field" : "prometheus.http_server_requests_seconds.sum",
           "ignore_failure" : true
         }
       },
       {
         "rename" : {
           "if" : "ctx.metricset.module=='prometheus'",
           "field" : "prometheus.{{kubernetes.namespace}}.http_server_requests_seconds_max.value",
           "target_field" : "prometheus.http_server_requests_seconds_max.value",
           "ignore_failure" : true
         }
       },
 ...etc
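
In case it is useful to anyone reading this, here is roughly how the two fragments above can be registered as pipelines; the name prometheus_router for the first one is just my placeholder for this example, and only one rename is shown:

PUT _ingest/pipeline/prometheus_router
{
  "description" : "copy metricset.module and route prometheus documents to prometheus_pipeline",
  "processors" : [
    {
      "set" : {
        "field" : "metricset_module",
        "value" : "{{metricset.module}}",
        "ignore_failure" : true
      }
    },
    {
      "pipeline" : {
        "name" : "prometheus_pipeline",
        "if" : "ctx.containsKey('metricset_module') && ctx.metricset_module != ''"
      }
    },
    {
      "remove" : {
        "field" : [ "metricset_module" ],
        "ignore_failure" : true
      }
    }
  ]
}

PUT _ingest/pipeline/prometheus_pipeline
{
  "description" : "strip the namespace part from prometheus metric fields",
  "processors" : [
    {
      "rename" : {
        "if" : "ctx.metricset.module == 'prometheus'",
        "field" : "prometheus.{{kubernetes.namespace}}.http_server_requests_seconds.count",
        "target_field" : "prometheus.http_server_requests_seconds.count",
        "ignore_failure" : true
      }
    }
  ]
}

Metricbeat then has to point at prometheus_router (for example via output.elasticsearch.pipeline in metricbeat.yml) for the routing to kick in.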

Hi there!

Are the first two parts (prometheus.xyz-dev and prometheus.abc-prod) always the same?

I mean, are they static, drawn from a limited set of values, or can they change arbitrarily?

Also, please paste a sample input document here and tell us whether by any chance you're using Logstash in your stack to ingest the data, or whether you're sending it directly to Elasticsearch from another source.

Thanks

The first part, "prometheus", is static, but the next one is not: it is the Kubernetes namespace name, and we have a lot of namespaces, with new ones added all the time. After that comes the metric name. As you can see in my second message, I managed to drop the second part (the namespace), but I had to write a rename processor for each metric name. There are currently about 60 metric names, and there may be more in the future, so I would like something dynamic.
We skip Logstash; the flow goes from Metricbeat straight to Elasticsearch.
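
To give you an idea of the shape, one of these documents looks roughly like this (namespace and values are made up for illustration):

{
  "metricset" : { "module" : "prometheus" },
  "kubernetes" : { "namespace" : "xyz-dev" },
  "prometheus" : {
    "xyz-dev" : {
      "label" : { "name" : "some-label" },
      "tomcat_global_error_total" : { "value" : 0 },
      "tomcat_global_request_seconds" : { "count" : 42 }
    }
  }
}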

OK, so if the first part is always prometheus, then you can do something like:

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "old_field": "prometheus.whatever.my_app"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            if (ctx.containsKey('old_field') && ctx.old_field instanceof String) {
              def new_field_start = 12 + ctx.old_field.substring(11).indexOf('.');
              ctx.new_field = ctx.old_field.substring(new_field_start);
            }
          """
        }
      }
    ]
  }
}

It returns:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "new_field" : "my_app",
          "old_field" : "prometheus.whatever.my_app"
        },
        "_ingest" : {
          "timestamp" : "2020-01-24T16:36:31.987Z"
        }
      }
    }
  ]
}

It's not pretty, but it should fit your case. I don't remember how (or whether) you can find the second occurrence of a substring in Painless without using regexes (to use those you have to enable them in elasticsearch.yml, with the resulting Elasticsearch warning).
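
Actually, one thing that might help with that: String.indexOf also accepts a start offset, so you should be able to find the second dot without regexes or hard-coded lengths. A rough, untested variant of the same simulate call:

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "old_field": "prometheus.xyz-dev.label.name"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            if (ctx.containsKey('old_field') && ctx.old_field instanceof String) {
              // find the first dot, then search again starting just after it
              int first = ctx.old_field.indexOf('.');
              int second = ctx.old_field.indexOf('.', first + 1);
              if (second > -1) {
                ctx.new_field = ctx.old_field.substring(second + 1);
              }
            }
          """
        }
      }
    ]
  }
}

And since in your documents prometheus.<namespace>.<metric> is apparently a nested object rather than a flat string (your rename processors address it that way), another option might be a single script processor that moves everything from under the namespace key one level up, so you don't have to list the 60 metric names at all. A sketch, untested, assuming kubernetes.namespace always matches the key under prometheus:

{
  "script": {
    "if": "ctx.metricset?.module == 'prometheus'",
    "source": """
      def ns = ctx.kubernetes?.namespace;
      if (ns != null && ctx.prometheus instanceof Map && ctx.prometheus.containsKey(ns)) {
        // detach the object that sits under the namespace key ...
        def metrics = ctx.prometheus.remove(ns);
        // ... and merge its entries directly under 'prometheus'
        if (metrics instanceof Map) {
          ctx.prometheus.putAll(metrics);
        }
      }
    """
  }
}

If that works for you, prometheus_pipeline shrinks to this single processor no matter how many metric names show up later.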
