Count occurrences of a variable term

In my logs I have a "message" field that contains, among other strings, the term "UserID:" followed by a value.

This value is mostly different in each document, but sometimes the same UserID is logged.

I am trying to find a way to count the number of times each UserID is logged.

I have been researching but can't find a way to isolate the value after "UserID:" and then count based on this value.

Any help would be greatly appreciated.

Hi @MobileOne.

Could you give an example of the indexed doc?

Hi @RabBit_BR

You will see below that the "message" field has "response={ UserId:" followed by the ID that I want to count across all the results.

{
        "_index" : ".sample-2022.05.10-001311",
        "_type" : "_doc",
        "_id" : "ECaIr4ABpdjiMbYDoZXE",
        "_score" : 49.102036,
        "_ignored" : [
          "message.keyword"
        ],
        "_source" : {
          "container" : {
            "image" : {
              "name" : "xyz.amazonaws.com/user:3.471ae2"
            }
          },
          "cluster" : "qa",
          "kubernetes" : {
            "container" : {
              "name" : "user"
            },
            "pod" : {
              "ip" : "70.71.245.322",
              "name" : "user-6768fdff5-v33k"
            },
            "namespace" : "qa",
            "replicaset" : {
              "name" : "user-6734573255"
            },
            "labels" : {
              "service" : "user",
              "pod-template-hash" : "2554339581",
              "track" : "stable"
            }
          },
          "level" : "INFO",
          "project" : "test",
          "message" : "[2022-05-10 19:53:27.228] [INFO] response={ UserId:  '977dfe3fd034c8609b6ec63cafd0d14f57caa24536593c7f2f3df6f2a6b4c1236e1147fb44a1eb6', registrationId: '9D702A7B-B676-E74E0799', rawRequest: undefined, rawResponse: undefined, token: 'f80b7a786cc49a29d03ed9a1c06954dd521c70fef445aada4eb8511eb12170f677b7ba68bbe56f595a575f1', timestamp: '1652212407227' } responseTime=987 service=user traceId=7dd7a96b8ce051f1",
          "market" : "us",
          "input" : { },
          "environment" : "staging",
          "@timestamp" : "2022-05-10T19:53:27.228Z",
          "ecs" : { },
          "stream" : "stdout",
          "service" : "user",
          "host" : {
            "name" : "ip-70-71-245-322"
          },
          "region" : "us",
          "event" : "authorize"
        }
  }

Sorry, I don't see a way to do this.
Maybe another user can help you.

Thanks @RabBit_BR. I think the right direction is via the analyzer, but my lack of experience with this feature is an issue.

Hi,

If performance is not a concern, one possible way is to use a runtime field for UserId. With a runtime field, you can use a script to process the message field almost freely and extract the UserId. However, because the script runs against every document the query touches, on every query, it may cause severe performance problems.
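As a rough sketch of the runtime-field idea, the Python snippet below builds a search request body that defines a keyword runtime field and counts its values with a terms aggregation. The field name `user_id`, the aggregation name `per_user`, and the helper `build_user_count_query` are made up for illustration, and the Painless script is an untested assumption: it presumes a version with runtime fields (7.11 or later) and reads from `_source` because `message.keyword` was ignored for the sample document.

```python
import json

# Painless source for the runtime field (an assumption, not from the thread).
# It reads the raw message from _source and uses a grok pattern to capture
# the quoted value that follows "UserId:".
EXTRACT_USER_ID = """
String msg = params._source.message;
if (msg != null) {
  String id = grok("UserId:%{SPACE}'%{DATA:user_id}'").extract(msg)?.user_id;
  if (id != null) {
    emit(id);
  }
}
"""


def build_user_count_query(field_name="user_id", top_n=100):
    """Return a search body that counts documents per extracted user ID."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "runtime_mappings": {
            field_name: {
                "type": "keyword",
                "script": {"source": EXTRACT_USER_ID},
            }
        },
        "aggs": {
            "per_user": {
                "terms": {"field": field_name, "size": top_n}
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_user_count_query(), indent=2))
```

Each bucket in the `per_user` aggregation would then hold one UserId and its document count. Narrowing the query with a time range should limit how many documents the script has to run against.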

In my opinion, this should be addressed before indexing. Using Logstash, for example, you can parse the message field and create a dedicated UserId field that Elasticsearch can aggregate on.
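Whatever tool does the parsing, the pre-indexing step boils down to pulling the quoted value after "UserId:" out of the message and storing it in its own field. Here is a minimal Python sketch of that logic, not a Logstash configuration. The regular expression, the function names, and the shortened sample messages are assumptions for illustration only.

```python
import re
from collections import Counter

# Matches "UserId:", optional whitespace, then a single-quoted value.
USER_ID_RE = re.compile(r"UserId:\s*'([^']+)'")


def extract_user_id(message):
    """Return the value after "UserId:" or None if the message has none."""
    match = USER_ID_RE.search(message)
    return match.group(1) if match else None


def count_user_ids(messages):
    """Count how many messages mention each user ID."""
    ids = (extract_user_id(m) for m in messages)
    return Counter(uid for uid in ids if uid is not None)


if __name__ == "__main__":
    # Hypothetical, shortened messages in the same shape as the sample log.
    sample = [
        "[INFO] response={ UserId:  'abc123', registrationId: 'X1' } service=user",
        "[INFO] response={ UserId:  'def456', registrationId: 'X2' } service=user",
        "[INFO] response={ UserId:  'abc123', registrationId: 'X3' } service=user",
        "[INFO] healthcheck ok",
    ]
    for user_id, count in count_user_ids(sample).most_common():
        print(user_id, count)
```

Once the value is stored as a keyword field at index time, a plain terms aggregation on that field gives the per-user counts without any query-time scripting.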

An ingest pipeline is another solution.
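For the ingest pipeline route, a sketch might look like the following: Python that builds a pipeline definition with a single grok processor, plus a body for the pipeline simulate API to try it out. The target field name `user_id`, the function names, and the grok pattern are assumptions and have not been run against a cluster.

```python
import json


def build_user_id_pipeline():
    """Return an ingest pipeline definition that extracts the user ID."""
    return {
        "description": "Extract the UserId value from the message field",
        "processors": [
            {
                "grok": {
                    "field": "message",
                    # Capture the quoted value after "UserId:" into user_id.
                    "patterns": ["UserId:%{SPACE}'%{DATA:user_id}'"],
                    # Let documents without a UserId pass through unchanged.
                    "ignore_failure": True,
                }
            }
        ],
    }


def build_simulate_body(message):
    """Return a body for POST _ingest/pipeline/_simulate with one test doc."""
    return {
        "pipeline": build_user_id_pipeline(),
        "docs": [{"_source": {"message": message}}],
    }


if __name__ == "__main__":
    # Hypothetical, shortened message in the same shape as the sample log.
    sample = "[INFO] response={ UserId:  'abc123', registrationId: 'X1' }"
    print(json.dumps(build_simulate_body(sample), indent=2))
```

If the simulated document comes back with a `user_id` field, the pipeline can be stored and attached to the index, and new documents can then be counted with a terms aggregation on that field. Documents indexed earlier would need to be reindexed through the pipeline to get the field.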
