Add operation on log messages

Hi Team,

I have logs where, when a bank account is opened, I get the messages below.

for APAC -

bank account is opened in APAC

for EMEA -

bank account is opened in EMEA

Say 4 accounts were opened in a day. I can run an ELK query and get the correct result; I can confirm the result against the application log.

The query is:

GET acct-*/_search
{
  "track_total_hits": true,
  "size": 0,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "version": true,
  "script_fields": {},
  "stored_fields": [
    "*"
  ],
  "runtime_mappings": {},
  "_source": false,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "log.file.path.keyword": "app.log"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "message": "bank account is opened in APAC"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-24h",
              "lte": "now",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

It gives results like,

{
  "took" : 459,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

I need to calculate the total accounts created in a day (APAC + EMEA).

When I add a second match_phrase for EMEA, it gives the error below:

 "match_phrase": {
  "message": "bank account is opened in EMEA"
  }

Error -

{
        "type" : "x_content_parse_exception",
        "reason" : "[44:23] [bool] failed to parse field [should]"
 }

"caused_by" : {
          "type" : "json_parse_exception",
          "reason" : "Duplicate field 'match_phrase'\n at [Source: (org.elasticsearch.common.io.stream.ByteBufferStreamInput); line: 45, column: 37]"
        }

How can I

  1. add a second match_phrase to the same query above,
  2. and then also do addition (+ operation) to get the total bank accounts in that query?

Thanks,

Hi Team,

Can someone please reply.

Thanks,

Please be patient in waiting for responses to your question and refrain from pinging multiple times asking for a response or opening multiple topics for the same question. This is a community forum, it may take time for someone to reply to your question. For more information please refer to the Community Code of Conduct specifically the section "Be patient". Also, please refrain from pinging folks directly, this is a forum and anyone that participates might be able to assist you.

If you are in need of a service with an SLA that covers response times for questions then you may want to consider talking to us about a subscription.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste into the Kibana Dev Console and run with the run button to reproduce your use case. It will help readers understand, reproduce and, if needed, fix your problem. It will also most likely help you get a faster answer.

Hi @dadoonet,

Thank you for your reply.

Apologies for being impatient. I will take care next time.

I am not seeing anything about a recreation script in the given link. I will try to explain again below.

I get the messages below in the application logs when a bank account is opened for APAC:

2021-08-12 04:19:29.511 | INFO  | 2f38a1d4-4799-819b42035e54 | 3dfeee31cba64f054 | bank account is opened in APAC, Id=4799-8b4f-819b42035e54, Status=received
2021-08-12 04:51:12.425 | INFO  | 4e9d2512-4903-c53b679da637 | 9af02a0c46885c54b | bank account is opened in APAC, Id=4903-baf4-c53b679da637, Status=received
2021-08-12 09:02:17.734 | INFO  | b1bf46f9-46e6-310ed88f16cc | 92854445c39929995 | bank account is opened in APAC, Id=46e6-9c47-310ed88f16cc, Status=received
2021-08-12 09:05:16.544 | INFO  | 16cdac29-4dd7-68cff2b3096e | 032250332b1b4cf6d | bank account is opened in APAC, Id=4dd7-ba10-68cff2b3096e, Status=received

When a bank account is opened for EMEA, the logs are as below:

2021-08-12 04:38:38.896 | INFO  | 4ceb34fc-80ca-1f5a83e16c9d | cd5d021239245eda9 | bank account is opened in EMEA, Id=5769-8c4f-819b34038954, Status=received
2021-08-12 06:03:06.093 | INFO  | b1bf46f9-9c47-310ed88f16cc | 243455c245e44c865 | bank account is opened in EMEA, Id=1279-7c44-332b42635454, Status=received 
2021-08-12 06:03:15.442 | INFO  | b1bf46f9-9c47-310ed88f16cc | 27d56f1e7f965ed1d | bank account is opened in EMEA, Id=89e6-9c47-031ed78f31bc, Status=received 
2021-08-12 06:10:24.534 | INFO  | 16cdac29-7dd1-68cff2b3096e | e039423347310327f | bank account is opened in EMEA, Id=54t6-6b48-310ed88f89dc, Status=received
2021-08-12 06:10:53.832 | INFO  | 16cdac29-7dd1-68cff2b3096e | e1cb0f12172551a74 | bank account is opened in EMEA, Id=88j7-7c47-4560ed88f78w, Status=received 
2021-08-12 06:11:35.147 | INFO  | 16cdac29-7dd1-68cff2b3096e | 4583b2681623658fe | bank account is opened in EMEA, Id=15e1-3b63-310ed88f56dd, Status=received 
2021-08-12 06:12:04.799 | INFO  | b1bf46f9-0aff-310ed88f16cc | 2245f59da2d4ffa89 | bank account is opened in EMEA, Id=3476-89v9-1280ed88f23r, Status=received 
2021-08-12 06:12:50.293 | INFO  | 16cdac29-ba10-68cff2b3096e | bfcad4279dbda52b5 | bank account is opened in EMEA, Id=8897-9c47-320ed88f46cy, Status=received 

I need to calculate how many times an account is opened for APAC and for EMEA, so I am running the query above, which works correctly.

The results are correct, as they match the server output below:

[root@server ~]# cat app.log  | grep '2021-08-12' | egrep -i 'bank account is opened in EMEA'  |wc -l
8
[root@server ~]# 
[root@server ~]# cat app.log  | grep '2021-08-12' | egrep -i 'bank account is opened in APAC'  |wc -l
4
[root@server ~]#

Currently I need to run the same query twice (changing the match_phrase each time) to get the output for APAC and EMEA.

So can I specify two match_phrase clauses in the same query? If I try to add the second match_phrase just below the first one in the query above, like this,

{
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "message": "bank account is opened in EMEA"
                      },
                      "match_phrase": {
                         "message": "bank account is opened in APAC"
                      }
                    }
                  ],

It gives an error.

1. The intention is to use the same query to show the number of accounts for both regions (currently it shows only one result, as only one match_phrase is used), something like below:

"hits" : {
    "total" : {
      "value" : 4,

.
.

"hits" : {
    "total" : {
      "value" : 8,

2. Then, in the same query, can I add both values to show the total number of accounts for both regions?

.
.

"hits" : {
    "total" : {
      "value" : 12,

Thanks,
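
On the duplicate-field error above: a JSON object cannot carry the key match_phrase twice, so each phrase needs its own object inside the should array. A minimal Python sketch of that shape follows. It assumes the field is message, as in the first query; build_should_query is a hypothetical helper name used only for illustration, not part of any Elasticsearch client.

```python
import json


def build_should_query(field, phrases):
    """Return a bool query that matches any one of the given phrases.

    Each phrase gets its own {"match_phrase": ...} object, so no JSON
    object ever repeats a key. (Hypothetical helper, for illustration.)
    """
    return {
        "bool": {
            "should": [{"match_phrase": {field: p}} for p in phrases],
            "minimum_should_match": 1,
        }
    }


query = build_should_query(
    "message",  # assumed field name, taken from the first query in the thread
    ["bank account is opened in APAC", "bank account is opened in EMEA"],
)

# Print a request body that could be pasted after "GET acct-*/_search".
print(json.dumps({"size": 0, "track_total_hits": True, "query": query}, indent=2))
```

The printed body can replace the inner bool block of the original query; the count it returns covers documents matching either phrase.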

I'd need a way to reproduce your problem.

As I wrote:

So please take some time to write such a script.
This is worth it.

You might fix the problem by yourself by doing so and if not, we can then definitely work with you from that base to fix it.

Hi @dadoonet,

Thanks for your reply.

Sorry, I am not aware of any reproduction script in Elasticsearch that we can create from Kibana, and I am also not seeing anything related to it in the link you shared.

If my understanding of Elasticsearch is correct, we can achieve this via the GET call above. To reproduce this case at your end, you can index the logs above and then run the GET request to fetch the data, the same way I did.

As you can see, my current request works (it gives results with a single match_phrase), but it needs some modification to include another match_phrase in the same request and then add both results, if possible within the same request.

Sorry that I am not being more helpful here.

Thanks,

Yes, that's basically what is needed.
I need a script that I can just run to generate the data in a clean index and then run the search query.

Hi @dadoonet,

Are you talking about bulk indexing like below?

Below are the sample lines. Can you please copy them into a file and run the command below?

{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 04:19:29.511 | INFO  | 2f38a1d4-4799-819b42035e54 | 3dfeee31cba64f054 | bank account is opened in APAC, Id=4799-8b4f-819b42035e54, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 16cdac29-7dd1-68cff2b3096e | e039423347310327f | bank account is opened in EMEA, Id=54t6-6b48-310ed88f89dc, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 08:15:31.324 | INFO  | 9de13dc6-b88b-5a81-976b-d56183b38acd | 6736088c74a921cf688cafdb4d5174a9 | 6ff87c15d232c321  | NextAccountCapture - Fetch payment request details...http://example:8080/next-acc/v2/general/payments/9de13dc6-b88b-5a81-976b-d56183b38acd" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 04:19:29.511 | INFO  | 34f8a1d4-4799-819b42035e54 | acbeee31cba64f054 | bank account is opened in APAC, Id=6366-bb3d-67bb42024364, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 04:19:29.511 | INFO  | 2458a1d4-5599-329b42035e67 | 5dfeee31cba64f078 | bank account is opened in APAC, Id=6799-hb4f-349b42035e54, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 04:19:29.511 | INFO  | 1f38a1d4-6599-459b42035e54 | 56feee31cba64f078 | bank account is opened in APAC, Id=8899-9c4f-109b42035e23, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 11cdac29-7dd1-68cff2b30923 | a039423347310321f | bank account is opened in EMEA, Id=54t6-6b48-110ed88f81dc, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 12cdac29-7dd1-68cff2b30956 | b039423347310322f | bank account is opened in EMEA, Id=54t6-6b48-110ed88f82da, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 08:13:01.236 | INFO  | bdo1346e-4364-363egc5or4ew | 923acb52dc22c079352d6053rm4cm32d  | PaymentService - Returning payments" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 13cdac29-7dd1-68cff2b30943 | c039423347310323f | bank account is opened in EMEA, Id=54t6-6b48-120ed88f83dw, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 14cdac29-7dd1-68cff2b30678 | d039423347310324f | bank account is opened in EMEA, Id=54t6-6b48-130ed88f84cc, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 08:13:01.235 | INFO  | ad89e883-1244-4354ab7731cr | 7d605cb52dc2c4b7923ac2dc2c079334  | PaymentService - Adding payments" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 15cdac29-7dd1-68cff2309636 | e039423347310325f | bank account is opened in EMEA, Id=54t6-6b48-140ed88f85vc, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 16cdac29-7dd1-68cff2b30923 | f039423347310326f | bank account is opened in EMEA, Id=54t6-6b48-150ed88f86hg, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 06:10:24.534 | INFO  | 17cdac29-7dd1-68cff2b30998 | g039423347310327f | bank account is opened in EMEA, Id=54t6-6b48-160ed88f83vb, Status=received" }
{ "index" : { "_index" : "sample", "_type" : "log"} }
{ "msg": "2021-08-12 08:16:01.511 | INFO  | b600dcb2-a433-4b50-98fc-f31bd51ee31a | a8456c283f335c15f6929588da765180 | b8be57f13c9687cd  | PaymentService - Update Acct resource. paymentService=payments, paymentProduct=transfers, paymentId=b600dcb2-a433-4b50-98fc-f31bd51ee31a" }

To index the file above, I used the command below:

 curl -u elastic:xxx  -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary @sample_log_bulk.json

Thanks
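
A bulk file like the one above is easy to get wrong by hand, since every action line must be a complete JSON object. As a rough sketch, the file could be generated instead. The helper name to_bulk_ndjson is hypothetical, and the sketch leaves out _type on the assumption that it is not needed (mapping types are deprecated in Elasticsearch 7.x).

```python
import json


def to_bulk_ndjson(index, lines):
    """Build a _bulk body: one action line plus one source line per log line.

    Hypothetical helper for illustration. "_type" is deliberately omitted
    (an assumption: mapping types are deprecated in 7.x).
    """
    out = []
    for line in lines:
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(json.dumps({"msg": line}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(out) + "\n"


sample_lines = [
    "2021-08-12 04:19:29.511 | INFO  | 2f38a1d4-4799-819b42035e54 | 3dfeee31cba64f054 | bank account is opened in APAC, Id=4799-8b4f-819b42035e54, Status=received",
    "2021-08-12 06:10:24.534 | INFO  | 16cdac29-7dd1-68cff2b3096e | e039423347310327f | bank account is opened in EMEA, Id=54t6-6b48-310ed88f89dc, Status=received",
]

body = to_bulk_ndjson("sample", sample_lines)
print(body, end="")
```

Writing body to sample_log_bulk.json would give a file suitable for the curl command above.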

The request below gives the output:

GET sample/_search
{
  "aggs": {},
  "size": 0,
  "fields": [],
  "script_fields": {},
  "stored_fields": [
    "*"
  ],
  "runtime_mappings": {},
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "msg": "bank account is opened in APAC"
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Output -

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Why are you keeping the msg field as is?

I mean that instead of having a document like:

POST /sample/_doc
{  
   "msg": "2021-08-12 06:10:24.534 | INFO  | 16cdac29-7dd1-68cff2b3096e | e039423347310327f | bank account is opened in EMEA, Id=54t6-6b48-310ed88f89dc, Status=received" 
}

You should index:

PUT /sample/_doc/54t6-6b48-310ed88f89dc
{
  "@timestamp": "2021-08-12 06:10:24.534",
  "level": "INFO",
  "id1": "16cdac29-7dd1-68cff2b3096e",
  "id2": "e039423347310327f",
  "text": "bank account is opened in EMEA",
  "status": "received"
}

Then, that would be much easier to deal with.
You can do that using ingest pipelines and a dissect processor.
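
To make the target document shape concrete, here is a small Python sketch of the same field split done on the application side. It is not dissect processor syntax, only an illustration of the fields such a pipeline would produce. It assumes the five-segment "account opened" layout shown earlier; parse_log_line and the field names id1/id2 are hypothetical.

```python
def parse_log_line(line):
    """Split one pipe-delimited log line into named fields.

    Assumes the layout: timestamp | level | id | id | text, Key=value, ...
    (Hypothetical helper; field names are illustrative.)
    """
    parts = [p.strip() for p in line.split("|")]
    timestamp, level, id1, id2 = parts[:4]
    # The last segment looks like "text, Id=..., Status=..."
    text, *pairs = [s.strip() for s in parts[-1].split(",")]
    doc = {
        "@timestamp": timestamp,
        "level": level,
        "id1": id1,
        "id2": id2,
        "text": text,
    }
    for pair in pairs:
        key, _, value = pair.partition("=")
        doc[key.strip().lower()] = value.strip()
    return doc


doc = parse_log_line(
    "2021-08-12 06:10:24.534 | INFO  | 16cdac29-7dd1-68cff2b3096e | "
    "e039423347310327f | bank account is opened in EMEA, "
    "Id=54t6-6b48-310ed88f89dc, Status=received"
)
print(doc)
```

With the region text in its own field, counting per region no longer depends on phrase matching inside a long message.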

I created here a sample script from which we can start:

DELETE /sample

POST /sample/_bulk
{ "index" : { } }
{ "msg": "bank account is opened in APAC" }
{ "index" : { } }
{ "msg": "bank account is opened in APAC" }
{ "index" : { } }
{ "msg": "bank account is opened in APAC" }
{ "index" : { } }
{ "msg": "bank account is opened in APAC" }
{ "index" : { } }
{ "msg": "bank account is opened in EMEA" }
{ "index" : { } }
{ "msg": "bank account is opened in EMEA" }
{ "index" : { } }
{ "msg": "bank account is opened in EMEA" }
{ "index" : { } }
{ "msg": "bank account is opened in EMEA" }
{ "index" : { } }
{ "msg": "bank account is opened in EMEA" }
{ "index" : { } }
{ "msg": "PaymentService - Adding payments" }
{ "index" : { } }
{ "msg": "PaymentService - Update Acct resource. paymentService=payments" }

GET sample/_search
{
  "query": {
    "match_phrase": {
      "msg": "bank account is opened in APAC"
    }
  }
}

GET sample/_search
{
  "query": {
    "match_phrase": {
      "msg": "bank account is opened in EMEA"
    }
  }
}

You can see that I removed a lot of unneeded stuff so we can focus on the problem.
If you run this in Kibana, you will see that it gives the expected results.

So I'm not sure about your question.
From that script, could you clarify please?

Hi @dadoonet,

Thanks for trying.

Yes, the individual GET calls work:

1

GET sample/_search
{
  "size": 0,
  "query": {
    "match_phrase": {
      "msg": "bank account is opened in APAC"
    }
  }
}

Output -

"hits" : {
    "total" : {
      "value" : 4,

2

GET sample/_search
{
  "size": 0,
  "query": {
    "match_phrase": {
      "msg": "bank account is opened in EMEA"
    }
  }
}

Output -

"hits" : {
    "total" : {
      "value" : 5,

And if we combine both match_phrase clauses in a single GET call (which was in fact my first question), it gives the correct result.

GET sample/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "msg": "bank account is opened in APAC"
          }
        },
        {
          "match_phrase": {
            "msg": "bank account is opened in EMEA"
          }
        }
      ]
    }
  }
}

Output -

"hits" : {
    "total" : {
      "value" : 9,

Is there a way to perform a multiplication or subtraction operation (* or -) on these two values (4 and 5) in a single GET call?

Thanks,

You can run a multi search call. It's like running 2 searches. See Multi search API | Elasticsearch Guide [7.15] | Elastic

Or you can run a terms aggregation on the field msg.keyword. See Terms aggregation | Elasticsearch Guide [7.15] | Elastic
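
For the multi search option, the request body is newline-delimited JSON: a header line, then a search body line, repeated per search. A minimal sketch of building that body in Python is below. The helper name build_msearch_body is hypothetical, and the index and field names are taken from the sample script in this thread.

```python
import json


def build_msearch_body(index, field, phrases):
    """Build an NDJSON body for the multi search API, one search per phrase.

    Hypothetical helper for illustration; each search is a header line
    followed by a body line.
    """
    lines = []
    for phrase in phrases:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(
            json.dumps(
                {
                    "size": 0,
                    "track_total_hits": True,
                    "query": {"match_phrase": {field: phrase}},
                }
            )
        )
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


msearch_body = build_msearch_body(
    "sample",
    "msg",
    ["bank account is opened in APAC", "bank account is opened in EMEA"],
)
print(msearch_body, end="")
```

The body would be sent to the _msearch endpoint with Content-Type application/x-ndjson; the reply carries one entry per search, in request order, under responses.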

Hi @dadoonet,

Thank you for your reply. I have gone through the links, but I don't think what I am looking for is achievable through them.

Maybe I was not able to convey what I am trying to say, so I will explain it again.

As you know, both GET requests return their matching doc counts
(the APAC GET request gives 4 and the EMEA GET request gives 5).

Can we create a single GET request that fetches the matching doc counts for both match_phrase clauses (i.e. EMEA and APAC) and then performs a mathematical operation on those counts?

Example,

The request below gives a count of 4:

GET sample/_search
{
  "size": 0,
  "query": {
    "match_phrase": {
      "msg": "bank account is opened in APAC"
    }
  }
}

Output -

"hits" : {
    "total" : {
      "value" : 4,

The request below gives a count of 5:

GET sample/_search
{
  "size": 0,
  "query": {
    "match_phrase": {
      "msg": "bank account is opened in EMEA"
    }
  }
}

Output -

"hits" : {
    "total" : {
      "value" : 5,

So can we combine both match_phrase clauses in a single request (this we are able to do) but also perform a multiplication or subtraction operation (* or -) on these values (i.e. 4 and 5) in the same request, which would give 20 as output (if we do multiplication)?

Is this achievable?

I basically want to do a mathematical operation on the matching doc counts in the same request.

Thanks,

I don't understand the use case.

But for a sum, you can put both queries inside the should array of a bool query.

@dadoonet, Thanks

Ok. Last try.

We know the GET call below gives a value of 9, because the APAC match_phrase gives 4 and the EMEA match_phrase gives 5, so by default it adds both results (4 + 5 = 9).
Is there a way for the GET call below to also multiply these two values (4 * 5), so that the result is 20? (I know multiplication does not make sense here, but I am trying to understand whether the GET call below can be modified to also perform multiplication, or any other math operation such as - or /, on the two values it gets.)

GET sample/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "msg": "bank account is opened in APAC"
          }
        },
        {
          "match_phrase": {
            "msg": "bank account is opened in EMEA"
          }
        }
      ]
    }
  }
}

Thanks,

No. You need to do that in your application layer.
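
As a rough illustration of what "in your application layer" could look like, the sketch below reads the two counts out of search responses and does the arithmetic in Python. The response dict is a hand-written stand-in trimmed to the fields used (for instance, the responses array returned by a multi search call), with the counts 4 and 5 from this thread; total_hits is a hypothetical helper.

```python
def total_hits(search_response):
    """Return hits.total.value from one search response (hypothetical helper)."""
    return search_response["hits"]["total"]["value"]


# Hand-written stand-in for a multi search reply, trimmed to the fields used.
# The counts are the ones reported in this thread, not fresh output.
msearch_response = {
    "responses": [
        {"hits": {"total": {"value": 4, "relation": "eq"}}},  # APAC
        {"hits": {"total": {"value": 5, "relation": "eq"}}},  # EMEA
    ]
}

apac, emea = (total_hits(r) for r in msearch_response["responses"])

print("sum:", apac + emea)         # sum: 9
print("product:", apac * emea)     # product: 20
print("difference:", emea - apac)  # difference: 1
```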

ok Thanks

Hi @dadoonet, Thanks

If this is only possible from the application layer, I will look into that.

As you can see, the response for the combined query gives the total, i.e. the sum of both match_phrase counts, which is 9. Can I at least fetch the individual results of both match_phrase clauses from the request?

Can we get response something like below?

.
.

"hits" : {
    "total" : {
      "value" : 4,
.
.
"hits" : {
    "total" : {
      "value" : 5,

There are many match_phrase clauses to run, so I do not want to create an individual curl request for each one; I am trying to include as many as possible in a single curl request.

Thanks,
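
One option that may fit this last question is a filters aggregation, which returns one bucket with its own doc_count per named query in a single request. The Python sketch below builds such a request body and reads the per-bucket counts. The helper names and the aggregation name by_region are hypothetical, and the response dict is a hand-written stand-in using the counts from this thread.

```python
import json


def build_filters_agg(field, named_phrases, agg_name="by_region"):
    """Build a search body with one named filter bucket per phrase.

    Hypothetical helper; "by_region" is an illustrative aggregation name.
    """
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "filters": {
                    "filters": {
                        name: {"match_phrase": {field: phrase}}
                        for name, phrase in named_phrases.items()
                    }
                }
            }
        },
    }


def bucket_counts(response, agg_name="by_region"):
    """Map each named bucket to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


request_body = build_filters_agg(
    "msg",
    {
        "apac": "bank account is opened in APAC",
        "emea": "bank account is opened in EMEA",
    },
)
print(json.dumps(request_body, indent=2))

# Hand-written stand-in for the aggregation part of a response.
sample_response = {
    "aggregations": {
        "by_region": {"buckets": {"apac": {"doc_count": 4}, "emea": {"doc_count": 5}}}
    }
}
counts = bucket_counts(sample_response)
print(counts)
```

Adding more phrases only means adding entries to the named_phrases dict, so many match_phrase clauses can share one request.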

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.