Elasticsearch query problem

Hi guys!

I've got a problem with definition of query in elasticsearch.

I've got two types of messages:

  1. Jul 23 09:24:16 mmr mmr-core[5147]: Aweg3AOMTs_1563866656876839.mt processMTMessage() #12798 realtime: 5.684 ms

  2. Jul 23 09:24:18 mmr mmr-core[5210]: Aweg3AOMTs_1563866656876839.0.dn processDN() #7750 realtime: 1.382 ms

First message is kind of sent message and second is message which confirm that message was delivered.

The difference between them is the suffix which I have separated from "id" and can query it.

I would like to find out which messages were succesfully delivered and which weren't. I am very begginer in elasticsearch so I'm really struggling.

My idea is like match id (number) and matched number suffix. So when id number 1563866656876839 has suffix dn it was delivered. We do not know the id number before we get the log message because it's generated automatically.

In MariaDB SQL I managed it with following query:

SELECT num1
from table4
WHERE suffix IN ('mt', 'dn')
GROUP BY num1
HAVING COUNT(DISTINCT suffix) = 2;

Does anyone have any ideas how to solve it? Any help is appreciated!! :pray:

Hi, @Vladpov,

One idea to solve it depend on how you store your data do you parse them or you save the full line in a string??

POST your_index/your_doc/your_id
{
   "message": " Jul 23 09:24:16 mmr mmr-core[5147]: **Aweg3AOMTs_1563866656876839.mt** processMTMessage() #12798 realtime: 5.684 ms"
}

OR

POST your_index/your_doc/your_id
{
    "some_date": "Jul 23 09:24:16",
   "other_fields": "mmr",
   ....
   "id_message": "Aweg3AOMTs_1563866656876839",
  "delivery_status": "mt",
 ....
}

If yo have the second one you can make a term aggregation on your id_message and the filter on count > 2

if you have the first one you can use ingest and _reindex to format your data to have the second format.

Juste one idea to solve.

1 Like

Thank you for your reply!

I parse the message so I have the parts of the id separately like:

text: Aweg3AOMTs
num1: 1563866656876839
suffix: mt/dn
messageId: Aweg3AOMTs_1563866656876839.mt

Would you tell me little bit more about the solution you suggested please?

I tried this so far:

GET /my_index3/_search
{
  "size": 0,
  "aggs": {
"num1": {
  "terms": {
    "field": "messageId.keyword",
    "include": ".*mt.*"
  }
}
  }
}

But it doesn't do what I need. How can I manage next step with finding appropriate dn and then I want to filter it so it should show me only dn's which arrived within 72 hours.

Thank you for your help!

Hi,

To filter you'd better to go with query bool must:

"query": {"bool":{"must":[
{"term": {"suffix.keyword": "dn"}},
{"range" : {
"@datetime" : {
"gte" : "now-3d/d"}}}
]}},
"aggs":{"num1":{"terms": {"filed":"num1", "size": 20}},
"size":0

You'll apply your query conditions and get the distinct num1 with the correct size as by default it return 10.

In relational database it would look like this:

   +-----------------+------+----------------+------------+------------------+------+--------+
    | date            | host | process        | text       | num1             | num2 | suffix |
    +-----------------+------+----------------+------------+------------------+------+--------+
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    +-----------------+------+----------------+------------+------------------+------+--------

So I have there a lot of kind of the same messages and I need always find id which has both dn and mt suffix ...

I tried your code without range clause but it didn't get the right solution.

I sent there the message with num1: 1563866656878888 and has suffix mt and dn

but query doesn't show it as you can see the aggregation shows it but it's only occurence.

Hi,

Sorry I'm lost what is your need ??

In a post you say that:

in the other post:

What I understand is for the first one you need only the dn in the last 72 hours
For the second one you need the id which has both dn and mt suffix???
For sure you cannot apply the request for the first case on the second case as the ask are different.

when you write this:

is it the result you expect that elastic will return or is it an example of document you have?

Can you please put one question at a time my brain is not enough elastic to compute several case in parallel. Or better put a simple use case with some document, it will help a lot.

Sorry that I couldn't explain my needs right way :sweat_smile: I'm quite desperate after time I've been trying to solve it :smiley: I try it this time better.

Well as I'm begginer with elasticsearch I decided to try make query for sql databases sample. The table from rational database I sent is kind of example how id's of log messages are kept.

The SQL query for getting the result I need is

SELECT num1
from table4
WHERE suffix IN ('mt', 'dn')
GROUP BY num1
HAVING COUNT(DISTINCT suffix) = 2;

It gets me all num1's which have 'mt' and 'dn'. This step I want to apply in elasticsearch.

Next thing I need is to find only id's their 'dn's (if they have any) that arrived within 72 hours. It could be done for example with this modification:

SELECT num1
FROM table4
WHERE suffix = 'mt' or (suffix = 'dn' and `date` > NOW() - INTERVAL 72 HOUR)
GROUP BY num1
HAVING COUNT(DISTINCT suffix) = 2;

So I need the same thing to do with elasticsearch and I don't know how :see_no_evil:

If there is anything unclear please ask :pray:

Thank you for your patience!!

This code helped me to find num1's that have two different suffix

GET /my_index3/_search 
{
"size": 0,
  "aggs": {
    "num1": {
      "terms": {
        "field": "num1.keyword",
        "order" : { "_count" : "desc" }
      },
      "aggs": {
        "count_of_distinct_suffix": {
          "cardinality" :{
             "field" : "suffix.keyword"
          }
        }
      }
    }
  } 
}
1 Like

Great,

So now only need to add the date filter and you are done.

GET /my_index3/_search 
{
"size": 0,
query": {"bool":{"must":[
{"range" : {
"@datetime" : {
"gte" : "now-3d/d"}}}
]}},
  "aggs": {
    "num1": {
      "terms": {
        "field": "num1.keyword",
        "order" : { "_count" : "desc" }
      },
      "aggs": {
        "count_of_distinct_suffix": {
          "cardinality" :{
             "field" : "suffix.keyword"
          }
        }
      }
    }
  } 
}

Heheh just copy paste!! :sweat_smile:

Thank you very much for your help again!

Now I'm here

   GET /my_index3/_search
    {
      "size": 0,
      "aggs": {
        "num1": {
          "terms": {
            "field": "num1.keyword",
            "order": {
              "_count": "desc"
            }
          },
          "aggs": {
            "count_of_distinct_suffix": {
              "cardinality": {
                "field": "suffix.keyword"
              }
            },
            "my_filter": {
              "bucket_selector": {
                "buckets_path": {
                  "count_of_distinct_suffix": "count_of_distinct_suffix"
                },
                "script": "params.count_of_distinct_suffix == 2"
              }
            }
          }
        }
      }
    }

With output

  "aggregations" : {
"num1" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "1563866656876839",
      "doc_count" : 106,
      "count_of_suffix" : {
        "value" : 2
      }
    },
    {
      "key" : "1563867854324841",
      "doc_count" : 50,
      "count_of_suffix" : {
        "value" : 2
      }
    },
    {
      "key" : "1563866656878888",
      "doc_count" : 42,
      "count_of_suffix" : {
        "value" : 2
      }
    },
    {
      "key" : "1563866656871111",
      "doc_count" : 40,
      "count_of_suffix" : {
        "value" : 2
      }

So it shows me numbers that have both suffix. I think that the range question won't do what I want it to do.

I mean that num1 has only 1 suffix and if the same num1 didn't get second suffix within some time e.g one hour it wouldn't show this bucket even if the count_of_distinct_suffix == 2.