Elasticsearch query problem

Vladpov · July 26, 2019, 8:24am

Hi guys!

I've got a problem with definition of query in elasticsearch.

I've got two types of messages:

Jul 23 09:24:16 mmr mmr-core[5147]: Aweg3AOMTs_1563866656876839.mt processMTMessage() #12798 realtime: 5.684 ms
Jul 23 09:24:18 mmr mmr-core[5210]: Aweg3AOMTs_1563866656876839.0.dn processDN() #7750 realtime: 1.382 ms

First message is kind of sent message and second is message which confirm that message was delivered.

The difference between them is the suffix which I have separated from "id" and can query it.

I would like to find out which messages were succesfully delivered and which weren't. I am very begginer in elasticsearch so I'm really struggling.

My idea is like match id (number) and matched number suffix. So when id number 1563866656876839 has suffix dn it was delivered. We do not know the id number before we get the log message because it's generated automatically.

In MariaDB SQL I managed it with following query:

SELECT num1
from table4
WHERE suffix IN ('mt', 'dn')
GROUP BY num1
HAVING COUNT(DISTINCT suffix) = 2;

Does anyone have any ideas how to solve it? Any help is appreciated!!

gabriel_tessier · July 26, 2019, 11:41am

Hi, @Vladpov,

One idea to solve it depend on how you store your data do you parse them or you save the full line in a string??

POST your_index/your_doc/your_id
{
   "message": " Jul 23 09:24:16 mmr mmr-core[5147]: **Aweg3AOMTs_1563866656876839.mt** processMTMessage() #12798 realtime: 5.684 ms"
}

OR

POST your_index/your_doc/your_id
{
    "some_date": "Jul 23 09:24:16",
   "other_fields": "mmr",
   ....
   "id_message": "Aweg3AOMTs_1563866656876839",
  "delivery_status": "mt",
 ....
}

If yo have the second one you can make a term aggregation on your id_message and the filter on count > 2

if you have the first one you can use ingest and _reindex to format your data to have the second format.

Juste one idea to solve.

Vladpov · July 29, 2019, 6:57am

Thank you for your reply!

I parse the message so I have the parts of the id separately like:

text: Aweg3AOMTs
num1: 1563866656876839
suffix: mt/dn
messageId: Aweg3AOMTs_1563866656876839.mt

Would you tell me little bit more about the solution you suggested please?

I tried this so far:

GET /my_index3/_search
{
  "size": 0,
  "aggs": {
"num1": {
  "terms": {
    "field": "messageId.keyword",
    "include": ".*mt.*"
  }
}
  }
}

But it doesn't do what I need. How can I manage next step with finding appropriate dn and then I want to filter it so it should show me only dn's which arrived within 72 hours.

Thank you for your help!

gabriel_tessier · July 30, 2019, 6:12am

Hi,

To filter you'd better to go with query bool must:

term "suffix.keyword": "dn"
range "@datetime": "gte": "now-72h" <--- need to check the correct syntax with now in the doc.
Term query | Elasticsearch Guide [8.11] | Elastic
Range query | Elasticsearch Guide [7.2] | Elastic
Range query | Elasticsearch Guide [7.2] | Elastic

"query": {"bool":{"must":[
{"term": {"suffix.keyword": "dn"}},
{"range" : {
"@datetime" : {
"gte" : "now-3d/d"}}}
]}},
"aggs":{"num1":{"terms": {"filed":"num1", "size": 20}},
"size":0

You'll apply your query conditions and get the distinct num1 with the correct size as by default it return 10.

Vladpov · July 30, 2019, 8:43am

In relational database it would look like this:

   +-----------------+------+----------------+------------+------------------+------+--------+
    | date            | host | process        | text       | num1             | num2 | suffix |
    +-----------------+------+----------------+------------+------------------+------+--------+
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656876839 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656873333 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656871111 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | mt     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    | Jul 23 09:24:16 | mmr  | mmr-core[1111] | Aweg3AOMTs | 1563866656872222 |    0 | dn     |
    +-----------------+------+----------------+------------+------------------+------+--------

So I have there a lot of kind of the same messages and I need always find id which has both dn and mt suffix ...

I tried your code without range clause but it didn't get the right solution.

I sent there the message with num1: 1563866656878888 and has suffix mt and dn

but query doesn't show it as you can see the aggregation shows it but it's only occurence.

gabriel_tessier · July 30, 2019, 9:37am

Hi,

Sorry I'm lost what is your need ??

In a post you say that:

in the other post:

What I understand is for the first one you need only the dn in the last 72 hours
For the second one you need the id which has both dn and mt suffix???
For sure you cannot apply the request for the first case on the second case as the ask are different.

when you write this:

is it the result you expect that elastic will return or is it an example of document you have?

Can you please put one question at a time my brain is not enough elastic to compute several case in parallel. Or better put a simple use case with some document, it will help a lot.

Vladpov · July 30, 2019, 9:57am

Sorry that I couldn't explain my needs right way I'm quite desperate after time I've been trying to solve it I try it this time better.

Well as I'm begginer with elasticsearch I decided to try make query for sql databases sample. The table from rational database I sent is kind of example how id's of log messages are kept.

The SQL query for getting the result I need is

SELECT num1
from table4
WHERE suffix IN ('mt', 'dn')
GROUP BY num1
HAVING COUNT(DISTINCT suffix) = 2;

It gets me all num1's which have 'mt' and 'dn'. This step I want to apply in elasticsearch.

Next thing I need is to find only id's their 'dn's (if they have any) that arrived within 72 hours. It could be done for example with this modification:

SELECT num1
FROM table4
WHERE suffix = 'mt' or (suffix = 'dn' and `date` > NOW() - INTERVAL 72 HOUR)
GROUP BY num1
HAVING COUNT(DISTINCT suffix) = 2;

So I need the same thing to do with elasticsearch and I don't know how

If there is anything unclear please ask

Thank you for your patience!!

Vladpov · July 30, 2019, 4:30pm

This code helped me to find num1's that have two different suffix

GET /my_index3/_search 
{
"size": 0,
  "aggs": {
    "num1": {
      "terms": {
        "field": "num1.keyword",
        "order" : { "_count" : "desc" }
      },
      "aggs": {
        "count_of_distinct_suffix": {
          "cardinality" :{
             "field" : "suffix.keyword"
          }
        }
      }
    }
  } 
}

gabriel_tessier · July 31, 2019, 11:03am

Great,

So now only need to add the date filter and you are done.

GET /my_index3/_search 
{
"size": 0,
query": {"bool":{"must":[
{"range" : {
"@datetime" : {
"gte" : "now-3d/d"}}}
]}},
  "aggs": {
    "num1": {
      "terms": {
        "field": "num1.keyword",
        "order" : { "_count" : "desc" }
      },
      "aggs": {
        "count_of_distinct_suffix": {
          "cardinality" :{
             "field" : "suffix.keyword"
          }
        }
      }
    }
  } 
}

Heheh just copy paste!!

Vladpov · July 31, 2019, 11:17am

Thank you very much for your help again!

Now I'm here

   GET /my_index3/_search
    {
      "size": 0,
      "aggs": {
        "num1": {
          "terms": {
            "field": "num1.keyword",
            "order": {
              "_count": "desc"
            }
          },
          "aggs": {
            "count_of_distinct_suffix": {
              "cardinality": {
                "field": "suffix.keyword"
              }
            },
            "my_filter": {
              "bucket_selector": {
                "buckets_path": {
                  "count_of_distinct_suffix": "count_of_distinct_suffix"
                },
                "script": "params.count_of_distinct_suffix == 2"
              }
            }
          }
        }
      }
    }

With output

  "aggregations" : {
"num1" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "1563866656876839",
      "doc_count" : 106,
      "count_of_suffix" : {
        "value" : 2
      }
    },
    {
      "key" : "1563867854324841",
      "doc_count" : 50,
      "count_of_suffix" : {
        "value" : 2
      }
    },
    {
      "key" : "1563866656878888",
      "doc_count" : 42,
      "count_of_suffix" : {
        "value" : 2
      }
    },
    {
      "key" : "1563866656871111",
      "doc_count" : 40,
      "count_of_suffix" : {
        "value" : 2
      }

So it shows me numbers that have both suffix. I think that the range question won't do what I want it to do.

I mean that num1 has only 1 suffix and if the same num1 didn't get second suffix within some time e.g one hour it wouldn't show this bucket even if the count_of_distinct_suffix == 2.

system · August 28, 2019, 11:17am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Elasticsearch terms aggregation and querrying Elasticsearch	2	471	August 26, 2019
DSL Query to search in message field for some value and it must not contain something else Elasticsearch	2	3219	October 16, 2021
Query doesn't show right result Elasticsearch	3	377	August 27, 2019
Watchers: querying for multiple strings in timeframe Elasticsearch elastic-stack-alerting	2	990	August 20, 2020
How to get the unique results from ES query Elasticsearch	1	484	February 9, 2017

Elasticsearch query problem

Related topics