Elastic Search query building logic


(Waseem Anjum) #1

I am working on building a report based on indexed documents. Following is the scenario of the report:

  • Status field that contains two types of values (ERROR, OK)

  • Destination field containing all the destination phone numbers (long data type)

Each destination field has a status either it is Error or OK. In some cases destination numbers have errors but same number also have OK status with different timestamp.

One report is of all the numbers delivered (status OK): done

Second report is all the number that were not delivered i.e number must not have a OK status but ERROR only.

P.S I am newbie in ES.


SQL Query with "NOT IN" keyword replacement in ES
(Jaspreet Singh) #2

Since same phone numbers can be OK or Error based on different time stamps, what are you returning the two cases?
By the way, have you looked at nested queries - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html


(Waseem Anjum) #3

Since same phone numbers can be OK or Error based on different time stamps, what are you returning the two cases?

When message is not delivered system retries it but also logs a error message. When it is delivered then OK status with same number is updated as new log entry.

I have looked at nested queries but I am not sure how it solves my issue.


(Jaspreet Singh) #4

So initially this is what I had in mind ...

[
    {
        "phone numer":"123-234-567",
        "status":
        {

            "status":"OK",
            "timestamp":"12232001"
        }
        {
            "status":"Error",
            "timestamp":"12122007"
        }
    },
    {
        "phone numer":"123-234-567",
        "status":
        {

            "status":"OK",
            "timestamp":"12232001"
        }
        {
            "status":"Error",
            "timestamp":"12122007"
        }
    }
]

As you can see, since both error and OK can be a part of the same document under a particular phone, you cant really retrieve what you need (atleast at my first glance).
This sounds more useful ...

{
    "retry-log": [
        {

            "status":"OK",
            "timestamp":"12232001",
            "phone numer":"123-234-567",
        }
        {
            "status":"Error",
            "timestamp":"12122007",
            "phone numer":"123-234-567",
        }
        {

            "status":"OK",
            "timestamp":"12232001",
            "phone numer":"123-234-567"
        }
        {
            "status":"Error",
            "timestamp":"12122007",
            "phone numer":"123-234-567"
        }
    ]
}

Now you can simply query on status field for OK or Error values.


(Waseem Anjum) #5

Here is Sample Doc for OK/ERROR response

{
"_index": "http",
"_type": "http-messages",
"_id": "RW7nC2cBpIbUGlTvTmzj",
"_version": 1,
"_score": 1,
"_source": {
"msg_date": "2018-11-08",
"content": "Test Content"
"destination": 123456789,
"host": "ubuntu1",
"level": "INFO",
"msg_type": "NA",
"subdate": "2018-11-08 11:27:27.303469",
"retries": 0,
"status": "OK",
"@timestamp": "2018-11-13T07:09:12.461Z",
"msg_time": "11:27:27",
"from": "Sendertest",
"@version": "1",
"server_response": "Success",
"queue": "SendQ",
},
"fields": {
"msg_date": [
"2018-11-08T00:00:00.000Z"
],
"@timestamp": [
"2018-11-13T07:09:12.461Z"
]
}
}

ERROR

{
"_index": "http",
"_type": "http-messages",
"_id": "DG7nC2cBpIbUGlTvMGt2",
"_version": 1,
"_source": {
"msg_date": "2018-11-08",
"content": "Test Content"
"destination": 123456789,
"host": "ubuntu1",
"level": "ERROR",
"coding": 0,
"subdate": "2018-11-08 06:01:32.087154",
"retries": 3,
"status": "ERROR",
"@timestamp": "2018-11-13T07:09:12.141Z",
"msg_time": "06:02:17",
"from": "Sendertest",
"@version": "1",
"server_response": "HTTPConnectionPool: Read timed out. (read timeout=30.0)",,
"queue": "sendQ",
"type": "HTTP-MT"
},
"fields": {
"msg_date": [
"2018-11-08T00:00:00.000Z"
],
"@timestamp": [
"2018-11-13T07:09:12.141Z"
]
},
}


(Jaspreet Singh) #6

Did you try any query?
One report is of all the numbers delivered (status OK): done : Is this working? Can you not do something similar for Error as well?


(Waseem Anjum) #7

@Jaspreet_Singh I can get all numbers that have ERROR status. But the problem is I want only those numbers that doesn't have OK status at all. For example I have a number 123-123-123 and It only has ERROR status but when I query then 123-456-789 (number from above example) is also returned because that also has ERROR status.


(Jaspreet Singh) #8

But in your above example, 123456789 does not have OK status.
By the way you can get records that have ONLY ERROR and not OK but combining must and must_not.
Try something like ... (im going to assume a few things but logic should be similar)

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "status": "ERROR"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "status": "OK"
          }
        }
      ]
    }
  }
}

So this will return all records that have status ERROR and do not have status OK.


(Waseem Anjum) #9

@Jaspreet_Singh I posted two docs in which one is OK and one is Error for same destination. Method that you have mentioned I have already tried it in many different ways but It also returns 123456789 as error. :frowning: :sweat:


(Waseem Anjum) #10

To make thing more clear I am sharing the SQL query below that is giving the correct result:

Select * from TblName where status='ERROR' not in (select * from TbleName where status='OK' )


(Christian Dahlqvist) #11

One way to solve this would be through an entity-centric index. You basically create a separate index, where each phone number is represented by a single document with the phone number as a document ID. When data comes in you insert the raw document as usual into the current index, but at the same time also add any relevant information to the appropriate document in the new entity-centric index. If you have structured this correctly, it should be very easy to create your report based on this index.


(Waseem Anjum) #12

Seems like a good idea. I will try it and post my findings here. Thanks


(Waseem Anjum) #13

Thanks Everyone for help. I have solved the problem by using multiple aggregations. Combined two fields and applied unique and min aggs and got the results. Visualization shows correct results that I can further filter out to get desired output.