How do I query for messages between two timestamps using the elasticsearch-dsl library?


(Michel Albert) #1

I'm trying to run a simple query for messages that have a specific key using Python, but also limit the results between two timestamps. I managed to query for the existing key using the following:

import sys
from getpass import getpass

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Q, Search

client = Elasticsearch(
    'https://elapi-rc.ipsw.dt.ept.lu',
    http_auth=('malbert', getpass()))

field_query = Q('query_string', query='_exists_:ept.runtime_seconds')

s = Search(using=client, index="oss-*")
s = s.query(field_query)
s = s[0:10]

result = s.execute()
for hit in result:
    print(hit)

But now I can't seem to find any solution to also query for the @timestamp field :cry:

Here is one of my attempts:

[...]

time_query = Q('range', timestamp={
    'gte': '2018-01-01 00:00:00',
    'lt': 'now'
})
field_query = Q('query_string', query='_exists_:ept.runtime_seconds')

s = Search(using=client, index="oss-*")
s = s.query(field_query)
s = s.query(time_query)

[...]

But that returns an empty set, so I guess the timestamp filter is not quite right. I am sure that I have entries in that time range! I also tried using a filter instead:

s = s.filter('range', timestamp={'gte': '2018-01-01 00:00:00', 'lt': 'now'})

without any luck (also returns an empty set).

I am a bit confused by other posts. I've seen one mentioning that @timestamp would be a valid identifier, but that's not the case. Code like the following is a syntax error in Python:

time_query = Q('range', @timestamp={  # <-- syntax error
    'gte': '2018-01-01 00:00:00',
    'lt': 'now'
})

I can see in Kibana that the timestamp field is indeed named @timestamp, so I wonder whether my query above simply uses the wrong field.

So how do I specify that it should look into @timestamp?


(Val Crettaz) #2

You have two options. The first one involves using query_string exclusively, like this:

field_query = Q('query_string', query='_exists_:ept.runtime_seconds AND @timestamp:[2018-01-01 00:00:00.000Z TO now]')

You almost had the correct solution for the second option, yet you need to combine both constraints using a bool query:

time_query = Q('range', timestamp={
    'gte': '2018-01-01 00:00:00',
    'lt': 'now'
})
field_query = Q('query_string', query='_exists_:ept.runtime_seconds')

s = Search(using=client, index="oss-*")
s = s.query(Q('bool', filter=[field_query, time_query]))

(Michel Albert) #3

There still seems to be an issue with the timestamp query. This works:

field_query = Q('query_string', query='_exists_:ept.runtime_seconds')
s = Search(using=client, index="oss-*")
s = s.query(field_query)
s = s[0:10]
result = s.execute()
for hit in result:
    print(hit['@timestamp'])

Which returns the following:

2018-05-21T04:06:15.759Z
2018-05-21T04:06:18.278Z
2018-05-21T04:06:19.717Z
2018-05-21T04:06:22.901Z
2018-05-21T04:06:23.311Z
2018-05-21T04:06:29.660Z
2018-05-21T04:06:32.628Z
2018-05-21T04:06:37.144Z
2018-05-21T04:06:37.333Z
2018-05-21T04:06:38.362Z

So there are clearly entries in 2018. As soon as I add the time range, the result is empty:

field_query = Q('query_string', query='_exists_:ept.runtime_seconds')
time_query = Q('range', timestamp={
    'gte': '2018-01-01 00:00:00',
    'lt': 'now'
})
combined = Q('bool', filter=[field_query, time_query])

s = Search(using=client, index="oss-*")
s = s.query(combined)
s = s[0:10]
result = s.execute()
print(result.hits.total)

Output:

0

I still have the feeling that Q('range', timestamp=...) does not actually target the @timestamp field, although I might be totally mistaken. I tried replacing it with a random keyword argument like Q('range', foobar=...) and no exception was raised. So I assume the query naïvely looks for the field foobar, and because that field does not exist anywhere, it executes successfully but returns nothing. Could it be that Q('range', timestamp=...) looks for the non-existent field timestamp instead of @timestamp?

If that's the case, how do I create a query that uses @timestamp, given that Q('range', @timestamp=...) is invalid syntax in Python?


(Michel Albert) #4

I got it working today using only the query_string query. However, the timestamp filter given by @val (@timestamp:[2018-01-01 00:00:00.000Z TO now]) did not work.

I had to use @timestamp:[2018-01-01T00:00:00.000Z TO now] instead (using T to separate date from time).
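
To avoid typing the separator by hand, the lower bound can be formatted from a datetime object. This is only a rough sketch: the helper name timestamp_range and its parameters are made up for illustration, and it assumes the index accepts UTC timestamps with millisecond precision, as in the values printed earlier in this thread.

```python
from datetime import datetime, timezone


def timestamp_range(start, field="@timestamp"):
    """Build a query_string range clause from `start` until now.

    Hypothetical helper, not part of elasticsearch-dsl.
    """
    # Format as ISO 8601 with a "T" separator, millisecond precision
    # and a trailing "Z" for UTC.
    utc = start.astimezone(timezone.utc)
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return f"{field}:[{stamp} TO now]"


start = datetime(2018, 1, 1, tzinfo=timezone.utc)
query = "_exists_:ept.runtime_seconds AND " + timestamp_range(start)
print(query)
```

The resulting string can then be passed as the query argument of Q('query_string', ...).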


(Val Crettaz) #5

Good point, I missed the T when copy/pasting.
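
As for the range query itself: the keyword argument name is what ends up as the field name, so timestamp=... targets a field literally called timestamp. A name such as @timestamp cannot be written as a keyword, but it can be passed through dictionary unpacking. The sketch below shows the mechanism with a small stand-in function (range_clause is hypothetical and only mimics how keyword names become field names), so it runs without a cluster or the library installed.

```python
def range_clause(**fields):
    """Hypothetical stand-in for Q('range', ...).

    Each keyword name becomes a field name in the query body.
    """
    return {"range": fields}


bounds = {"gte": "2018-01-01T00:00:00", "lt": "now"}

# A plain keyword targets a field literally named "timestamp".
wrong = range_clause(timestamp=bounds)

# Dictionary unpacking allows keys that are not valid Python identifiers.
right = range_clause(**{"@timestamp": bounds})

print(wrong)
print(right)

# With elasticsearch-dsl, the same unpacking should apply:
#     time_query = Q('range', **{'@timestamp': bounds})
```

The time_query built this way can be combined with field_query in the bool filter exactly as in the earlier replies.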

