Hello,
I am a new Elastic user, and I have already run into an issue. I am trying to extract all logs from a certain index. To handle large indices, I want to implement pagination using search_after and a point in time (PIT).
Python lib version: 7.13.3
ELK version: 7.17.1
My code looks like this:
pit = client.open_point_in_time(index=log_index, keep_alive='1m')
pit_id = pit['id']
log_file_items = list()
lines_skipped = 0
last_hit_sort = None
for slice_id in range(2):
    query_body = {
        'size': 150,
        "track_total_hits": False,
        'pit': {
            'id': pit_id,
            'keep_alive': '10s'
        },
        'slice': {
            "id": slice_id,
            "max": 2
        },
        "query": {
            "match": {
                "log.message": "[specific-token]"
            },
        },
        "sort": [
            {"@timestamp": "asc"},
            {"_shard_doc": "desc"}
        ],
    }
    if last_hit_sort is not None:
        query_body['search_after'] = last_hit_sort
    result = client.search(body=query_body)
    print('total hits', slice_id, len(result['hits']['hits']))
    for row in result['hits']['hits']:
        [parsing data...]
    if len(result['hits']['hits']) > 0:
        last_hit_sort = result['hits']['hits'][-1]['sort']
    else:
        break
# --- Cleanup pit
pit_close = client.close_point_in_time(body={
    'id': pit_id
})
Because I don't have a sample index yet with more than 10,000 hits, I wanted to test it by limiting the query size to 150 hits. It should return 261 hits in total, but the first slice only gives me 139 hits. The second slice returns 0 hits.
The curious thing is that when I change the sorting of the timestamp from ascending to descending then it returns more results, but still not the full amount.
"sort": [
    {"@timestamp": "asc"},
    {"_shard_doc": "desc"}
],
changed to
"sort": [
    {"@timestamp": "desc"},
    {"_shard_doc": "desc"}
],
This returns 139 hits in the first slice and 114 in the second slice, 253 in total, which means that 8 hits are still missing. I am not sure what I am doing wrong, but any help would be appreciated. I went over the docs multiple times and searched for this issue, but I simply cannot find what I am doing wrong.
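For comparison, here is a minimal sketch of how search_after paging over a sliced PIT is commonly structured. It differs from the code above in two ways that might be relevant to the missing hits: the search_after cursor starts fresh for every slice instead of being carried over from the previous one, and each slice is requested repeatedly until it returns an empty page. The helper name iter_slice_hits and its parameters are hypothetical, and the search argument is assumed to be a callable with the same shape as client.search(body=...) from the 7.x Python client. This is an illustration under those assumptions, not a confirmed diagnosis.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_slice_hits(
    search: Callable[..., Dict[str, Any]],  # e.g. client.search (assumption)
    pit_id: str,
    slice_id: int,
    max_slices: int,
    query: Dict[str, Any],
    page_size: int = 150,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit of one PIT slice, page by page (hypothetical helper)."""
    # The cursor is local to this slice, so it never leaks into the next one.
    search_after: Optional[list] = None
    while True:
        body: Dict[str, Any] = {
            "size": page_size,
            "track_total_hits": False,
            "pit": {"id": pit_id, "keep_alive": "1m"},
            "slice": {"id": slice_id, "max": max_slices},
            "query": query,
            "sort": [{"@timestamp": "asc"}, {"_shard_doc": "asc"}],
        }
        if search_after is not None:
            body["search_after"] = search_after
        result = search(body=body)
        # The response may carry an updated PIT id; keep using the latest one.
        pit_id = result.get("pit_id", pit_id)
        hits = result["hits"]["hits"]
        if not hits:
            return  # an empty page means this slice is exhausted
        yield from hits
        # Continue after the last hit of this page, within the same slice.
        search_after = hits[-1]["sort"]
```

With the names from the code above, usage would look roughly like `for slice_id in range(2): for row in iter_slice_hits(client.search, pit_id, slice_id, 2, {"match": {"log.message": "[specific-token]"}}): ...`, followed by the same close_point_in_time cleanup.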