Sliced search not returning all hits

Hello,

I am new to Elasticsearch and have already run into an issue. I am trying to extract all logs from an index. To handle large indices, I want to paginate using search_after together with a point in time (PIT) and sliced search.

Python lib version: 7.13.3
ELK version: 7.17.1

My code looks like this:

  pit = client.open_point_in_time(index=log_index, keep_alive='1m')
  pit_id = pit['id']

  log_file_items = list()

  lines_skipped = 0

  last_hit_sort = None 

  for slice_id in range(2):
      query_body = {
          'size': 150,
          "track_total_hits": False,  
          'pit': {
              'id': pit_id,
              'keep_alive': '10s'
          },
          'slice': {
              "id": slice_id,
              "max": 2
          },
          "query": {
              "match": {
                  "log.message": "[specific-token]"
              },
          },
          "sort": [
              {"@timestamp": "asc"},
              {"_shard_doc": "desc"}
          ],
      }

      if last_hit_sort is not None:
          query_body['search_after'] = last_hit_sort

      result = client.search(body=query_body)

      print('total hits', slice_id, len(result['hits']['hits']))

      for row in result['hits']['hits']:
          pass  # [parsing data...]

      if len(result['hits']['hits']) > 0:
          last_hit_sort = result['hits']['hits'][-1]['sort']
      else:
          break

  # --- Cleanup pit
  pit_close = client.close_point_in_time(body={
      'id': pit_id
  })

Because I don't yet have a sample index with more than 10,000 hits, I am testing by limiting the page size to 150. The query should return 261 hits in total, but the first slice returns only 139 hits and the second slice returns 0.

Curiously, when I change the @timestamp sort from ascending to descending, more results come back, but still not all of them.

  "sort": [
      {"@timestamp": "asc"},
      {"_shard_doc": "desc"}
  ],

changed to

  "sort": [
      {"@timestamp": "desc"},
      {"_shard_doc": "desc"}
  ],

This returns 139 hits in the first slice and 114 in the second, 253 in total, so 8 hits are still missing. I have gone over the docs multiple times and searched for this issue, but I cannot find what I am doing wrong. Any help would be appreciated.
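
For comparison, here is a minimal sketch of how per-slice paging is commonly structured. It is not a confirmed fix. It assumes the missing hits come from two things in the loop above: each slice is searched only once, and last_hit_sort from one slice is reused as search_after for the next slice. The helper names iter_slice_hits and iter_all_hits and the search callable are hypothetical. search is expected to wrap client.search(body=...) and return the usual response dict.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_slice_hits(search: SearchFn, base_body: Dict[str, Any],
                    slice_id: int, max_slices: int) -> Iterator[Dict[str, Any]]:
    """Yield every hit of one slice, paging with search_after."""
    # Assumption: the cursor must start empty for each slice, because a sort
    # value taken from one slice is not a valid cursor for another slice.
    search_after: Optional[List[Any]] = None
    while True:
        body = dict(base_body)  # shallow copy so the caller's dict is untouched
        body["slice"] = {"id": slice_id, "max": max_slices}
        if search_after is not None:
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # an empty page means this slice is exhausted
        yield from hits
        search_after = hits[-1]["sort"]  # cursor for the next page of this slice


def iter_all_hits(search: SearchFn, base_body: Dict[str, Any],
                  max_slices: int = 2) -> Iterator[Dict[str, Any]]:
    """Walk the slices one after another and yield all of their hits."""
    for slice_id in range(max_slices):
        yield from iter_slice_hits(search, base_body, slice_id, max_slices)
```

With the real client this could be called as iter_all_hits(lambda body: client.search(body=body), body), where body is the request above without the 'slice' and 'search_after' keys. Whether this recovers all 261 hits on the index in question is untested.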
