Bulk upload succeeds but search can't find all docs

I have done a bulk upload via the API in Python as follows:

import json
import uuid

from elasticsearch import helpers
from more_itertools import chunked  # assumed source of chunked(); not stated in the original post

# es (the Elasticsearch client), es_index, max_batch_size and json_docs_list are defined elsewhere.

def to_bulk_doc(_doc):
    """Wrap one JSON document in the action format expected by the bulk helper."""
    return {
        # default value: 'index'
        '_op_type': 'index',
        '_index': es_index,
        '_id': str(uuid.uuid4()),
        '_source': _doc,
        '_type': 'document'
    }

for doc in json_docs_list:
    doc_resources: list = json.loads(doc)['resources']
    # split doc_resources into batches of at most max_batch_size elements
    chunks = chunked(doc_resources, max_batch_size)
    for batch in chunks:
        # convert the batch to the format expected by the bulk API using to_bulk_doc, defined above
        actions = map(to_bulk_doc, batch)
        res = helpers.bulk(client=es, actions=actions)
        print(res)

This successfully uploads the documents to the index:
In Dev Tools: GET /_cat/indices/es_index?v=true

health status index    uuid     pri rep docs.count docs.deleted store.size pri.store.size
green  open   es_index Xngti...   5   1     304444            0    445.3mb        222.5mb

But searching in Kibana Dev Tools with:
GET es_index/_search
only returns 346 lines and nothing else, yet there are supposed to be 304K documents.

Any ideas what might be wrong?
Thanks!

Hi @GaryD ,

Welcome to the Kibana community.

How many hits does hits.total.value show in the response? Still 346, or 304k?

Thanks Marco, have you got the Kibana Dev Tools GET request for that?
GET hits.total.value returns a 404.

Hi @GaryD

I think that there's a misunderstanding.

I meant this field in the response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
...
  },
  "hits" : {
    "total" : {
      "value" : 4675, <== this number here
      "relation" : "eq"
    },
...

Thanks Marco, I don't seem to have a value attribute:

{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 304444,
    "max_score": 1,
    "hits": [
      {
        "_index": "es_index",
        "_type": "document",
        "_id": "0cca6ec0-4772-4526-93bb-a437ced2698d",
        "_score": 1,
        "_source": {
          "metadata": {
            "guid": "8532129b-65c9-4701-add4-23601af9c3e5",
            "url": "/v2/events/8532129b-65c9-4701-add4-23601af9c3e5",
            "created_at": "2016-11-20T00:02:10Z",
            "updated_at": null
          },

You have hits.total, which shows that all 304k documents have been hit. Older Elasticsearch versions return it as a plain number rather than as an object with a value field.
Note that, by design, ES returns only a limited set of hits in the hits list (10 by default). You can retrieve more results using the size parameter in the request, but I'd advise against fetching too many documents at once; paginate the requests instead.

You can read more about it here: Paginate search results | Elasticsearch Guide [7.13] | Elastic
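
As a rough illustration of the above, here is a minimal Python sketch of from/size pagination. The names page_body, total_hits, iter_hits and max_window are hypothetical helpers made up for this example, and the search argument is a stand-in for whatever call actually runs the query. The sketch assumes the default index.max_result_window of 10,000, beyond which from + size requests are rejected.

```python
from typing import Any, Callable, Dict, Iterator


def page_body(page: int, page_size: int) -> Dict[str, Any]:
    """Build a match_all search body for a zero-based page number."""
    return {
        "query": {"match_all": {}},
        "from": page * page_size,
        "size": page_size,
    }


def total_hits(response: Dict[str, Any]) -> int:
    """Read hits.total whether it is a plain number or an object with a value field."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def iter_hits(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 100,
    max_window: int = 10_000,  # assumed default of index.max_result_window
) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page, stopping at the last page or at the result window."""
    page = 0
    while (page + 1) * page_size <= max_window:
        hits = search(page_body(page, page_size))["hits"]["hits"]
        if not hits:
            return
        yield from hits
        if len(hits) < page_size:
            return  # a short page means there is nothing left to fetch
        page += 1
```

With the Python client this might be called as iter_hits(lambda body: es.search(index="es_index", body=body)), assuming a client version that still accepts the body parameter. Walking through all 304k documents would go past the result window, so for a full export the client's helpers.scan (scroll based) is probably the better fit.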

Thanks Marco, much appreciated.
So if I have to paginate, can I still build a visual dashboard on the full index and search for hits, like on a normal index that has been populated by Filebeat etc.?

When building a visualization you will most probably go through an aggregation, which runs over all matching documents, so no pagination occurs on that side.
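
To make that concrete, here is a small Python sketch of an aggregation request of the kind a visualization issues. It sets size to 0, so no individual hits are returned, yet the aggregation still covers every matching document. The field metadata.created_at is taken from the sample document earlier in the thread and is assumed to be mapped as a date; date_range_body and summarize are hypothetical helper names.

```python
from typing import Any, Dict, Optional


def date_range_body(field: str = "metadata.created_at") -> Dict[str, Any]:
    """Build a search body that returns no hits, only min/max aggregations."""
    return {
        "size": 0,  # skip the hits list entirely; aggregations still see every match
        "aggs": {
            "first_event": {"min": {"field": field}},
            "last_event": {"max": {"field": field}},
        },
    }


def summarize(response: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull the formatted min/max dates out of an aggregation response."""
    aggs = response["aggregations"]
    return {
        name: aggs[name].get("value_as_string")
        for name in ("first_event", "last_event")
    }
```

The body could be sent with something like es.search(index="es_index", body=date_range_body()), again assuming a client version that accepts body.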

Thank you, let me read some more on pagination and see if we can get the data out we need. Appreciate your fast responses here. Have a good weekend.
