I have done a bulk upload via the bulk API in Python as follows:
import json
import uuid

from elasticsearch import helpers
from more_itertools import chunked  # assumed source of 'chunked'

def to_bulk_doc(_doc):
    return {
        # default value: 'index'
        '_op_type': 'index',
        '_index': es_index,
        '_id': uuid.uuid4(),
        '_source': _doc,
        '_type': 'document'
    }

for doc in json_docs_list:
    doc_resources: list = json.loads(doc)['resources']
    # split doc_resources into a list of lists, where each list has at most max_batch_size elements
    chunks = chunked(doc_resources, max_batch_size)
    for batch in chunks:
        # convert the batch of JSON docs to a format compatible with the bulk API,
        # using the 'to_bulk_doc' function defined above
        actions = map(to_bulk_doc, batch)
        res = helpers.bulk(client=es, actions=actions)
        print(res)
This successfully uploads the documents to the index:
In Dev Tools: GET /_cat/indices/es_index?v=true
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open es_index Xngti... 5 1 304444 0 445.3mb 222.5mb
But searching in Kibana Dev Tools with:
GET es_index/_search
only returns 346 lines and nothing else, yet there are supposed to be 304K documents?
You have hits.total, which shows that all 304k documents have been hit.
Note that, by design, ES returns only a limited set of hits (10 by default) in the hits list: you can retrieve more results using the size parameter in the request, but I'd advise against fetching too many documents at once; rather, paginate the requests.
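For example, a minimal pagination sketch in Dev Tools, assuming sorting on _id is acceptable for your version and mapping (a unique application-level field is usually a better sort key):

```
// first page: explicit size plus a deterministic sort
GET es_index/_search
{
  "size": 100,
  "sort": [{ "_id": "asc" }]
}

// next page: pass the sort values of the last hit from the previous page
GET es_index/_search
{
  "size": 100,
  "sort": [{ "_id": "asc" }],
  "search_after": ["<last _id of the previous page>"]
}
```

Each response's hits.total still reflects all matching documents; size only limits how many are returned per request.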
Thanks Marco, much appreciated.
So if I have to paginate, can I still build visual dashboards on the full index and search for hits like a normal index that has been populated by Filebeat etc.?
When building visualizations you will most probably go through some aggregation, so all your documents will be considered; no pagination will occur on that side.
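As an illustration of this (the field name resource_type.keyword is an assumption about your mapping, not something from the thread), an aggregation runs over every matching document regardless of the size parameter:

```
GET es_index/_search
{
  "size": 0,
  "aggs": {
    "resources_by_type": {
      "terms": { "field": "resource_type.keyword" }
    }
  }
}
```

Here "size": 0 suppresses the hits list entirely, yet the bucket doc_count values are computed across all 304k documents, which is what dashboard visualizations rely on.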