I am saving documents to an Elasticsearch index using the bulk API in a Python project. How can I get the IDs of the created documents back after the call?
Yes, the information in the documentation is exactly what I need. However, when I read the source code of the Python library, the data I need stays inside the function, and it only returns the number of successful operations. If I have not missed anything, I will create a solution by contributing to the source code.
The information I want is in `item` in the source code below, but it is never returned, so I cannot solve the problem. How can I proceed? I wanted to open a PR, since I can get the information with a simple method, but I think the process behind that takes a little longer.
success, failed = 0, 0

# list of errors to be collected is not stats_only
errors = []

# make streaming_bulk yield successful results so we can count them
kwargs["yield_ok"] = True
for ok, item in streaming_bulk(
    client, actions, ignore_status=ignore_status, *args, **kwargs  # type: ignore[misc]
):
    # go through request-response pairs and detect failures
    if not ok:
        if not stats_only:
            errors.append(item)
        failed += 1
    else:
        success += 1

return success, failed if stats_only else errors
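from elasticsearch.helpers import streaming_bulk
# Each yielded item describes one operation, including the generated _id
for ok, document in streaming_bulk(client, actions=actions, index="test-index"):
    print(document["index"]["_id"])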
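To keep the IDs instead of printing them, the same loop can gather them into a list. The following is only a minimal sketch, not part of the library: `collect_ids` and `bulk_index_and_collect` are hypothetical helper names, and it assumes each yielded item is a dict with a single key naming the operation (for example `index`) whose value contains `_id`, as in the snippet above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def collect_ids(
    results: Iterable[Tuple[bool, Dict[str, Any]]]
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Split (ok, item) pairs, as yielded by streaming_bulk, into IDs and errors."""
    ids: List[str] = []
    errors: List[Dict[str, Any]] = []
    for ok, item in results:
        if not ok:
            errors.append(item)
            continue
        # Assumption: one key per item, naming the operation ("index", "create", ...).
        (details,) = item.values()
        ids.append(details["_id"])
    return ids, errors


def bulk_index_and_collect(client, actions, **kwargs):
    """Hypothetical wrapper: run streaming_bulk and return (ids, errors)."""
    # Imported here so collect_ids can be used without the client library.
    from elasticsearch.helpers import streaming_bulk

    # With raise_on_error=False, failures come back as (False, item) pairs
    # instead of raising an exception on the first failed chunk.
    kwargs.setdefault("raise_on_error", False)
    return collect_ids(streaming_bulk(client, actions, **kwargs))
```

Called as `ids, errors = bulk_index_and_collect(client, actions, index="test-index")`, this would return the IDs of the successful operations in the order they were yielded, plus the raw error items for anything that failed.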
Alternatively, you can specify the IDs you want for the documents yourself so you don't have to retrieve them after the fact. Here's an example of that with a custom iterable function:
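A minimal sketch of that approach, assuming the documents are plain dicts and that a random UUID is an acceptable ID. `generate_actions` and `index_with_own_ids` are hypothetical names used for illustration, and the index name is reused from the snippet above.

```python
import uuid
from typing import Any, Dict, Iterable, Iterator, List


def generate_actions(
    docs: Iterable[Dict[str, Any]], index: str
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per document, each with a client-side ID."""
    for doc in docs:
        yield {
            "_index": index,
            # Assumption: a random UUID is fine as the document ID.
            "_id": uuid.uuid4().hex,
            "_source": doc,
        }


def index_with_own_ids(client, docs, index: str = "test-index") -> List[str]:
    """Hypothetical wrapper: bulk-index docs and return the IDs that were assigned."""
    # Imported here so generate_actions can be used without the client library.
    from elasticsearch.helpers import bulk

    # Materialize the actions so the IDs are still available after the call.
    actions = list(generate_actions(docs, index))
    # By default bulk() raises if any action fails, so reaching the return
    # means every action was accepted.
    bulk(client, actions)
    return [action["_id"] for action in actions]
```

Building the full list trades the streaming behaviour of a generator for simplicity. For very large inputs, the IDs could instead be recorded inside the generator as each action is yielded.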