How can I get the document meta data after saving a document in elasticsearch?

Mertozturkk · October 9, 2023, 2:11pm

I am saving documents to an index in Elasticsearch using the bulk API in a Python project. How can I get back the generated ID of each document after this operation?

success, failed = bulk(client, actions, refresh=True)

result = {
    "success_count": success,
    "failed_count": len(failed),
    "success_data": actions if success == len(actions) else [],
    "failed_data": failed
}

carly.richmond · October 10, 2023, 8:42am

Hi @Mertozturkk,

Welcome to the community! Have you checked the response body of the bulk request?

Looking at the bulk API documentation, there is an items collection in the response that includes the affected document ID for each action.

Can you take a look and see if you can extract that from the response?
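For reference, here is a minimal sketch of pulling the IDs out of a raw bulk response body. The `sample_response` dict is a hand-written stand-in shaped like the documented response, not output from a real cluster, and `extract_ids` is a hypothetical helper name.

```python
def extract_ids(response):
    """Return (operation, _id, status) for every item in a bulk response body."""
    ids = []
    for item in response["items"]:
        # Each item is a single-key dict: {"index": {...}}, {"create": {...}}, etc.
        for op_type, result in item.items():
            ids.append((op_type, result.get("_id"), result.get("status")))
    return ids


# Hand-written sample shaped like a bulk response (an assumption, not real output).
sample_response = {
    "took": 30,
    "errors": False,
    "items": [
        {"index": {"_index": "test-index", "_id": "abc123", "status": 201}},
        {"index": {"_index": "test-index", "_id": "def456", "status": 201}},
    ],
}

print(extract_ids(sample_response))
```

Iterating over the single key of each item, rather than hard-coding "index", keeps the helper usable when the request mixes index, create, update, and delete actions.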

Mertozturkk · October 10, 2023, 9:10am

Yes, the information in the documentation is exactly what I need. However, when I step through the source code of the Python library, I see that this data stays inside the helper, which only returns the number of successful operations. If I have not missed anything, I will work on a solution by contributing to the source code.

Mertozturkk · October 10, 2023, 7:38pm

Actually, the information I want is in the item variable in the source code below, but I cannot use it because it is not returned. How can I proceed? I wanted to open a PR. I can get the information with a simple change, but I think the process behind that takes a little longer.

    success, failed = 0, 0

    # list of errors to be collected is not stats_only
    errors = []

    # make streaming_bulk yield successful results so we can count them
    kwargs["yield_ok"] = True
    for ok, item in streaming_bulk(
        client, actions, ignore_status=ignore_status, *args, **kwargs  # type: ignore[misc]
    ):
        # go through request-response pairs and detect failures
        if not ok:
            if not stats_only:
                errors.append(item)
            failed += 1
        else:
            success += 1

    return success, failed if stats_only else errors

Elasticsearch version: 8.10.0

elasticsearch-py version: 8.10.0

iulia · October 11, 2023, 5:51pm

You can get the IDs as you bulk index documents:

for ok, document in streaming_bulk(client, actions=actions, index="test-index"):
    print(document["index"]["_id"])
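The loop above assumes every action is an index operation. Below is a sketch that collects IDs regardless of the operation type and keeps failures separate. `collect_ids` is a hypothetical helper, and `fake_results` stands in for the (ok, item) pairs that streaming_bulk yields, so the snippet runs without a cluster.

```python
def collect_ids(results):
    """Split (ok, item) pairs, as yielded by streaming_bulk, into created IDs and failures."""
    created, failed = [], []
    for ok, item in results:
        # item looks like {"index": {"_id": ..., "status": ...}}; the key is the op type.
        _op_type, info = next(iter(item.items()))
        if ok:
            created.append(info["_id"])
        else:
            failed.append(info)
    return created, failed


# Stand-in pairs for illustration only. With a real client you would pass
# streaming_bulk(client, actions=actions, index="test-index", raise_on_error=False)
fake_results = [
    (True, {"index": {"_id": "1", "status": 201}}),
    (False, {"index": {"_id": "2", "status": 400, "error": {"type": "mapper_parsing_exception"}}}),
]

created, failed = collect_ids(fake_results)
print(created, failed)
```

Note that streaming_bulk raises on the first failed chunk by default, so failed items only show up in the loop if raise_on_error=False is passed.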

Alternatively, you can specify the IDs you want for the documents yourself so you don't have to retrieve them after the fact. Here's an example of that with a custom iterable function:

https://github.com/elastic/elasticsearch-py/blob/main/examples/bulk-ingest/bulk-ingest.py

#!/usr/bin/env python
# Licensed to Elasticsearch B.V under one or more agreements.
# Elasticsearch B.V licenses this file to you under the Apache 2.0 License.
# See the LICENSE file in the project root for more information

"""Script that downloads a public dataset and streams it to an Elasticsearch cluster"""

import csv
from os.path import abspath, join, dirname, exists
import tqdm
import urllib3
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk


NYC_RESTAURANTS = (
    "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv?accessType=DOWNLOAD"
)
DATASET_PATH = join(dirname(abspath(__file__)), "nyc-restaurants.csv")
CHUNK_SIZE = 16384

This file has been truncated.
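As a smaller illustration of the same idea (not taken from the linked script), the sketch below assigns an ID to each action before sending it, so the IDs are known up front. `generate_actions`, the placeholder documents, and the choice of uuid4 are all assumptions made for the example.

```python
import uuid


def generate_actions(docs, index_name):
    """Yield one bulk action per document, each with a client-chosen _id."""
    for doc in docs:
        yield {
            "_index": index_name,
            "_id": str(uuid.uuid4()),  # any unique string works; uuid4 is just one choice
            "_source": doc,
        }


docs = [{"name": "first"}, {"name": "second"}]  # placeholder documents
actions = list(generate_actions(docs, "test-index"))
known_ids = [action["_id"] for action in actions]
print(known_ids)
# With a real client: bulk(client, actions, refresh=True)
```

Materializing the generator into a list is what makes the IDs available afterwards. For large datasets you could instead record each ID as the generator yields it.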

Mertozturkk · October 11, 2023, 6:09pm

Thank you. I missed the values that streaming_bulk returns; this will make my job easier.

system · November 8, 2023, 6:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
