Hi! I've set up a small local Docker instance where I'm trying to run a POC for my project. Setting it up and getting started with inserting data into the indices has been quite intuitive and easy, but I still have one issue.
The issue is with a single JSON object. The file is quite large and has a good deal of nested objects, and I suspect this is why it fails.
When I insert the data, it doesn't give me a detailed error message; it just tells me that 1 document has failed. In other places where I've seen people struggling with this, an error message was attached to the failed insert. I've tried increasing the nested settings to a ridiculous size, for example index.mapping.nested_objects.limit, but no luck.
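To check whether nesting is a plausible cause, the shape of the object can be measured before indexing. The following is a minimal sketch, not part of my actual setup: the helper names `collect_paths` and `mapping_stats` are made up for illustration, and the counts only approximate what Elasticsearch tracks. As far as I know the relevant defaults are 1000 for index.mapping.total_fields.limit, 20 for index.mapping.depth.limit and 10000 for index.mapping.nested_objects.limit, and the last one only applies to fields explicitly mapped as `nested`.

```python
def collect_paths(node, prefix="", paths=None):
    """Collect the distinct dotted field paths of a parsed JSON value."""
    if paths is None:
        paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            collect_paths(value, path, paths)
    elif isinstance(node, list):
        # Arrays do not add a mapping level; their items share the parent path.
        for item in node:
            collect_paths(item, prefix, paths)
    return paths


def mapping_stats(doc):
    """Return (distinct field paths, deepest object level) for one document."""
    paths = collect_paths(doc)
    max_depth = max((p.count(".") + 1 for p in paths), default=0)
    return len(paths), max_depth


# Tiny hand-written sample, only to show the shape of the result.
sample = {"a": 1, "b": {"c": [{"d": 1}, {"d": 2, "e": {"f": 3}}]}}
print(mapping_stats(sample))
```

Running `mapping_stats(data)` on the object loaded from the large file gives two numbers to compare against the field and depth limits above. If both are far below the defaults, the nesting limits are probably not the reason for the failure.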
Please keep in mind this is not my actual setup, but just a setup that reproduces the error on that single object. Here's the error message that I get:
Traceback (most recent call last):
  File "C:\repo\..\elastic\test.py", line 41, in <module>
    success, failed = helpers.bulk(client, actions)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\.pyenv\pyenv-win\versions\3.11.9\Lib\site-packages\elasticsearch\helpers\actions.py", line 531, in bulk
    for ok, item in streaming_bulk(
  File "C:\Users\username\.pyenv\pyenv-win\versions\3.11.9\Lib\site-packages\elasticsearch\helpers\actions.py", line 445, in streaming_bulk
    for data, (ok, info) in zip(
  File "C:\Users\username\.pyenv\pyenv-win\versions\3.11.9\Lib\site-packages\elasticsearch\helpers\actions.py", line 359, in _process_bulk_chunk
    yield from gen
  File "C:\Users\username\.pyenv\pyenv-win\versions\3.11.9\Lib\site-packages\elasticsearch\helpers\actions.py", line 276, in _process_bulk_chunk_success
    raise BulkIndexError(f"{len(errors)} document(s) failed to index.", errors)
elasticsearch.helpers.BulkIndexError: 1 document(s) failed to index
Here's the very simple code that I use to get the error message.
from elasticsearch import Elasticsearch, helpers
import warnings
import json

warnings.filterwarnings("ignore")

client = Elasticsearch(
    "https://localhost:9200/",
    api_key="API_KEY",
    verify_certs=False
)

with open("./large_file.json", "r") as file:
    data = json.load(file)  # Single object, not a list

actions = [
    {
        "_index": "test_index",
        "_source": [data]
    }
]

success, failed = helpers.bulk(client, actions)
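The exception text only carries the count, but the per-document details should be available on the exception itself through its `errors` attribute. Below is a minimal sketch for making those details readable. The helper name `summarize_bulk_errors` is hypothetical, and the sample item is hand-written with placeholder values to show the assumed shape (one dict per failed action, keyed by the operation type); it is not real output from my cluster.

```python
def summarize_bulk_errors(errors):
    """Turn items shaped like BulkIndexError.errors into readable lines."""
    lines = []
    for item in errors:
        # Each item is assumed to look like {"index": {...details...}}.
        for op_type, details in item.items():
            error = details.get("error")
            if isinstance(error, dict):
                reason = f"{error.get('type')}: {error.get('reason')}"
                cause = error.get("caused_by")
                if isinstance(cause, dict):
                    reason += f" (caused by {cause.get('type')}: {cause.get('reason')})"
            else:
                # Some failures may report the error as a plain string.
                reason = str(error)
            lines.append(
                f"{op_type} on {details.get('_index')} "
                f"failed with status {details.get('status')}: {reason}"
            )
    return lines


# Placeholder item, only to illustrate the assumed structure.
example_errors = [
    {
        "index": {
            "_index": "test_index",
            "status": 400,
            "error": {"type": "example_exception", "reason": "example reason"},
        }
    }
]
for line in summarize_bulk_errors(example_errors):
    print(line)
```

In the script above, this would mean wrapping the `helpers.bulk(client, actions)` call in `try` / `except helpers.BulkIndexError as exc` and passing `exc.errors` to the helper. Alternatively, `helpers.bulk(client, actions, raise_on_error=False)` returns the error list as its second value instead of raising. One thing the detailed error may point at is that the reproduction passes `_source` as a list (`[data]`), while a document is normally expected to be a single JSON object.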