I am using Python and Eland to loop through 2757 JSON files that I have created and import them into Elasticsearch. My code imports the majority of them; only 5 fail, and they all throw the same error. The error is not very descriptive. Is there any way to find out more about what the error is, or which part of the file is causing it? Is there something I can change in my code to get additional output from Eland, or from the elasticsearch.helpers.BulkIndexError mentioned in the exception?
A quick Google search turns up similar errors, but those typically include additional information after the "document(s) failed to index" message explaining why.
Elasticsearch Version 8.2.0
OS: Debian 11
Python: 3.9.3
Python modules:
elastic-transport 8.1.2
elasticsearch 8.2.0
Code:
import os
from elasticsearch import Elasticsearch
import json
import eland as ed
import pandas as pd
filename = './json/nessus.2053.7875.1636196619.json'
index = "asset_nessus"
client = Elasticsearch(
    ['https://localhost:9200'],
    basic_auth=(os.environ.get('ES_USER'), os.environ.get('ES_PASSWORD')),
    verify_certs=False,
    ca_certs=False,
    ssl_show_warn=False,
)
# Read and parse the JSON file; the with-block closes the handle afterwards
with open(filename, "r") as f:
    x = json.load(f)
df = pd.json_normalize(x)
df['Date'] = pd.to_datetime(df['Date'])
ed.pandas_to_eland(
    df,
    es_client=client,
    es_dest_index=index,
    es_type_overrides={
        "IPv4Address": "ip",
        "Vuln-Synopsis": "text",
        "Vuln-Description": "text",
        "Vuln-Solution": "text",
        "Vuln-PluginOutput": "text",
        "Vuln-CVSSScore": "float",
    },
    es_if_exists="append",
)
Error:
Traceback (most recent call last):
  File "/home/bryan/Nessus/test.py", line 31, in <module>
    ed.pandas_to_eland(
  File "/home/bryan/.local/lib/python3.9/site-packages/eland/etl.py", line 215, in pandas_to_eland
    deque(
  File "/home/bryan/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 598, in parallel_bulk
    for result in pool.imap(
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/bryan/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 599, in <lambda>
    lambda bulk_chunk: list(
  File "/home/bryan/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 355, in _process_bulk_chunk
    yield from gen
  File "/home/bryan/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 274, in _process_bulk_chunk_success
    raise BulkIndexError(f"{len(errors)} document(s) failed to index.", errors)
elasticsearch.helpers.BulkIndexError: 1 document(s) failed to index.
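The last frame of the traceback shows BulkIndexError being raised with a second argument, errors, so the per-document details appear to travel with the exception even though they are not printed. Below is a minimal sketch of one way they might be surfaced. It assumes the exception exposes that list as an `errors` attribute and that each entry has the shape {op_type: {"_id": ..., "status": ..., "error": {"type": ..., "reason": ...}}}; the helper name describe_bulk_errors and the sample entry are hypothetical and only for illustration.

```python
def describe_bulk_errors(errors):
    """Return one readable line per failed bulk item.

    Assumes each entry looks like {op_type: details}, where details may
    hold "_id", "status" and an "error" dict with "type" and "reason".
    """
    lines = []
    for item in errors:
        for op_type, details in item.items():
            error = details.get("error", {})
            if isinstance(error, dict):
                err_type = error.get("type", "unknown")
                reason = error.get("reason", "no reason given")
            else:
                # Some responses may carry the error as a plain string
                err_type, reason = "unknown", str(error)
            lines.append(
                f"{op_type} _id={details.get('_id')} "
                f"status={details.get('status')}: {err_type}: {reason}"
            )
    return lines


# Made-up entry, only to show the assumed shape of one failure
sample_errors = [
    {
        "index": {
            "_id": "7",
            "status": 400,
            "error": {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse",
            },
        }
    }
]
for line in describe_bulk_errors(sample_errors):
    print(line)

# Possible usage around the existing call (untested against the failing files):
# from elasticsearch.helpers import BulkIndexError
# try:
#     ed.pandas_to_eland(df, es_client=client, es_dest_index=index,
#                        es_if_exists="append")
# except BulkIndexError as e:
#     for line in describe_bulk_errors(e.errors):
#         print(line)
```

If the assumption about the entry shape holds, the "reason" field should name the field or value that Elasticsearch rejected, which would narrow down which part of each of the 5 files is at fault.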
Any help would be greatly appreciated. Thanks!