I use a Python script to send AWS CloudTrail logs to Elasticsearch. It works most of the time, but occasionally I get a parsing error like this:
('1 document(s) failed to index.', [{'index': {'_index': 'n_cloudtrail-2019.03.27', '_type': 'record',
'_id': '169deb77-d3f0-4964-8f98-79e64a6923c8', 'status': 400, 'error': {'type': 'mapper_parsing_exception',
'reason': 'failed to parse [apiVersion]', 'caused_by': {'type': 'illegal_argument_exception',
'reason': 'Invalid format: "2018_11_05" is malformed at "_11_05"'}}
Just one of these is enough to break the whole index and cause all other indices to be inaccessible.
How can I prevent this from happening? Is it possible to check for parsing errors and skip those records before indexing, or perhaps to change the date format (or mapping) of that particular field?
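For example, would an index template that maps apiVersion as a plain keyword (so it never gets date-detected) be a reasonable fix? This is an untested sketch using the elasticsearch-py client I already have; the template name and index pattern are my own guesses and would need to match the real index prefix:

# Untested sketch: map apiVersion as a plain string so Elasticsearch never tries
# to parse it as a date. Template name and index pattern are my guesses; 'record'
# matches the doc_type I use when indexing (Elasticsearch 6.x style mappings).
template = {
    "index_patterns": ["cloudtrail-*"],
    "mappings": {
        "record": {
            "properties": {
                "apiVersion": {"type": "keyword"}
            }
        }
    }
}
es.indices.put_template(name='cloudtrail-template', body=template)

As far as I understand, a template only affects indices created after it is put in place, so the current day's index might still need to be deleted or reindexed.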
Here's a snippet of my Python code:
# logger, s3 (boto3 client) and es (Elasticsearch client) are set up earlier in the script
logger.info('Event: ' + json.dumps(event, indent=2))
s3Bucket = event['Records'][0]['s3']['bucket']['name']
key = event['Records'][0]['s3']['object']['key']
try:
    # download and decompress the CloudTrail log object from S3
    response = s3.get_object(Bucket=s3Bucket, Key=key)
    content = gzip.GzipFile(fileobj=BytesIO(response['Body'].read())).read()
    # index each CloudTrail record into a daily index
    for record in json.loads(content)['Records']:
        recordJson = json.dumps(record)
        indexName = 'cloudtrail-' + datetime.datetime.now().strftime("%Y.%m.%d")
        res = es.index(index=indexName, doc_type='record', id=record['eventID'], body=recordJson)
        logger.info(res)
    return True
except Exception as e:
    logger.error('Something went wrong: ' + str(e))
    traceback.print_exc()
    return False
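And if changing the mapping isn't the right approach, would wrapping the per-record index call like this be a sane way to skip just the bad records instead of failing the whole log file? Untested sketch, reusing the names from the snippet above and assuming es.index() raises RequestError on a 400 such as the mapper_parsing_exception:

from elasticsearch.exceptions import RequestError  # raised by elasticsearch-py on 4xx responses

for record in json.loads(content)['Records']:
    recordJson = json.dumps(record)
    indexName = 'cloudtrail-' + datetime.datetime.now().strftime("%Y.%m.%d")
    try:
        res = es.index(index=indexName, doc_type='record', id=record['eventID'], body=recordJson)
        logger.info(res)
    except RequestError as err:
        # log and skip the record that failed to parse; keep indexing the rest
        logger.warning('Skipping record %s: %s', record['eventID'], err)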