Out of memory error and duplicate rows

Is it possible that one of your documents is over 1 GB? A few years ago the maximum Java String length effectively dropped from about 2 billion characters (the length is a 32-bit int) to about 1 billion for strings that need UTF-16. That is a side effect of compact strings, which back a String with a byte array rather than a char array, so each UTF-16 character takes 2 bytes of an array that is itself capped at roughly 2 GB (2^31 - 1 bytes / 2 bytes per character ≈ 1 billion characters).

Is the pipeline getting restarted over and over again because of this exception? If so, perhaps set the document_id in the elasticsearch output to the customer_id, so that re-processed events just overwrite the existing document instead of creating duplicates.
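
Roughly something like this in the output section (a sketch only: the hosts URL and the "customers" index name are placeholders, and it assumes your events actually carry a customer_id field):

```
output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]   # placeholder host
    index       => "customers"                 # placeholder index name
    document_id => "%{customer_id}"            # reuse the same _id so retries overwrite instead of duplicating
  }
}
```

With a fixed document_id the pipeline becomes idempotent: Elasticsearch indexes each customer once and simply updates that document on every restart, so the repeated runs stop producing duplicate rows.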