I am getting a lot of rejected docs from Elasticsearch, and when I look at the reason in the dead letter queue file, it says "Can't get text on a START_OBJECT". The only thing I'm doing is using the XML filter and then sending the result to Elasticsearch.
I think the cause is that the text element in my XML documents is not always the same shape. Some documents have a single level:
<text>Here is some text.</text>
while others have nested levels:
<text>Here is some text.
<text> Here is some more text.</text>
</text>
I think this means the first document indexed causes Elasticsearch to map the text field as type text, and later documents, where text arrives as an object, are then rejected because an object can't be indexed into a text field.
Is that correct? If so, how can I handle this issue?
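For reference, here is a rough sketch of the kind of normalization I have in mind: collapse the field to a plain string before indexing, so every document sends the same type. It is written in Python only to illustrate the idea (in Logstash it would presumably have to live in a filter of some kind). The function name flatten_text and the two parsed shapes below are my own assumptions, not the actual output of the XML filter.

```python
def flatten_text(value):
    """Collapse a string, list, or nested dict into one plain string."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        parts = [flatten_text(item) for item in value]
    elif isinstance(value, dict):
        # Visit values in insertion order, so outer text comes before inner text.
        parts = [flatten_text(item) for item in value.values()]
    else:
        parts = [str(value)]
    # Drop empty pieces and join the rest with single spaces.
    return " ".join(part for part in parts if part)


# Hypothetical parsed shapes (assumed for illustration only).
simple_doc = {"text": "Here is some text."}
nested_doc = {
    "text": {
        "content": "Here is some text.",
        "text": " Here is some more text.",
    }
}

for doc in (simple_doc, nested_doc):
    # After this, "text" is always a string, whatever shape it started as.
    doc["text"] = flatten_text(doc["text"])
    print(doc)
```

With this, both documents would send text as a string, so the mapping would never see an object in that field. The cost is that the nesting is lost, which may or may not matter for searching.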