How can I avoid XML parsing errors with non-English text?

I have found that XML containing Vietnamese text causes a parsing error in Logstash.
I would like to know how to avoid this error.

test.xml

Tập 12 - Mười Tội Ác - The Ten Deadly Sins 2016

error message

Trouble parsing xml with XmlSimple {:source=>"message", :value=>"e>\n", :exception=>#<REXML::ParseException: #<REXML::ParseException: Missing end tag for '' (got "xml-fragment")


This looks like a case of broken XML. Can you share the full XML document?

<?xml version="1.0" encoding="UTF-8" standalone="no"?> 95953_4 Tập1 - Biến Hóa HoànTEST Hảo 01 01:05 HQ AAC MOBILE 2 0 N CDBR_KT1 Y 2016-07-08 9999-12-31

We parse more than 100 XML files at a time. During a run, an error occurs and incorrect data is entered.
Note that we delete the files stored in Elasticsearch after 7 days.

It is difficult to find the cause of the error. The files that fail have nothing in common.
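
One way to narrow this down is to check every file for well-formedness outside Logstash. The sketch below is only an illustration: the function name `find_broken_xml` and the idea that all files sit in one directory are assumptions, and it uses Python's standard XML parser rather than the REXML parser Logstash uses, so the two may not agree in every case.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def find_broken_xml(directory):
    """Return (path, error) pairs for every *.xml file that fails to parse.

    Hypothetical helper: assumes the files to check are directly inside
    `directory` and end in ".xml".
    """
    broken = []
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            # Parsing from the file (not a decoded string) lets the parser
            # honour the encoding declared in the XML prolog, e.g. UTF-8.
            ET.parse(str(path))
        except ET.ParseError as exc:
            broken.append((path, str(exc)))
    return broken


if __name__ == "__main__":
    # Directory to scan; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, error in find_broken_xml(target):
        print(f"{path}: {error}")
```

If a file that fails in Logstash parses cleanly here, the document itself is probably well-formed, and the problem may lie in how the file is read or split into events before it reaches the xml filter.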

Also, Logstash seems to have a bug where it stops working if there are more than 5000 files to process,
so we delete the files stored in Elasticsearch.

Any help appreciated.

1) error

2) incorrect data
