Hello, I have text content in XML format with formatting tags inside the text:
<abstract>... water chemical formula is H<sub>2</sub>O and the energy is
E=MC<sup>2</sup> formula.</abstract>
How do I properly convert this example to JSON for Elasticsearch while
keeping search and highlighting consistent?
The "abstract" XML element is semantic markup; it means "here comes an
abstract".
The "sub"/"sup" elements are (X)HTML display markup; they mean "display me
in subscript/superscript style on your favorite output device".
The "abstract" element needs to be parsed, and you need to decide how to
index abstracts in ES. The "sub"/"sup" elements need to be dropped: display
markup mixed in with your textual content will render your index unusable,
because the tags end up in the analyzed text and break word search and
highlighting.
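
A minimal sketch of that step in Python, using only the standard library. It assumes the abstract arrives as a standalone "abstract" element with "sub"/"sup" children, as in the question; the JSON field name "abstract" and the helper name abstract_to_doc are illustrative choices, not something Elasticsearch requires.

```python
import json
import xml.etree.ElementTree as ET


def abstract_to_doc(xml_string):
    """Parse an <abstract> element and return a dict ready to index.

    All inline markup (such as <sub> and <sup>) is dropped; only the
    text content is kept, so H<sub>2</sub>O becomes H2O.
    """
    root = ET.fromstring(xml_string)
    # itertext() yields the text of the element and of all its
    # descendants in document order, without any tags.
    text = "".join(root.itertext())
    # Collapse line breaks and repeated spaces left over from the XML.
    text = " ".join(text.split())
    # "abstract" as the field name is an assumption for this sketch.
    return {"abstract": text}


xml_string = (
    "<abstract>water chemical formula is H<sub>2</sub>O and the energy is\n"
    "E=MC<sup>2</sup> formula.</abstract>"
)
doc = abstract_to_doc(xml_string)
print(json.dumps(doc))
```

The resulting JSON holds plain text only, so a search for H2O matches and highlighting offsets line up with what was indexed.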
Provided the Markdown control characters do not interfere with your
Lucene analysis and word search, you could even add a Markdown formatter to
present your ES docs / snippets.
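
As a rough illustration of the Markdown idea, the sketch below rewrites "sub"/"sup" elements into Pandoc-style markers (~2~ for subscript, ^2^ for superscript). The choice of that marker syntax, and the names MARKERS and to_markdown, are assumptions made for this example. Whether such markers survive your analyzer without splitting tokens has to be checked against your own mapping.

```python
import xml.etree.ElementTree as ET

# Assumed marker syntax (Pandoc-style); swap in whatever your
# Markdown formatter understands.
MARKERS = {"sub": "~", "sup": "^"}


def to_markdown(elem):
    """Return the element's text with sub/sup wrapped in markers."""
    parts = [elem.text or ""]
    for child in elem:
        # Unknown tags get no marker; their text is kept as-is.
        marker = MARKERS.get(child.tag, "")
        parts.append(marker + to_markdown(child) + marker)
        # Text that follows the child's closing tag.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring("<abstract>H<sub>2</sub>O and E=MC<sup>2</sup></abstract>")
display_text = to_markdown(root)
print(display_text)
```

One cautious option is to keep this marked-up string in a separate field used only for display, and to search on the plain-text field.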