You could store the XML documents as-is in ES, but then they will essentially be treated as plain text, which might not satisfy your search requirements. Alternatively, convert them to JSON and store the original XML somewhere (either inside ES or outside it) so that the documents search well while the original XML can still be retrieved.
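If it helps, here is a minimal sketch of that second approach (converting each <Record> to a JSON document for searching while keeping the raw XML in a field), assuming the elasticsearch-py 8.x client, a local cluster, and placeholder index/field names:

```python
# Minimal sketch: flatten each <Record> into a JSON-friendly dict for search,
# and keep the original XML snippet alongside it for retrieval.
import xml.etree.ElementTree as ET
from elasticsearch import Elasticsearch

xml_doc = """
<Records>
  <Record><id>1</id><name>alpha</name></Record>
  <Record><id>2</id><name>beta</name></Record>
</Records>
"""

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

root = ET.fromstring(xml_doc)
for record in root.findall(".//Record"):
    # Child elements become searchable JSON fields...
    doc = {child.tag: child.text for child in record}
    # ...and the verbatim XML snippet is stored so it can be extracted later.
    doc["raw_xml"] = ET.tostring(record, encoding="unicode")
    es.index(index="records", document=doc)  # "records" is a made-up index name
```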
We are planning to use a distributed computing framework like Spark on top of ES.
If we store the whole XML data as text in ES and then parse it, it would be double work.
Splunk has a feature for working on top of XML: if you specify a breaking string or an XPath, it will repeatedly break the data at that point and give you events. For the above XML, we would get a list of <Record> nodes.
Does ES have any such solution? Let's say I specify <Record> as the line breaker; can it return all the events of the file by breaking on that string at search time?
If we store the whole XML data as text in ES and then parse it, it would be double work.
How so?
Elasticsearch tokenizes data upon input, not when searching. It should be possible to have an analyzer that tokenizes XML as you describe, but that will take place when you submit the documents. If you want to extract the original document, it needs to be stored alongside.
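In other words, any splitting into <Record> events has to happen at or before index time rather than at search time. A rough sketch of a mapping that goes with the approach above, again assuming the 8.x Python client and made-up field names, where the searchable fields are analyzed as usual while the raw XML field is kept in _source but never tokenized:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# "index": False on raw_xml keeps the original XML retrievable from _source
# without it being tokenized or searchable; id/name stay searchable.
es.indices.create(
    index="records",
    mappings={
        "properties": {
            "id": {"type": "keyword"},
            "name": {"type": "text"},
            "raw_xml": {"type": "text", "index": False},
        }
    },
)
```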