Instead of using XPath, could you parse and store the whole document, then weed out the fields you don't need (the prune filter might help)? Alternatively, use a ruby filter to join the two arrays from your first solution.
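
For the ruby filter route, here is a minimal sketch of the joining step in plain Ruby. It assumes the two arrays are parallel (element i of one belongs with element i of the other); the helper name `join_parallel_arrays` and the `name`/`value` keys are made up for illustration, so adjust them to whatever your XPath expressions actually return.

```ruby
# Hypothetical helper: pair up two parallel arrays (for example, two XPath
# results) into a single array of hashes. The "name"/"value" keys are
# assumptions, not anything Logstash requires.
def join_parallel_arrays(names, values)
  # Array() turns nil into [] and wraps a single string in a one-element array.
  names = Array(names)
  values = Array(values)

  # zip pairs elements by position; if values is shorter, the gaps become nil.
  names.zip(values).map { |name, value| { "name" => name, "value" => value } }
end

# Sample data, invented for the example.
pairs = join_parallel_arrays(["host", "port"], ["example.org", "8080"])
puts pairs.inspect
```

Inside a ruby filter you would probably read the two arrays with `event.get` and write the combined result back with `event.set`, using the same `zip` logic in the filter's `code` option.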