As I learnt, Elasticsearch is based on Apache Lucene: its data is distributed across shards (according to the configuration), with each shard being a Lucene index. Lucene stores key-value pairs in an inverted index, while Elasticsearch does this using JSON (please correct me if I'm wrong).
I tried to understand this by browsing through some Elasticsearch shards using Luke, but I can't grasp how Elasticsearch documents are represented in Lucene. Please help me understand, since I've found no information in the literature or online. Any hints, corrections and answers are welcome!
We also store the traditional inverted index of the documents for the search aspects, as well as a columnar representation for things like doc values that are used in aggregations.
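To make the two representations a bit more concrete, here is a rough mental model in Python. It is only a sketch under simplifying assumptions, not Elasticsearch's or Lucene's actual code: the helper names `flatten` and `build_structures` are made up for illustration, analysis is reduced to lowercasing and splitting on whitespace, and the on-disk formats are ignored entirely. It shows a JSON document with an inner object being flattened into dotted field names, then fed into both an inverted index (term to documents) and a per-field column (document to values).

```python
from collections import defaultdict


def flatten(obj, prefix=""):
    """Yield (dotted_field_name, value) pairs for a JSON-like dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Inner objects become dotted field names, e.g. "author.name".
            yield from flatten(value, path + ".")
        elif isinstance(value, list):
            # Arrays contribute several values to the same field name.
            for item in value:
                if isinstance(item, dict):
                    yield from flatten(item, path + ".")
                else:
                    yield path, item
        else:
            yield path, value


def build_structures(docs):
    """Build a toy inverted index and a toy columnar store (hypothetical helper)."""
    inverted = defaultdict(set)   # (field, term) -> set of doc ids
    columns = defaultdict(dict)   # field -> {doc id -> list of values}
    for doc_id, doc in enumerate(docs):
        for field, value in flatten(doc):
            # Simplified "analysis": lowercase and split on whitespace.
            for term in str(value).lower().split():
                inverted[(field, term)].add(doc_id)
            columns[field].setdefault(doc_id, []).append(value)
    return inverted, columns


docs = [
    {"title": "Lucene in Action", "author": {"name": "Erik", "city": "Berlin"}},
    {"title": "Elasticsearch Guide", "author": {"name": "Clinton", "city": "London"}},
]
inverted, columns = build_structures(docs)

print(sorted(inverted[("author.name", "erik")]))
print(columns["author.city"])
```

The inverted index answers "which documents contain this term in this field?", while the column answers "what is this field's value for this document?", which is the access pattern aggregations need. Note that this sketch only models plain object fields; fields mapped with the `nested` type are handled differently, since Elasticsearch indexes each nested object as its own hidden Lucene document.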
Many thanks for your answer! Does the columnar representation include nested fields? My biggest problem is understanding how cases like these are represented, since Lucene works with key-value pairs.