How is Elasticsearch data represented in Lucene?

Hi guys,

As I understand it, Elasticsearch is built on Apache Lucene: its data is distributed across shards (as configured), and each shard is a Lucene index. Lucene stores key-value pairs in an inverted index, while Elasticsearch stores data as JSON (please correct me if I'm wrong).

I tried to understand this by browsing some Elasticsearch shards with Luke, but I can't grasp how Elasticsearch documents are represented in Lucene. Please help me understand, since I've found no information in the literature or online. Any hints, corrections and answers are welcome! :slight_smile:

Cheers, Dominik

The document itself is stored as JSON (in the _source field).

We also build the traditional inverted index over the documents for search, as well as a columnar representation, doc values, which is used for things like sorting and aggregations.
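To make the three structures above concrete, here is a toy model in Python. It is not how Elasticsearch or Lucene are actually implemented: every name in it (stored_source, inverted, doc_values, search, sum_price) is hypothetical, and the whitespace split stands in for a real analyzer. It only shows the same documents kept three ways: as stored JSON, as a term-to-document inverted index, and as a per-field column of values.

```python
import json
from collections import defaultdict

# Hypothetical toy model, NOT Elasticsearch/Lucene code: it only illustrates
# three views of the same documents (stored JSON, inverted index, doc values).
docs = [
    {"title": "quick brown fox", "price": 10},
    {"title": "lazy brown dog", "price": 25},
]

# 1. Stored source: the original JSON kept per document (like _source).
stored_source = [json.dumps(d) for d in docs]

# 2. Inverted index: term -> list of doc ids, used for search.
#    A naive whitespace split stands in for a real analyzer (assumption).
inverted = defaultdict(list)
for doc_id, d in enumerate(docs):
    for term in d["title"].split():
        if doc_id not in inverted[term]:
            inverted[term].append(doc_id)

# 3. Columnar doc values: field -> one value per doc id,
#    used for sorting and aggregations.
doc_values = {"price": [d["price"] for d in docs]}


def search(term):
    """Return the ids of documents whose title contains the term."""
    return inverted.get(term, [])


def sum_price(doc_ids):
    """Aggregate over matching docs by reading the column, not the JSON."""
    column = doc_values["price"]
    return sum(column[i] for i in doc_ids)
```

For example, search("brown") returns [0, 1], and sum_price over those ids reads 10 and 25 straight from the price column without parsing any stored JSON.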

Hello Mark,

Many thanks for your answer! :grinning: Does the columnar representation include nested fields? My biggest problem is understanding how cases like these are represented, since Lucene works with key-value pairs.

Everything is flattened unless you map a given array with the nested type. In that case, every single inner object is indexed as its own (hidden) Lucene document.
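A minimal Python sketch of the difference, assuming a hypothetical document with an array of user objects. The functions flatten, index_nested and matches are illustrative only, not Elasticsearch APIs: real Elasticsearch also analyzes the values and records the link between each nested document and its parent. The point is that flattening keeps per-field value lists but loses which first name went with which last name, while the nested layout keeps each inner object together in its own document.

```python
# Hypothetical sketch, NOT Elasticsearch code: default flattening of an array
# of objects versus one separate document per inner object ("nested").
doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}


def flatten(d, prefix=""):
    """Default object mapping: collect every leaf value under its dotted path."""
    fields = {}
    for key, value in d.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner objects are merged into the same flat field lists.
                for k, v in flatten(item, path + ".").items():
                    fields.setdefault(k, []).extend(v)
            else:
                fields.setdefault(path, []).append(item)
    return fields


def index_nested(d, nested_field):
    """Nested mapping: each inner object becomes its own document,
    followed by the parent document without that field."""
    children = [flatten(obj, nested_field + ".") for obj in d[nested_field]]
    parent = flatten({k: v for k, v in d.items() if k != nested_field})
    return children + [parent]


def matches(fields, conditions):
    """True if one document satisfies every (path, value) condition."""
    return all(value in fields.get(path, []) for path, value in conditions.items())


query = {"user.first": "Alice", "user.last": "Smith"}
flat_hit = matches(flatten(doc), query)
nested_hit = any(matches(d, query) for d in index_nested(doc, "user"))
```

With the flattened layout, the query for first name Alice and last name Smith matches (flat_hit is True) even though no such user exists; with one document per inner object it does not (nested_hit is False).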

Solved! Thank you, Mark and David!