How is Elasticsearch data represented in Lucene?

chocomuesli · June 23, 2016, 10:36pm

Hi guys,

As I understand it, Elasticsearch is built on Apache Lucene: its data is distributed across shards (depending on configuration), and each shard is a Lucene index. Lucene stores key-value pairs in an inverted index, while Elasticsearch works with JSON documents (please correct me if I'm wrong).

I tried to understand this by browsing through some Elasticsearch shards with Luke, but I can't grasp how Elasticsearch documents are represented in Lucene. Please help me understand, since I've found no information in the literature or online. Any hints, corrections, and answers are welcome!

Cheers, Dominik

warkolm · June 23, 2016, 10:44pm

A document is stored as JSON.

We also store the traditional inverted index of the documents for search, as well as a columnar representation (doc values) that is used for things like aggregations.
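
To make the three representations concrete, here is a rough Python sketch. It is only a toy model and not how Lucene lays data out on disk: the sample documents, the field names (user, msg), and the whitespace tokenizer are all made up for illustration.

```python
import json
from collections import defaultdict

# Two hypothetical documents; the field names are invented for this sketch.
docs = [
    {"user": "alice", "msg": "hello lucene"},
    {"user": "bob", "msg": "hello elasticsearch"},
]

# 1) The original JSON, kept per document so it can be returned with hits.
stored_source = [json.dumps(d) for d in docs]

# 2) Inverted index for the "msg" field: term -> list of doc ids.
#    A naive lowercase/whitespace split stands in for a real analyzer.
inverted_msg = defaultdict(list)
for doc_id, d in enumerate(docs):
    for term in d["msg"].lower().split():
        inverted_msg[term].append(doc_id)

# 3) Columnar view of the "user" field: doc id -> value,
#    which is the access pattern aggregations and sorting need.
doc_values_user = [d["user"] for d in docs]

print(inverted_msg["hello"])   # which docs contain "hello"
print(doc_values_user[1])      # the "user" value of doc 1
```

The inverted index answers "which documents contain this term?", while the column answers "what is this document's value for the field?", which is why both are kept.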

chocomuesli · June 24, 2016, 5:38am

Hello Mark,

Many thanks for your answer! Does the columnar representation include nested fields? My biggest problem is understanding how cases like these are represented, since Lucene works with key-value pairs.

dadoonet · June 24, 2016, 6:04am

Everything is flattened unless you define the nested type for a given array. In that case, every single inner object will be indexed as its own Lucene doc.
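
A small Python sketch of the difference, as an illustration only. The functions flatten and as_nested and the sample document are hypothetical and just mimic the idea; they are not Elasticsearch or Lucene APIs, and placing the parent doc after the inner docs is an assumption of this sketch.

```python
def flatten(doc, prefix=""):
    """Collapse a JSON-like dict into dotted field names -> list of values."""
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner objects are merged into the same flat field space.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(item)
    return fields


def as_nested(doc, nested_field):
    """Emit one flat doc per inner object, then the parent doc (assumed order)."""
    parent = {k: v for k, v in doc.items() if k != nested_field}
    children = [flatten(inner, nested_field + ".") for inner in doc[nested_field]]
    return children + [flatten(parent)]


doc = {
    "title": "post",
    "comments": [
        {"author": "alice", "stars": 5},
        {"author": "bob", "stars": 1},
    ],
}

flat = flatten(doc)
nested_docs = as_nested(doc, "comments")

print(flat)         # one doc; authors and stars end up in two unrelated lists
print(nested_docs)  # one doc per comment, plus the parent
```

In the flattened form nothing records that alice gave 5 stars rather than 1, so a query for author "alice" with stars 1 would match the document. With one doc per inner object, that pairing is preserved.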

chocomuesli · June 24, 2016, 9:51am

Solved! Thank you, Mark and David!
