Lucene's main data structure is an inverted index. That is, terms point
to documents.
For things like sorting and faceting, this doesn't work, because you
need to be able to point from a document to a term (if you want to sort
by the price field, you need to identify the value of the price field
for a specific document).
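
To make the direction of the lookup concrete, here is a minimal sketch of
an inverted index in plain Java. It is not Lucene's actual implementation;
the class and method names (InvertedIndexSketch, buildInverted,
termsForDoc) are made up for illustration. Going from a term to its
documents is a single map lookup, while going from a document back to its
terms means scanning every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; Lucene's real index is far more compact.
final class InvertedIndexSketch {

    // Build term -> list of doc ids (the "postings") from whitespace-split text.
    static Map<String, List<Integer>> buildInverted(String[] docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId].split(" ")) {
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each doc id once per term, even if the term repeats.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    // The awkward direction: finding a document's terms scans every term.
    static List<String> termsForDoc(Map<String, List<Integer>> index, int docId) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            if (e.getValue().contains(docId)) {
                terms.add(e.getKey());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index =
                buildInverted(new String[] {"red shoe", "blue shoe", "red hat"});
        System.out.println(index.get("shoe"));      // cheap: one map lookup
        System.out.println(termsForDoc(index, 1));  // costly: walks all terms
    }
}
```
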
We solve this with an uninverted index, such as Lucene's FieldCache. The
field cache is built in the background by Lucene by reading through the
inverted index on each commit and "uninverting" it. This takes an on-disk
data structure, which (on certain OSes) can be accessed via memory-mapped
files, and creates an in-heap data structure. Lookups against it can be
really fast, but it suffers from the need to rebuild the data structure
entirely on each commit, which can take some time.
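
The "uninverting" step can be sketched as follows. This is an assumption-
laden toy, not the FieldCache code: it assumes a single-valued numeric
field whose postings are already available as a map from value to doc ids,
and the names (UninvertSketch, uninvert, sortDocsByValue) are hypothetical.
The point is that the whole term dictionary has to be walked to fill the
per-document array, which is why the rebuild cost grows with the index.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of uninverting a single-valued numeric field.
final class UninvertSketch {

    // Walk every (value -> doc ids) entry and fill an array indexed by doc id.
    static long[] uninvert(SortedMap<Long, int[]> postings, int maxDoc) {
        long[] valueByDoc = new long[maxDoc];
        for (Map.Entry<Long, int[]> e : postings.entrySet()) {
            for (int docId : e.getValue()) {
                valueByDoc[docId] = e.getKey();
            }
        }
        return valueByDoc;
    }

    // With the array in the heap, sorting docs by field value is direct.
    static Integer[] sortDocsByValue(long[] valueByDoc) {
        Integer[] docs = new Integer[valueByDoc.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparingLong(d -> valueByDoc[d]));
        return docs;
    }

    public static void main(String[] args) {
        SortedMap<Long, int[]> pricePostings = new TreeMap<>();
        pricePostings.put(10L, new int[] {2});
        pricePostings.put(20L, new int[] {0});
        pricePostings.put(30L, new int[] {1});
        long[] priceByDoc = uninvert(pricePostings, 3);
        System.out.println(Arrays.toString(sortDocsByValue(priceByDoc)));
    }
}
```
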
DocValues provide a solution. They are an uninverted, column-based store
that is built at index time as an on-disk data structure, so no
uninverting step is needed after a commit.
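
The idea of a column written at index time can be sketched like this. It
is only an analogy for DocValues, not Lucene's file format (which is
compressed and organised per segment): the sketch assumes one fixed-width
long per document, and ColumnStoreSketch, writeColumn and readValue are
hypothetical names. Because the value's position follows from the doc id,
a lookup is a seek and a read, with nothing to rebuild first.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of a fixed-width, on-disk column of values.
final class ColumnStoreSketch {

    // "Index time": write one 8-byte value per document, in doc id order.
    static void writeColumn(Path file, long[] valueByDoc) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (long value : valueByDoc) {
                out.writeLong(value);
            }
        }
    }

    // "Search time": the doc id gives the byte offset directly.
    static long readValue(Path file, int docId) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek((long) docId * Long.BYTES);
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("price", ".col");
        writeColumn(file, new long[] {1999L, 499L, 2500L});
        System.out.println(readValue(file, 1));
        Files.delete(file);
    }
}
```
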
The point of memory-mapped files is quite simple. Lucene developers
noticed that when they were loading indexes into memory, they were
loading them from disk into the OS disk cache, and then from there into
the Java heap. As a result, there were two copies of the data in memory,
which was overkill. The solution was to switch to memory-mapped files,
through which Java can access the files in the OS disk cache as if they
were simply in memory, thus halving memory requirements for Lucene.
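
In plain Java, memory mapping looks roughly like the sketch below. It uses
the standard FileChannel.map call rather than any Lucene class, and the
file layout (consecutive big-endian longs, one per document) and the names
MmapSketch and readLongAt are assumptions made for the example. The mapped
buffer reads straight from pages held in the OS disk cache, so the data is
not copied into a heap array first.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of reading a value through a memory-mapped file.
final class MmapSketch {

    // Map the file read-only and read the long stored in the doc's slot.
    static long readLongAt(Path file, int docId) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Absolute read; the buffer's default byte order is big-endian.
            return mapped.getLong(docId * Long.BYTES);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("values", ".bin");
        // Two big-endian longs: slot 0 holds 7, slot 1 holds 42.
        Files.write(file, ByteBuffer.allocate(2 * Long.BYTES).putLong(7L).putLong(42L).array());
        System.out.println(readLongAt(file, 1));
    }
}
```

A real reader would map the file once and keep the buffer around for many
lookups; mapping on every call, as here, is only done to keep the sketch
short.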