What does “column stride field” mean regarding “doc values” in Elasticsearch/Lucene?

I also posted this question on Stack Overflow:

I'm trying to understand what "column-stride" specifically means, and why it is so efficient for sorting and aggregations.
Is it just the fact that there is a mapping from document ID to its field values, or is there also something specific in the implementation of this data structure that makes it so efficient?

Thanks!

Historically, doc values were an O(1) data structure that could be addressed directly by doc ID.
Sparse values were an issue with that approach, so more recently doc values moved to an iterator API with skip-ahead support for retrieving the values of matching docs.
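
To make the difference concrete, here is a minimal Python sketch of the two access styles. It is not Lucene's actual API: the class names, `NO_MORE_DOCS` and the `advance`/`value` methods are hypothetical and only loosely modeled on the idea of an iterator with skip-ahead.

```python
from bisect import bisect_left

NO_MORE_DOCS = 2**31 - 1  # hypothetical sentinel meaning "iterator exhausted"


class DenseDocValues:
    """Random-access style: one slot per doc ID, even for docs with no value."""

    def __init__(self, values):
        self.values = values  # index == doc ID, None marks a missing value

    def get(self, doc_id):
        return self.values[doc_id]  # O(1) lookup by doc ID


class SparseDocValuesIterator:
    """Iterator style: only docs that have a value are stored."""

    def __init__(self, doc_ids, values):
        self.doc_ids = doc_ids  # sorted doc IDs that have a value
        self.values = values    # values[i] belongs to doc_ids[i]
        self.pos = -1

    def advance(self, target):
        """Skip ahead to the first stored doc ID >= target (forward only)."""
        start = max(self.pos, 0)
        self.pos = bisect_left(self.doc_ids, target, lo=start)
        if self.pos >= len(self.doc_ids):
            return NO_MORE_DOCS
        return self.doc_ids[self.pos]

    def value(self):
        return self.values[self.pos]


def sum_matching(iterator, matching_doc_ids):
    """Sum the values of matching docs, skipping docs that have no value."""
    total = 0
    for doc_id in matching_doc_ids:  # must be in increasing order
        if iterator.advance(doc_id) == doc_id:
            total += iterator.value()
    return total


# Five docs, but only docs 0 and 4 have a value for this field.
dense = DenseDocValues([7, None, None, None, 3])
sparse = SparseDocValuesIterator([0, 4], [7, 3])
total = sum_matching(sparse, [0, 2, 4])
```

In this toy example the dense form keeps five slots for two values, while the iterator form keeps only the two entries and skips over docs 1 to 3 when asked about doc 2.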

See the Elastic Blog post "Sparse versus dense document values with Apache Lucene" for more.

What does column stride mean?

It means data is organized in a columnar fashion. For instance, say you have 3 records: [ {A: 1, B: 42}, {A: 5, B: 12}, {A: 5, B: 6} ]. A column store would actually store something that looks like: { A: [1, 5, 5], B: [42, 12, 6] }. The benefit is that this is usually easier to compress, since the data within a single field is typically homogeneous, and it is more efficient for queries that target a limited number of fields, because it makes better use of CPU caches and of the filesystem cache. The downside is that if you want to retrieve all field/value pairs for a given record, this requires a number of random accesses that is linear in the number of fields, whereas a row store could do one seek and then read a single chunk that contains all field/value pairs.
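
The trade-off described above can be sketched in a few lines of Python. This is only an in-memory illustration using the three records from the example: the function names are hypothetical, it assumes every record has every field, and it says nothing about how Lucene actually encodes or compresses columns on disk.

```python
rows = [{"A": 1, "B": 42}, {"A": 5, "B": 12}, {"A": 5, "B": 6}]


def to_columns(records):
    """Pivot row-oriented records into one list per field (index == doc ID)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def sum_field_rows(records, field):
    """Row store: every record is visited, including fields we do not need."""
    return sum(record[field] for record in records)


def sum_field_columns(columns, field):
    """Column store: only the one contiguous column is read."""
    return sum(columns[field])


def fetch_record(columns, doc_id):
    """Column store downside: one lookup per field to rebuild a record."""
    return {field: values[doc_id] for field, values in columns.items()}


columns = to_columns(rows)
sum_a = sum_field_columns(columns, "A")
record_1 = fetch_record(columns, 1)
```

Summing `A` touches a single list in the columnar form, while rebuilding record 1 needs one lookup per field, which mirrors the aggregation benefit and the retrieval cost mentioned above.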
