Field cache efficiency

Christos_Trochalakis · April 9, 2012, 11:50am

Hello, I'd like to ask a question about field caching for fields that
are not present in all documents.

We maintain a flat index with documents belonging to different
categories. Our documents look like this:

{
  name: "Document name",
  category_id: 42,
  filters: {
    fg_10: [10, 20, 30, 40],
    fg_11: [100, 200, 300, 400]
  }
}

Each document has different filter keys depending on its category_id.
For example, documents with category_id = 42 have:

filters: {
  fg_10: [],
  fg_11: []
}

whereas documents with category_id = 25 have:

filters: {
  fg_20: [],
  fg_21: []
}

One of our two main queries is a term filtered query for a specific
category, with faceting on that category's fg_* keys (filters.fg_10,
filters.fg_11).

According to what I have read, Elasticsearch keeps a field cache for
each faceted field, so in our case there should be a field cache for
each of fg_10, fg_11, fg_20 and fg_21.

I'd like to know how space-efficient the field cache is for our setup.
For example, is the field cache for fg_10 "smart" enough to include
only the documents with category_id = 42, or does it cover all
documents in the index and take more space?

kimchy · April 11, 2012, 9:33am

There will be an overhead per field: an array of integers, sized to the
number of documents in each segment, for each field. How many "fg's" do
you have?

Christos_Trochalakis · April 11, 2012, 12:18pm

Thanks for the reply, Shay.

We have 1,400,000 documents spanning ~900 different categories, and
1,200 different filter groups (so ~1.3 fgs/category on average).
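
To put these numbers next to the per-field overhead described above, here is a rough back-of-the-envelope sketch in Python. It assumes one 4-byte integer per document per field (the entry size is my assumption, not something stated in the thread) and that every fg_* field ends up loaded for faceting. It ignores the storage for the values themselves and any extra cost of multi-valued fields, so treat the result as a lower bound rather than a measurement.

```python
# Rough lower-bound estimate of field cache overhead.
# Assumption: each loaded field holds one 4-byte integer slot per document,
# whether or not that document has a value for the field.

BYTES_PER_ENTRY = 4  # assumed size of one integer slot


def field_cache_overhead_bytes(num_docs: int, num_fields: int,
                               bytes_per_entry: int = BYTES_PER_ENTRY) -> int:
    """Return num_docs * num_fields * bytes_per_entry.

    The per-segment arrays add up to the total document count, so the
    segment layout does not change this total (deleted docs are ignored).
    """
    return num_docs * num_fields * bytes_per_entry


num_docs = 1_400_000        # documents in the index (from the thread)
num_filter_groups = 1_200   # distinct fg_* fields (from the thread)

per_field = field_cache_overhead_bytes(num_docs, 1)
all_fields = field_cache_overhead_bytes(num_docs, num_filter_groups)

print(f"per field:  {per_field / 1024**2:.1f} MiB")
print(f"all fields: {all_fields / 1024**3:.2f} GiB")
```

Under those assumptions each field costs about 5.3 MiB and all 1,200 fields together come to roughly 6.3 GiB, even though a given fg_* field only has values for the documents of one category. Only fields that are actually faceted on would be loaded, so real usage depends on which categories get queried.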

kimchy · April 13, 2012, 12:07pm

I see. It will use memory and come with more overhead. I suggest you run
your tests and check how much memory the field cache is using through the
node stats.
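
One possible way to read that figure programmatically is sketched below. It assumes the response layout of recent Elasticsearch releases, where the node stats endpoint `/_nodes/stats/indices/fielddata` reports `indices.fielddata.memory_size_in_bytes` per node; the version discussed in this thread exposed the field cache under different names, so the path and keys here are assumptions to adjust for your release. The helper names and the sample response are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_node_stats(base_url: str = "http://localhost:9200") -> dict:
    """Fetch fielddata node stats (endpoint path assumes a recent release)."""
    with urlopen(f"{base_url}/_nodes/stats/indices/fielddata") as resp:
        return json.load(resp)


def fielddata_bytes_per_node(stats: dict) -> dict:
    """Map node name -> fielddata memory in bytes; missing entries count as 0."""
    usage = {}
    for node_id, node in stats.get("nodes", {}).items():
        fielddata = node.get("indices", {}).get("fielddata", {})
        usage[node.get("name", node_id)] = fielddata.get("memory_size_in_bytes", 0)
    return usage


# Hypothetical, trimmed-down response used to exercise the parser offline.
sample_response = {
    "nodes": {
        "abc": {
            "name": "node-1",
            "indices": {"fielddata": {"memory_size_in_bytes": 123456789}},
        },
        "def": {"name": "node-2", "indices": {}},
    }
}

for name, size in fielddata_bytes_per_node(sample_response).items():
    print(f"{name}: {size / 1024**2:.1f} MiB")
```

Against a live cluster you would pass the result of `fetch_node_stats()` to `fielddata_bytes_per_node` before and after running the faceted queries, and compare the two readings.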

