Field cache efficiency

Hello, I'd like to ask a question about field caching on fields not
present on all documents.

We maintain a flat index with documents belonging to different
categories. Our documents look like this:

{
  name: "Document name",
  category_id: 42,
  filters: {
    fg_10: [10, 20, 30, 40],
    fg_11: [100, 200, 300, 400]
  }
}

Each document, depending on its category_id, has different filter
keys. For example, documents with category_id = 42 have:

filters: {
  fg_10: [],
  fg_11: []
}

whereas documents with category_id = 25 have:

filters: {
  fg_20: [],
  fg_21: []
}

One of our two main queries is a term-filtered query for a specific
category, with faceting on that category's fg_* keys (for example
filters.fg_10 and filters.fg_11).

According to what I have read, Elasticsearch keeps a field cache for
each faceted field, so in our case there should be a field cache for
each of fg_10, fg_11, fg_20, and fg_21.

I'd like to know how space-efficient the field cache is for our setup.
For example, is the field cache for fg_10 "smart" enough to include
only the documents with category_id = 42, or does it cover all
documents in the index and therefore take more space?

There will be an overhead per field: an array of integers, sized to the
number of documents in each segment, is kept for each field. How many
"fg" fields do you have?

In reply to: Christos Trochalakis (christos@skroutz.gr), Mon, Apr 9, 2012 at 2:50 PM

Thanks for the reply, Shay.

We have 1,400,000 documents spanning ~900 different categories, and
1,200 different filter groups (so ~1.3 fgs per category on average).

In reply to: Shay Banon (kimchy@gmail.com), Wed, 11 Apr 2012 12:33:31 +0300

I see. It will use memory, and it will come with more overhead. I suggest
you run your tests and check how much memory the field cache is using
through the node stats.
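
Before measuring, a rough upper bound can be worked out from the figures
given in this thread. The sketch below is only a back-of-the-envelope
estimate, not a description of the actual implementation: it assumes one
4-byte integer per document per faceted field, that all 1,200 fg fields
end up loaded into the field cache, and that per-segment arrays add up to
the total document count. It ignores the storage for the field values
themselves. The function name and the bytes_per_entry parameter are
hypothetical.

```python
def field_cache_overhead_bytes(num_docs, num_fields, bytes_per_entry=4):
    """Estimate the per-document array overhead of the field cache.

    Assumption: each loaded field keeps one integer entry per document,
    whether or not the document actually has a value for that field.
    """
    return num_docs * num_fields * bytes_per_entry


num_docs = 1_400_000   # documents in the index (figure from this thread)
num_fields = 1_200     # distinct fg_* filter groups (figure from this thread)

per_field = field_cache_overhead_bytes(num_docs, 1)
total = field_cache_overhead_bytes(num_docs, num_fields)

print(f"per field: {per_field / 1e6:.1f} MB")
print(f"all {num_fields} fields: {total / 1e9:.2f} GB")
```

Under these assumptions the array overhead is about 5.6 MB per field and
about 6.72 GB if every fg field is faceted on at least once. The node
stats remain the way to confirm the real number.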

In reply to: Christos Trochalakis (christos@skroutz.gr), Wed, Apr 11, 2012 at 3:18 PM