Advice on memory consumption

Hi, List

We are trying to use Elasticsearch in a data analytics use case where
we have around 7 million transaction records (growing by around 1
million per month), and we are pulling aggregation results out of
them.

In our case, we use facets heavily together with filters/queries. Our
documents are a complex type with arrays and nested objects inside
those arrays, which lets us store certain one-to-many relationships.
We have done a few performance tests and the results turned out to be
very good.

{
  "id": 123,
  "transaction_time": "2011-01-01T01:01:01",
  "attributes": [
    { "attribute_id": 101, "attribute_value": "something1" },
    { "attribute_id": 102, "attribute_value": "something2" },
    { "attribute_id": 103, "attribute_value": "something3" },
    { "attribute_id": 104, "attribute_value": "something4" }
  ],
  "hierarchies": [
    { "role": "agent", "name": "Bill" },
    { "role": "manager", "name": "Shelly" }
  ]
}
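
For reference, the mapping we have been experimenting with looks
roughly like this (simplified and from memory; I am assuming the
nested type is the right way to keep each role/name pair together):

{
  "transaction": {
    "properties": {
      "id": { "type": "long" },
      "transaction_time": { "type": "date" },
      "attributes": {
        "type": "nested",
        "properties": {
          "attribute_id": { "type": "integer" },
          "attribute_value": { "type": "string", "index": "not_analyzed" }
        }
      },
      "hierarchies": {
        "type": "nested",
        "properties": {
          "role": { "type": "string", "index": "not_analyzed" },
          "name": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}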

For documents like this, if we run a lot of facets on nested fields
such as hierarchies[0].role, would that require more memory than
faceting on simple fields like the time or the id?

From the facet documentation I am almost certain that it would, but
are there any recommendations, or has someone who has already done
this got some advice for us?

For example, some discussions say that in order to retrieve the
values, ES has to load the nested object values of each document into
memory. If the data set has 10 million records, how much memory are we
talking about? 2G, 4G, 10G? 100G?

If it is around 10G we can handle that, but if it grows to something
like 100G, it becomes very hard for us to justify doing this kind of
calculation with ES. So we would like someone to share their
experience so that we don't run into trouble later.

Can you give an example of how you use a facet on nested fields? Are you using scripting? Do you use nested mappings?

On Thursday, February 9, 2012 at 1:26 AM, Bill Lee wrote:


Bill,

Since you asked for other people's experiences, and our project
involves a lot of aggregation via faceting, here's our data point (for
what it's worth):

I seem to get ~125MB-250MB of usage per million nested string fields,
per replica/original copy (mean value lengths between 32 and 64
bytes). For example, I've just run a facet across 2M nested string
fields on a system with 3 nodes (45 shards) and 1 replica, and it's
using 750MB in total across the nodes.
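
For concreteness, the kind of facet I mean, phrased in terms of your
hierarchies example, is something like the following (assuming a
nested mapping and, if I remember the syntax right, the facet-level
nested option rather than scripting):

{
  "size": 0,
  "query": { "match_all": {} },
  "facets": {
    "roles": {
      "terms": { "field": "hierarchies.role", "size": 20 },
      "nested": "hierarchies"
    }
  }
}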

This is based on checking the field data cache as reported by bigdesk.
There are of course other memory requirements, e.g. the index itself.

This seems to scale approximately linearly (disclaimer: this is based
on very few data points, ~5M being the largest!).
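
Very roughly applying that rate to your numbers (back-of-envelope
only): ~7-10 million documents with two hierarchy entries each is on
the order of 15-20 million nested string values, so at ~125-250MB per
million that would suggest somewhere around 2-5GB of field data per
copy, plus the same again per replica - i.e. much closer to your 10G
scenario than to 100G. Obviously measure it on your own data before
relying on that.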

Non-nested multi-valued fields behave differently and are to be
avoided for large datasets unless you have tight bounds on the max
array size. (See the discussion in
https://groups.google.com/group/elasticsearch/browse_thread/thread/31d87c84dd387367/3324ed6bda200a9a#3324ed6bda200a9a)