Nested Document performance

Hi there,

I'm testing out a design by loading an index (contents - 5 shards, 1
replica). We want to store user interactions on a document (e.g. a user
could 'buy'/'sell'/'click' etc) so we want to have generic document
information stored in the content index (documents type) and a
document_interactions type also.

We created the index with ~160k parent documents and ~830k nested child
docs (i.e. an average of just over 5 child documents per parent). The
parent/child docs have only a few fields (5 each), all of basic types
(string, time, boolean).
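
For reference, a nested layout along these lines could be mapped roughly as
below. This is only a sketch: the field names (title, published,
interactions, action, user, at) are made up for illustration, and it assumes
the interactions are stored as a "nested" field inside each document rather
than as a separate parent/child type.

```python
import json

# Hypothetical mapping for the "documents" type; field names are illustrative only.
# Assumes interactions are embedded as a nested field, not a separate child type.
documents_mapping = {
    "documents": {
        "properties": {
            "title": {"type": "string"},
            "published": {"type": "date"},
            "interactions": {
                "type": "nested",
                "properties": {
                    "action": {"type": "string", "index": "not_analyzed"},
                    "user": {"type": "string", "index": "not_analyzed"},
                    "at": {"type": "date"},
                },
            },
        }
    }
}

# JSON body that would be sent when creating the mapping.
mapping_body = json.dumps(documents_mapping, indent=2)
```

With a nested mapping each child is indexed as its own hidden Lucene
document, so ~160k parents plus ~830k children comes to roughly 990k Lucene
documents in total.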

We then made a query which runs a boolQuery across both the children and
the parent.
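
A query of that shape might look something like the sketch below, written
as the JSON request body rather than Java API calls. It assumes the children
are a nested field; the path and field names (title, interactions, action)
and the helper build_search_body are hypothetical, not the actual request.

```python
import json

def build_search_body(parent_field, parent_value, nested_path, child_field, child_value):
    """Bool query: one clause on the parent, one nested clause on the children."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Clause evaluated against the parent document's own field.
                    {"term": {parent_field: parent_value}},
                    # Clause evaluated against the nested child documents.
                    {
                        "nested": {
                            "path": nested_path,
                            "query": {
                                "term": {nested_path + "." + child_field: child_value}
                            },
                        }
                    },
                ]
            }
        }
    }

# Illustrative values only.
body = build_search_body("title", "report", "interactions", "action", "buy")
print(json.dumps(body, indent=2))
```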

The first run on a single machine (Intel i7 3GHz quad core, 8GB RAM) took
16 seconds (no warming), with <100ms for the 2nd run and ~10ms for the 3rd.
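
To make the cold vs. warm numbers repeatable, the runs can be timed with a
small harness like the one below. It is a minimal sketch: fake_query is a
stand-in for the real search call and would need to be replaced with the
actual request.

```python
import time

def time_runs(run_query, runs=3):
    """Call run_query() `runs` times and return each wall-clock latency in ms."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms

# Stand-in for the real search call (hypothetical); replace with the actual request.
def fake_query():
    time.sleep(0.01)

latencies = time_runs(fake_query, runs=3)
for i, ms in enumerate(latencies, start=1):
    print("run %d: %.1f ms" % (i, ms))
```

Keeping the first run separate from the later ones shows how much of the
latency is one-off warm-up and how much is steady state.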

I then wanted to see what would happen when I scaled out (I know I have not
hit the first machine's limit yet but wanted to check the effect) and added
2 more machines (equal spec) to the cluster. My response times were now a
constant 5-8 seconds.

I was wondering:
a) Is this due to the fact that there is now a post-aggregation step
performed by the master?
b) When we tested before with simple queries (i.e. no nesting), we saw that
scaling out increased performance, but here it did not. Any ideas whether
the problem is localised to nested child queries?

We are running ES 0.19.11 and testing using the Java API.

Many thanks in advance,

Derry

--

Hi Derry,

I'm a bit confused about what kind of queries you are referring to (nested
query, top_children or has_child queries, or a combination).
Can you perhaps share your search request?
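
The distinction matters because the three query types are written
differently and apply to different data layouts. A rough sketch of the three
request shapes follows; the helper function names are made up for
illustration, and the type and field values are placeholders.

```python
def nested_query(path, inner):
    # For children stored inside the parent in a field mapped as type "nested".
    return {"nested": {"path": path, "query": inner}}

def has_child_query(child_type, inner):
    # For children stored as a separate type linked to the parent via _parent.
    return {"has_child": {"type": child_type, "query": inner}}

def top_children_query(child_type, inner):
    # Also for parent/child types; runs the child query and maps hits to parents.
    return {"top_children": {"type": child_type, "query": inner}}

# Placeholder inner query and names, for illustration only.
inner = {"term": {"action": "buy"}}
examples = [
    nested_query("interactions", {"term": {"interactions.action": "buy"}}),
    has_child_query("document_interactions", inner),
    top_children_query("document_interactions", inner),
]
```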

Only the nodes that hold the shards on which the search request is
executed are involved in the query. The master node doesn't need to be
involved in this. I would also expect the query to get faster when adding
a node, so what you're experiencing is weird.
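
To illustrate the point about the master: the node that receives the search
request sends it to one copy of each shard and then merges the per-shard
results itself. The sketch below is a simplified model of that merge step
only, not Elasticsearch's actual code; merge_shard_hits and the sample hits
are made up.

```python
import heapq

def merge_shard_hits(per_shard_hits, size=10):
    """Merge per-shard (score, doc_id) lists into one global top-`size` list."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    # Highest score first, as the receiving node would return them.
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])

# Made-up results from three shards.
shard_results = [
    [(2.1, "doc-7"), (0.4, "doc-2")],
    [(1.7, "doc-9")],
    [(3.0, "doc-1"), (0.9, "doc-5")],
]
top = merge_shard_hits(shard_results, size=3)
print(top)
```

The merge only touches the top hits from each shard, so it is normally cheap
compared with executing the query on the shards themselves.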

Martijn

On 26 November 2012 10:00, Derry O' Sullivan derryos@gmail.com wrote:

--
Kind regards,

Martijn van Groningen

--