Slow ES Queries

Shashi · April 17, 2012, 6:00pm

Hello all,

Our ES is getting slower and slower day by day. I have read about
how other people use ES, and our usage is very small compared to what
they have posted.
I am sure we are doing something wrong with our design. Could anyone
please suggest some improvements?

We have the following data:

1. Log data (5 million documents per day)
2. Our primary DB data:
   a) Type A - 100,000 documents (growth 5% per month)
   b) Type B - 500,000 documents (growth 5% per month)
   c) Type C - 30 million documents (growth 15% per month)
   I would prefer to have these under one index.
3. Metrics events (around 100k documents per day)
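To put these volumes side by side, here is a rough Python projection that uses only the rates stated above. It is a sketch under simplifying assumptions: 30-day months, monthly compounding of the growth percentages, logs and metrics counted from zero, and no old data ever deleted.

```python
def project_counts(months: int) -> dict:
    """Approximate document counts after `months` months from today.

    Assumptions (hypothetical simplifications): 30-day months, monthly
    compounding of the stated growth rates, logs and metrics counted
    from zero, and no old data ever deleted.
    """
    days = 30 * months
    return {
        "logs": 5_000_000 * days,                      # 5 million per day
        "metrics": 100_000 * days,                     # ~100k per day
        "type_a": round(100_000 * 1.05 ** months),     # +5% per month
        "type_b": round(500_000 * 1.05 ** months),     # +5% per month
        "type_c": round(30_000_000 * 1.15 ** months),  # +15% per month
    }


def days_of_logs(total_logs: int, per_day: int = 5_000_000) -> float:
    """How many days of ingest a given number of log documents represents."""
    return total_logs / per_day


if __name__ == "__main__":
    for m in (1, 6, 12):
        print(m, project_counts(m))
    print(days_of_logs(200_000_000))  # 40.0
```

At the stated rate, the roughly 200 million log documents mentioned below amount to about 40 days of ingest, already several times the combined size of the three primary-data types.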

On a big EC2 instance (2xlarge), we have created ES with 2 shards (both
on the same machine).

We wanted to use parent-child relationships, so we created one index
and made the logs, metrics, type B and type C documents children of type
A (accounts). Things were fine until we started loading log data, but
now that we have loaded around 200 million log documents, everything is
slow. Parent-child queries, even for types with little data, take a very
long time, which makes them unusable for us.
The performance of individual queries has also decreased a lot (though
they are still usable).
Now we are stuck. We are not sure whether we need to change the
design so that parent-child queries run more efficiently, or whether we
should move our log type (or even more types) into a different index
so that things will be fast.
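For readers unfamiliar with this kind of setup, the sketch below shows roughly what a single-index parent-child layout could look like, written as Python dicts that serialize to JSON request bodies in the 0.19-era syntax (a `_parent` mapping per child type and a `has_child` query). The type names `account`, `log`, `metric`, `type_b`, `type_c` and the `level` field are hypothetical; newer Elasticsearch versions replaced `_parent` with a `join` field, so check the documentation for your version.

```python
import json

PARENT_TYPE = "account"                              # type A in the post
CHILD_TYPES = ["log", "metric", "type_b", "type_c"]  # hypothetical names


def parent_child_mappings() -> dict:
    """Mappings declaring every child type with a _parent of type account."""
    return {child: {"_parent": {"type": PARENT_TYPE}} for child in CHILD_TYPES}


def accounts_with_error_logs() -> dict:
    """has_child query: accounts that have at least one matching log child.

    The `level` field and the value "error" are hypothetical.
    """
    return {
        "query": {
            "has_child": {
                "type": "log",
                "query": {"term": {"level": "error"}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps({"mappings": parent_child_mappings()}, indent=2))
    print(json.dumps(accounts_with_error_logs(), indent=2))
```

Child documents are routed by their parent's ID, so every child of an account lives on the same shard as that account; in this layout the log documents therefore share shards with the much smaller primary-data types.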

Any suggestions are greatly appreciated.
Thank you very much for taking the time to read my post.
Best regards,
Shashi

Radu_Gheorghe1 · April 18, 2012, 7:31am

Hi Shashi,

I'm also using ES for logs, but I have no parent-child relationships.
So I can only say what helped me so far:

- allocating about half of the RAM to ES (with min heap = max heap)
- disabling _all
- compressing _source
- using one index per day (you can use another time unit that fits
  your needs better), and removing old indices when you want to discard
  old data. You can also optimize indices that you're finished with
  (e.g. the one from yesterday)
- increasing the refresh interval
- because we're always sorting logs by date (so we don't need relevance
  scoring), we're using filters instead of queries.
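Several of these tips correspond to JVM flags, index settings and request bodies. The Python sketch below builds them as plain dicts using 0.19-era option names (`_all.enabled`, `_source.compress`, `index.refresh_interval`, and a `filtered` query wrapping a `range` filter). The `log` type, the `timestamp` field, the 30s refresh interval and the 30 GB machine size are assumptions for illustration; `_source.compress`, `_all` and the `filtered` query were deprecated or removed in later Elasticsearch versions, so treat this as illustrative only.

```python
import json


def heap_gb(total_ram_gb: float) -> int:
    """About half the machine's RAM, used for both -Xms and -Xmx (min = max)."""
    return int(total_ram_gb // 2)


def log_index_body(refresh_interval: str = "30s") -> dict:
    """Create-index body: longer refresh interval, no _all, compressed _source."""
    return {
        "settings": {"index": {"refresh_interval": refresh_interval}},
        "mappings": {
            "log": {                                   # hypothetical type name
                "_all": {"enabled": False},
                "_source": {"compress": True},
                "properties": {"timestamp": {"type": "date"}},  # hypothetical field
            }
        },
    }


def logs_between(start: str, end: str, size: int = 100) -> dict:
    """Filter on a time range (no scoring) and sort by date instead of relevance."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {"timestamp": {"gte": start, "lt": end}}},
            }
        },
        "sort": [{"timestamp": {"order": "desc"}}],
        "size": size,
    }


if __name__ == "__main__":
    gb = heap_gb(30)  # hypothetical 30 GB machine
    print(f"-Xms{gb}g -Xmx{gb}g")
    print(json.dumps(log_index_body(), indent=2))
    print(json.dumps(logs_between("2012-04-17", "2012-04-18"), indent=2))
```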

I'm not sure what optimizations can be done with parent-child
documents, and what the overhead of such a structure actually is. So
maybe someone else can shed some light on this topic...

Best regards,
Radu

Shashi · April 18, 2012, 5:58pm

Thank you very much for your comments, Radu; you brought up some
interesting points that we could try.

I'm still hoping that someone will reply regarding parent-child usage.

-Shashi

kimchy · April 21, 2012, 11:14am

I am not sure I understood your data distribution. Are the docs from all
of 1, 2 and 3 inserted into the same index? You might just need more
machines to handle the load you are driving into ES.

Shashi · April 24, 2012, 12:19am

Thanks for your reply, Shay.
Yes, we have inserted all the docs into one index, hoping to use
parent-child queries. If we move the logs to a separate index, will we
be able to use parent-child queries efficiently for the rest of the docs?
We are thinking of moving the logs to a different machine so that we
might get better performance for the rest of the queries.

-Shashi

kimchy · April 25, 2012, 4:01pm

It sounds like it might make sense to put your different data types in
different indices, because they behave very differently. The logs, for
example, would also benefit greatly from a rolling index solution.
See the "data flow" thread for a bit more on this:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/data$20flow/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
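A rolling index scheme of the kind suggested here (and in Radu's index-per-day tip) might be sketched as follows. The `logs-YYYY.MM.DD` naming pattern, the `primary` and `metrics` index names, and the retention window are hypothetical choices. Parent-child relations only work within a single index, so in this layout only the primary data keeps its `_parent` links, and logs or metrics would have to reference an account through an ordinary field. The sketch only computes names; actually creating or dropping indices would go through the create-index and delete-index APIs, which it does not call.

```python
from datetime import date, datetime, timedelta

# Hypothetical layout: primary data stays together in one index (so its
# parent-child relations still work), metrics get their own index, and
# logs roll over daily.
STATIC_INDICES = {
    "account": "primary",
    "type_b": "primary",
    "type_c": "primary",
    "metric": "metrics",
}
LOG_PREFIX = "logs-"


def log_index(day: date) -> str:
    """Daily log index name, e.g. logs-2012.04.25."""
    return f"{LOG_PREFIX}{day:%Y.%m.%d}"


def index_for(doc_type: str, day: date) -> str:
    """Pick the target index for a document from its type (and day, for logs)."""
    if doc_type == "log":
        return log_index(day)
    return STATIC_INDICES[doc_type]


def expired_log_indices(existing: list[str], today: date, keep_days: int) -> list[str]:
    """Daily log indices older than the last `keep_days` days (today included)."""
    cutoff = today - timedelta(days=keep_days - 1)
    expired = []
    for name in existing:
        if not name.startswith(LOG_PREFIX):
            continue  # leave non-log indices alone
        day = datetime.strptime(name[len(LOG_PREFIX):], "%Y.%m.%d").date()
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    today = date(2012, 4, 25)
    print(index_for("log", today))     # logs-2012.04.25
    print(index_for("type_c", today))  # primary
    names = [log_index(today - timedelta(days=i)) for i in range(5)] + ["primary"]
    print(expired_log_indices(names, today, keep_days=3))
    # ['logs-2012.04.21', 'logs-2012.04.22']
```

Discarding a whole expired day then means deleting one index, which is generally far cheaper than deleting millions of individual log documents from a shared index.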