Slow ES Queries

Hello all,

Our ES is getting slower day by day. I have read about how other
people use ES, and our usage is much lighter than what they have
posted.
I am sure we are doing something wrong in our design. Could anyone
please suggest some improvements?

We have the following data:

  1. Log data (5 million documents per day)
  2. Our primary db data.
    a) Type A - 100,000 documents (growth 5% per month)
    b) Type B - 500,000 documents (growth 5% per month)
    c) Type C - 30 million documents (growth 15% per month)
    I would prefer to have these under one index.
  3. Metrics events (around 100k documents per day)

On a big EC2 instance (2x large), we have set up ES with 2 shards (both
on the same machine).
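
For a rough sense of scale, the volumes listed above can be projected forward. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and treating the monthly growth figures as compounding over a 12-month horizon is an assumption, not something stated in the post.

```python
# Rough sizing from the figures in the post.
# Assumptions: constant daily rates, and monthly growth that compounds.

LOGS_PER_DAY = 5_000_000
METRICS_PER_DAY = 100_000


def accumulated(per_day: int, days: int) -> int:
    """Documents accumulated after `days` at a constant daily rate."""
    return per_day * days


def compounded(start: int, monthly_growth: float, months: int) -> int:
    """Document count after `months` of compounding monthly growth."""
    return round(start * (1 + monthly_growth) ** months)


if __name__ == "__main__":
    print("logs after 30 days:   ", accumulated(LOGS_PER_DAY, 30))
    print("metrics after 30 days:", accumulated(METRICS_PER_DAY, 30))
    print("type A after 12 months:", compounded(100_000, 0.05, 12))
    print("type B after 12 months:", compounded(500_000, 0.05, 12))
    print("type C after 12 months:", compounded(30_000_000, 0.15, 12))
```

Under these assumptions the log data outgrows everything else within weeks, which is worth keeping in mind when deciding what shares an index.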

We wanted to use parent-child relationships, so we created one index
and made the log, metrics, type B, and type C documents children of type
A (accounts). Things were fine until we started loading log data. Now
that we have loaded around 200 million logs, everything is slow.
Parent-child queries take a very long time even for the types with
little data, which makes them unusable for us.
The performance of individual queries has also dropped a lot (still
usable).
Now we are stuck. We are not sure whether we should change the design
so that parent-child queries become more efficient, or move the log
type (or even more types) into a different index to speed things up.
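
To make the setup concrete, here is a minimal sketch of what such a layout might look like, written as plain Python dictionaries that would be sent to ES as JSON. The type names (`account`, `log`, `metric`, `type_b`, `type_c`) and the `status` field are hypothetical; the post does not give the real names. The `_parent` mapping field and the `has_child` query are the standard parent-child building blocks.

```python
import json

# Hypothetical names: the post only says that logs, metrics, type B and
# type C documents are children of type A (accounts).
PARENT_TYPE = "account"
CHILD_TYPES = ["log", "metric", "type_b", "type_c"]


def child_mappings(parent, children):
    """Build one mapping per child type, each pointing at the parent via _parent."""
    return {child: {"_parent": {"type": parent}} for child in children}


def has_child_query(child_type, field, value):
    """Search body matching parents that have at least one matching child."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"term": {field: value}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(child_mappings(PARENT_TYPE, CHILD_TYPES), indent=2))
    # Accounts with at least one type B child whose (hypothetical) status is "active".
    print(json.dumps(has_child_query("type_b", "status", "active"), indent=2))
```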

Any suggestions are greatly appreciated.
Thank you very much for your time reading my post.
Best regards,
Shashi

Hi Shashi,

I'm also using ES for logs, but I have no parent-child relationships.
So I can only say what helped me so far:

  • allocate about half of the machine's RAM to ES (min heap = max heap)
  • disable _all
  • compress _source
  • use one index per day (or another time unit that fits your needs
    better), and remove old indices when you want to discard old data.
    You can also optimize indices that you're finished with (e.g. the
    one from yesterday)
  • increase the refresh interval
  • use filters instead of queries; we always sort logs by date, so we
    don't need analysis
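
As an illustration of the last few points, here is a minimal sketch of daily index naming, picking indices that are old enough to drop, and a filter-only search sorted by date. The `logs` prefix, the `@timestamp` field name, and the helper names are assumptions made for the example. The `filtered` query form matches ES releases of this period; later versions express the same thing with a `bool` query and a `filter` clause.

```python
from datetime import date, datetime, timedelta


def daily_index(day, prefix="logs"):
    """Name of the index that holds documents for a given day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(existing, today, keep_days, prefix="logs"):
    """Return the daily indices in `existing` older than `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "-"):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matches but the suffix is not a date
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


def recent_logs_body(since_iso, size=50):
    """Search body that only filters by date and sorts by date."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {"@timestamp": {"gte": since_iso}}},
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }
```

Dropping a whole index is much cheaper than deleting individual documents from a shared one, which is the main appeal of the per-day layout.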

I'm not sure what optimizations can be done with parent-child
documents, or what the overhead of such a structure actually is. So
maybe someone else can shed some light on this topic...

Best regards,
Radu

Thank you very much for your comments, Radu. You brought up some
interesting points that we could try.

Still hoping that someone will reply regarding parent-child usage.

-Shashi

I am not sure I understood your data distribution. Are the docs from
items 1, 2 and 3 all inserted into the same index? You might just need
more machines to handle the load you are driving into ES.

Thanks for your reply Shay,
Yes, we have inserted all the docs into one index, hoping to use
parent-child queries. If we move the logs to a separate index, will we
be able to use parent-child queries efficiently for the rest of the docs?
We are thinking of moving the logs to a different machine so that we get
better performance for the rest of the queries.

-Shashi

It sounds like it might make sense to keep your different data types in
different indices, because they behave very differently. The logs, for
example, would also benefit greatly from a rolling index solution.
See the "data flow" thread for a bit more on this:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/data$20flow/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
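
One way to picture this split is a small routing helper that decides which index each document goes to. This is only a sketch: the index names (`logs-…`, `metrics`, `primary`) and the type names are hypothetical, and whether metrics should also roll is left open here. Note that parent and child documents have to live in the same index, so in this layout only the types kept in the primary index could remain children of accounts.

```python
from datetime import date

# Hypothetical index names; the thread only suggests separate indices per
# kind of data and a rolling index for the logs.
ROLLING_PREFIXES = {"log": "logs"}      # one index per day
FIXED_INDICES = {"metric": "metrics"}   # separate, non-rolling index
PRIMARY_INDEX = "primary"               # accounts plus type B and type C


def target_index(doc_type, day):
    """Pick the index a document of `doc_type` created on `day` is written to."""
    if doc_type in ROLLING_PREFIXES:
        return f"{ROLLING_PREFIXES[doc_type]}-{day:%Y.%m.%d}"
    return FIXED_INDICES.get(doc_type, PRIMARY_INDEX)


if __name__ == "__main__":
    today = date(2012, 4, 24)
    for doc_type in ("log", "metric", "account", "type_b", "type_c"):
        print(doc_type, "->", target_index(doc_type, today))
```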
