Our ES is getting slower day by day. I have read about how other people
use ES, and our usage is small compared to what they have posted.
I am sure we are doing something wrong with our design. Could anyone
please suggest some improvements?
We have the following data:
1. Log data (5 million documents per day)
2. Our primary DB data:
a) Type A - 100,000 documents (growth 5% per month)
b) Type B - 500,000 documents (growth 5% per month)
c) Type C - 30 million documents (growth 15% per month)
I would prefer to have these under one index.
3. Metrics events (around 100k documents per day)
On a big EC2 instance (2xlarge), we have set up ES with 2 shards (both
on the same machine).
We wanted to use parent-child relationships, so we created one index
and made the log, metrics, Type B and Type C documents children of
Type A (accounts). Things were fine until we started loading log data.
Now that we have loaded around 200 million of those logs, everything is
slow. Parent-child queries, even for types with little data, take so
long that they are unusable for us.
The performance of individual queries has also dropped a lot (still
usable).
Now we are stuck. We are not sure whether we need to change the design
so that parent-child queries become more efficient, or whether we
should move the log type (or even more types) into a different index
so that things will be fast.
Any suggestions are greatly appreciated.
Thank you very much for your time reading my post.
Best regards,
Shashi
I'm also using ES for logs, but I have no parent-child relationships,
so I can only say what has helped me so far:
- allocating ~half of the RAM to ES (min = max)
- disabling _all
- compressing _source
- using one index per day (you can use another time unit that fits
  your needs better) and removing old indices when you want to discard
  old data. You can also optimize indices that you're finished with
  (e.g. the one from yesterday)
- increasing the refresh interval
- using filters instead of queries, because we always sort logs by
  date (so we don't need analysis)
I'm not sure what optimizations can be done with parent-child
documents, or what the overhead of such a structure actually is. So
maybe someone else can shed some light on this topic...
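To make the one-index-per-day and refresh-interval suggestions more concrete, here is a minimal Python sketch. It only builds index names and a settings body; it does not talk to a cluster. The "logs" prefix, the date format, the 7-day retention and the "30s" value are all assumptions picked for illustration, not values from this thread.

```python
from datetime import date, timedelta
import json


def daily_index_name(day, prefix="logs"):
    """Return a per-day index name such as 'logs-2012.03.15'.

    The prefix and the date format are hypothetical choices.
    """
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_to_delete(existing, today, keep_days, prefix="logs"):
    """Return the daily indices in `existing` that fall outside the retention window."""
    keep = {
        daily_index_name(today - timedelta(days=n), prefix)
        for n in range(keep_days)
    }
    return sorted(
        name
        for name in existing
        if name.startswith(prefix + "-") and name not in keep
    )


# Body for an index settings update that slows down refreshes.
# "30s" is an arbitrary example value, not a recommendation.
refresh_settings = json.dumps({"index": {"refresh_interval": "30s"}})

if __name__ == "__main__":
    today = date(2012, 3, 15)  # fixed date so the example is reproducible
    existing = [daily_index_name(today - timedelta(days=n)) for n in range(10)]
    print(daily_index_name(today))
    print(indices_to_delete(existing, today, keep_days=7))
    print(refresh_settings)
```

Dropping a whole daily index is generally much cheaper than deleting individual old documents from one large index, which is the main reason for this layout.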
I am not sure I understood your data distribution. Are all of the 1, 2
and 3 docs inserted into the same index? You might just need more
machines to handle the load you are driving into ES.
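For a rough sense of the load in question, the numbers from the original post can be put into a short back-of-the-envelope calculation. This is only a sketch: it assumes compound monthly growth at a constant rate and an even spread of documents across the 2 shards, neither of which is stated in the thread.

```python
# Figures taken from the original post.
LOGS_PER_DAY = 5_000_000
LOGS_LOADED = 200_000_000
TYPE_C_DOCS = 30_000_000
TYPE_C_MONTHLY_GROWTH = 0.15
SHARDS = 2


def grow(count, monthly_rate, months):
    """Project a document count forward, assuming compound monthly growth."""
    return count * (1 + monthly_rate) ** months


# How many days of logs the 200 million loaded documents represent.
days_of_logs = LOGS_LOADED / LOGS_PER_DAY

# Log documents per shard, assuming an even spread (an assumption).
logs_per_shard = LOGS_LOADED // SHARDS

if __name__ == "__main__":
    print(f"days of logs loaded: {days_of_logs:.0f}")
    print(f"log docs per shard: {logs_per_shard:,}")
    type_c_next_year = grow(TYPE_C_DOCS, TYPE_C_MONTHLY_GROWTH, 12)
    print(f"Type C docs after 12 months: {type_c_next_year:,.0f}")
```

At 5 million documents per day, the logs dominate everything else in the index within a few weeks, which is why their placement matters more than that of the other types.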
Thanks for your reply, Shay.
Yes, we have inserted all the docs into one index, hoping to use
parent-child queries. If we move the logs to a separate index, will we
be able to use parent-child queries efficiently for the rest of the
docs?
We are thinking of moving the logs to a different machine so that we
get better performance for the rest of the queries.
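For reference, the kind of parent-child query being discussed could look like the sketch below, written as a Python dict that serializes to the JSON request body. Parent and child documents have to live in the same index, so this assumes the account (Type A) and Type B documents stay together while the logs move elsewhere. The type name "type_b" and the field "status" are hypothetical, and the sketch says nothing about how fast the query would run.

```python
import json


def has_child_query(child_type, field, value):
    """Build a search body that matches parents having at least one matching child.

    `child_type`, `field` and `value` are supplied by the caller; the
    names used in the example below are hypothetical.
    """
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"term": {field: value}},
            }
        }
    }


# Find accounts (Type A) that have at least one Type B child with status "active".
body = has_child_query("type_b", "status", "active")

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```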