I have some production indices that need a large amount of data
imported into them fairly frequently. Each time we import data, the ES
nodes become a huge bottleneck. I honestly expected a lot better
performance out of them. Regardless, I would like to import data into a
production ES setup with the least amount of interruption or performance
issues.
What are some options for importing large quantities of data without
affecting data that is already being used by applications?
I was thinking I could use a combination of aliases and temp indices to
migrate the data over...
Right now: 28 GB across two indices, with 5 shards and 1 replica per
index, on 3 AWS large servers.
Frequently 1-10 million records or more get imported. During this time all
ES nodes hit a CPU usage of over 75%. We want to break the index down and
add routing at some point.
Refresh is using the default (1s) and, because of coupling to an old import
system, the bulk API is NOT used... The problem is that the index gets
accessed and written to constantly by users, so disabling refresh would
delay their content from becoming searchable.
I was debating using a separate index per import and grouping all the
indices under an alias. Not certain how that will affect performance.
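To make the index-per-import idea concrete, here is a rough sketch of the request body that the `_aliases` endpoint accepts. The helper name `alias_actions_body` and the index/alias names are made up for illustration; only the `actions` / `add` / `remove` structure comes from the alias API itself. All actions in one request are applied atomically, so readers of the alias should not see a half-switched state.

```python
import json

def alias_actions_body(alias, add=(), remove=()):
    """Build the JSON body for POST /_aliases.

    `add` and `remove` are lists of index names (hypothetical here).
    """
    actions = [{"add": {"index": name, "alias": alias}} for name in add]
    actions += [{"remove": {"index": name, "alias": alias}} for name in remove]
    return json.dumps({"actions": actions})

# Example: expose a freshly loaded import index and retire an older one.
body = alias_actions_body(
    "products",                     # alias the applications query (assumed name)
    add=["products_import_2"],      # new index, loaded while nobody searched it
    remove=["products_import_1"],   # previous import, dropped from the alias
)
```

The body would be sent as-is to `POST /_aliases` with whatever HTTP client the import system already uses. Whether many small indices behind one alias perform well depends on shard counts, so this would need measuring on the actual cluster.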
On Tuesday, January 27, 2015 at 8:03:16 PM UTC-5, Mark Walkom wrote:
How much data are you talking? Are you using bulk API? What is your bulk
sizing?
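For reference, a minimal sketch of what batching documents for the bulk API can look like. The function name `bulk_bodies`, the default batch size and the sample documents are assumptions for illustration; the newline-delimited format (an action line followed by a source line, with a trailing newline) is what `POST /_bulk` expects. Older ES releases also require a `_type` in the action line, which is left out here.

```python
import json

def bulk_bodies(index, docs, batch_size=1000):
    """Yield newline-delimited JSON bodies for POST /_bulk.

    `docs` is an iterable of (doc_id, source_dict) pairs; each yielded
    body holds at most `batch_size` documents.
    """
    lines = []
    count = 0
    for doc_id, source in docs:
        # Action line, then the document itself on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Final partial batch; the trailing newline is required.
        yield "\n".join(lines) + "\n"

sample = [("1", {"name": "a"}), ("2", {"name": "b"}), ("3", {"name": "c"})]
bodies = list(bulk_bodies("imports", sample, batch_size=2))
```

A good batch size is something to find by experiment: start small, increase until throughput stops improving, and watch node CPU while doing so.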
You can also set an index to not refresh while you ingest into it
(refresh_interval = -1), then once the data has been sent to ES turn
refresh back on.
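A small sketch of the settings bodies involved, assuming they are sent with `PUT /<index>/_settings`. The helper name `refresh_settings_body` is hypothetical; `refresh_interval` with `"-1"` (disabled) and `"1s"` (the default) are the actual setting values.

```python
import json

def refresh_settings_body(interval):
    """Build the JSON body for PUT /<index>/_settings."""
    return json.dumps({"index": {"refresh_interval": interval}})

# Sent before the import starts: stop periodic refreshes on the target index.
disable_refresh = refresh_settings_body("-1")

# Sent after the import finishes: restore the default one-second refresh.
restore_refresh = refresh_settings_body("1s")
```

Since disabling refresh on an index that users write to would delay their documents from showing up in search, this probably fits best on a separate import index that is not yet being queried.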