Like the title states.
I am seeing indexing time on certain writes jump to close to 30 seconds. I have configured all of my indices' refresh_interval to 30 seconds.
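For reference, I applied it per index via the settings API, roughly like this (`my-index-*` is a placeholder for our actual index pattern):

```
PUT /my-index-*/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```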
Is this normal?
I have also bumped the machine, doubling the CPU by going from an r5.2xlarge to an m5.4xlarge (AWS). I am still seeing long indexing times. CPU doesn't seem to spike when this happens; maybe the spike is too short for me to capture, but at least high CPU doesn't show up on our stats graph.
This is from the slow indexing log on the machine:
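For context, entries in that log are driven by per-index slowlog thresholds, configured along these lines (the values here are illustrative, not necessarily our exact config):

```
PUT /my-index-*/_settings
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s",
  "index.indexing.slowlog.threshold.index.debug": "2s",
  "index.indexing.slowlog.threshold.index.trace": "500ms"
}
```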
I guess you sort of answered the question. If increasing the refresh interval simply delays searchability, then the log suggests I am experiencing occasional long indexing episodes, which is not good when it gets into the tens of seconds.
Is my assumption correct that multi-second indexing times are worth worrying about?
I know the recommendation is to have no more than 64GB of RAM (assigning < 32GB to the ES heap), but when it comes to bumping the instance, will more RAM (beyond 64GB) or more CPU be better overall?
I have asked this question before, but nobody seems to be able to answer it, so I went the route of more CPU, which doesn't seem to help indexing time. We are using ES in a write-heavy application (about 90% writes).
If I give more RAM to the OS (98GB) and keep the ES heap at, say, 30GB, shouldn't the OS be able to use the extra RAM for file caching, which in turn should improve overall performance?
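A quick way to sanity-check that split is the `_cat/nodes` API, which shows the configured heap ceiling next to total machine RAM; everything above the heap is what the OS has available for the page cache:

```
GET _cat/nodes?v&h=name,heap.max,heap.percent,ram.max,ram.percent
```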
We use the EBS SSD volume type. Bumping the instance from 2xlarge to 4xlarge already increases the I/O throughput.
Should indexing time scale fairly linearly with resources? Are occasional spikes expected?
(I assume indexing time is independent of bulk write size, since each insert has its own time measurement.)
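One way to catch a spike in the act, rather than on a minute-level stats graph, is to poll the write thread pool for queueing and rejections while indexing (the pool is named `write` on recent versions; older releases call it `bulk` or `index`):

```
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected
```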
Thanks. I understand direct storage will be faster. We've weighed the pros and cons and decided on EBS.
In your experience, do most people use direct storage or EBS? This is something we can revisit if the trend is moving toward direct storage.
The obvious downside of direct storage is loss of data when an instance needs to be moved for whatever reason. The time it takes to rebuild the data is too costly, IMO; it would probably take a few days in our setup. It also limits the option to bump the instance size, so the only practical way to scale is to add more nodes.
There are lots of trade-offs and design considerations (hardware profile, Elasticsearch configuration, information architecture, ingest and search design, etc.) in meeting your business requirements, SLAs, operational efficiencies, and NFRs.
Whether EBS works for you is a combination of all of the above, but as @Christian_Dahlqvist said above, indexing (and Elasticsearch in general) is I/O intensive; after all, it is built for data storage and searching, and your storage choice is a critical part of the equation.
If EBS works for your requirements, it is an excellent, cost-effective choice; if not, you may need to reconsider. As a reference, none of our reference architectures for how we run thousands of clusters in the cloud use EBS. Do some customers use EBS? Yes, but we often see them switch over to local SSD as their requirements grow.
Perhaps take a look at these:
EDIT: I just realized I am not really clear on your use case. We have architectures that use both SSD and HDD in a Hot/Warm architecture that balances performance and cost; perhaps that would be of interest.
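A minimal sketch of that setup, assuming each node is tagged with a custom `data` attribute in its elasticsearch.yml (`node.attr.data: hot` on SSD nodes, `node.attr.data: warm` on HDD nodes): actively written indices are pinned to the hot nodes, then relocated as they age, e.g.:

```
# Assumes node.attr.data is set to "hot" or "warm" in each node's elasticsearch.yml.
# Keep the actively written index on the SSD (hot) nodes:
PUT /my-index-000001/_settings
{
  "index.routing.allocation.require.data": "hot"
}

# Later, move the aged index to the HDD (warm) nodes:
PUT /my-index-000001/_settings
{
  "index.routing.allocation.require.data": "warm"
}
```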