Does increasing refresh_rate affect indexing time?

Like the title states.
I am seeing indexing time on certain writes jump to close to 30 seconds. I have configured the refresh interval on all my indices to 30 seconds.
Is this normal?
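For reference, this is roughly how the interval is set (a sketch; `my-index` and `localhost:9200` are placeholders for our actual setup, and `refresh_interval` is the standard dynamic index setting):

```
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index": { "refresh_interval": "30s" }
}'
```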

I have also bumped the machine up by doubling the CPU, from r5.2xlarge to m5.4xlarge (AWS). I am still seeing long indexing times. CPU doesn't seem to spike when this happens; maybe the spike is too short for me to capture, but at least high CPU doesn't show up on our stats graph.

How do you measure indexing time? The refresh interval determines how frequently data that has been indexed is made searchable.
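That is, a document is acknowledged as soon as it is indexed, but it only becomes visible to searches after the next refresh. A quick sketch (the index name and host are placeholders):

```
# Indexing is acknowledged immediately, but the document is not yet searchable
curl -X POST "localhost:9200/my-index/_doc" -H 'Content-Type: application/json' -d'{"field": "value"}'

# With a 30s refresh_interval, searches may miss it for up to 30s,
# unless a refresh is forced explicitly:
curl -X POST "localhost:9200/my-index/_refresh"
```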

From the slow indexing log on the machine.
I guess you sort of answered the question. If increasing the refresh interval simply delays searchability, then the log suggests I am experiencing occasional long indexing episodes, which is not good when they reach tens of seconds.

Is my assumption correct that multi-second indexing times are worth worrying about?
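For context, these are the slow-log thresholds I mean; the values below are illustrative, but `index.indexing.slowlog.threshold.*` are the standard per-index settings:

```
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s",
  "index.indexing.slowlog.threshold.index.debug": "2s"
}'
```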

I know the recommendation is to have no more than 64GB of RAM (assigning < 32GB to ES), but when it comes to bumping the instance, will more RAM (beyond 64GB) or more CPU be better overall?
I have asked this question before but nobody seems to be able to answer it, so I voted for the route of more CPU, which doesn't seem to help with indexing time. We are using ES in a write-heavy application (about 90% writes).

If I give more RAM to the OS (98GB) and keep ES at, say, 30GB, shouldn't the OS be able to utilize the extra RAM for file caching? Which in turn should improve overall performance?
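i.e. something like this in `config/jvm.options` (30g is illustrative; the usual advice is to keep the heap below the ~32GB compressed-oops cutoff and leave the rest of the RAM to the OS page cache):

```
# config/jvm.options: fixed heap size, identical min/max
-Xms30g
-Xmx30g
```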

Any insight would be appreciated.

Indexing is very I/O intensive, so the type of disk/storage you have matters as much as CPU and RAM resources.
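One way to confirm whether storage is the bottleneck is to watch device utilization while one of the slow episodes is happening, e.g. with iostat from the sysstat package (the 5-second interval is arbitrary):

```
# Extended device stats every 5 seconds; watch %util and await
# on the EBS volume during the slow indexing episodes
iostat -x 5
```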

EBS SSD volume type. Bumping up the instance from 2xlarge -> 4xlarge already increases the I/O speed.
Should indexing time scale fairly linearly with resources? Are occasional spikes expected?
(I assume indexing time is independent of bulk write size, since each insert has its own time measurement.)
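One place to sanity-check that assumption is the per-index indexing stats (a sketch; `my-index` is a placeholder):

```
# index_time_in_millis / index_total ≈ average time per indexed document
curl -X GET "localhost:9200/my-index/_stats/indexing?pretty"
```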

What you are probably seeing is more threads available for writing, so that is increasing throughput.

EBS SSD will not be as fast as i3 NVMe SSD direct attach... In general, EBS is not optimal for high indexing rates.

Thanks. I understand direct storage will be faster. We've weighed the pros and cons and decided on EBS.
From your experience, do most people use direct storage or EBS? This is something we can revisit if the trend is moving toward direct storage.

The obvious downside of direct storage is loss of data when an instance needs to be moved for whatever reason. The time it takes to reproduce the data is too costly IMO; it would probably take a few days in our setup. It also limits the option to bump the instance. The only practical scaling is to add more nodes, etc.

There are lots of trade-offs and design considerations (hardware profile, Elasticsearch configuration, information architecture, ingest and search design, etc.) for meeting your business requirements and SLAs, operational efficiencies, and NFRs.

Whether EBS works for you is a combination of all of the above, but as @Christian_Dahlqvist said above, indexing (and Elasticsearch in general) is I/O intensive; after all, data storage and searching is what it is for, and your storage choice is a critical part of the equation.

If EBS works for your requirements, that is an excellent, cost-effective choice; if not, you may need to reconsider. As a reference... none of the reference architectures for how we run 1000s of clusters in the cloud utilize EBS. Do some customers use EBS? Yes, but we often see them switch over to direct-attached SSD as their requirements grow.

Perhaps take a look at these:

EDIT: I just realized I am not really clear on your use case. We have architectures that use both SSD and HDD in a hot/warm architecture that balances performance and cost; perhaps that would be of interest.
