When to use mmapfs and niofs

I'm trying to benchmark search performance (Elasticsearch 6.0.1) with different FS modes.
Each index is around 5 GB (10 indices in total).
Searches are done as follows:
Open Index1 --> Search on Index1 --> Open Index2 --> Search on Index2 --> ... --> Open Index10 --> Search on Index10
The process above is repeated 5 times to get the average across all search times.
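
In REST API terms, one step of that sequence looks roughly like this (the index name index1 and the match_all query are placeholders for our real indices and queries):

POST /index1/_open

GET /index1/_search
{
  "query": { "match_all": {} }
}

The same open-then-search pair is issued for index2 through index10, and the whole pass is repeated 5 times, timing each search call.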

On mmapfs, it seems like the first iteration is much more expensive and incurs a lot of disk IO. Is this because mmapfs needs to build some sort of mapping in RAM?

On niofs, performance seems stable for every iteration, and it incurs less disk IO.

But I observe that search time on mmapfs is slightly faster than on niofs,
and when the system is performing heavy indexing, both mmapfs and niofs perform poorly on search.

Any rule of thumb on how to decide between mmapfs and niofs?


Hi @dbakti7, it looks like you might be benchmarking with a cold cache (one of the seven deadly sins of benchmarking). Your test workload seems unrealistic too, unless closing and opening indices and searching them in sequence like this is how your production systems work. Can you clarify what you're trying to test?

Both niofs and mmapfs rely on the filesystem cache to be effective, but they interact with it differently so they do have different performance characteristics and we certainly expect them to warm up differently. In 6.7 the default is mmapfs and this is suitable for most workloads, so the "rule of thumb" is to use the default. You should only change it after benchmarking, because the best choice really depends on the details of the workload and is hard to predict.

The default in 7.0 is hybridfs which chooses between niofs and mmapfs a little more intelligently. This is also available in 6.7, but it's not the default (as that would have been a breaking change).
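
For example, if you do decide to override the default after benchmarking, index.store.type is a static per-index setting that you can specify at index creation time (my-index here is just a placeholder; swap in niofs, mmapfs or, on 6.7+, hybridfs as appropriate):

PUT /my-index
{
  "settings": {
    "index.store.type": "niofs"
  }
}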

Hi @DavidTurner! Thanks for the speedy response.
To follow up:

  1. Please correct me if my understanding is wrong: a cold cache is when an index has recently been opened (or ES has just rebooted), hence searches are expected to be slow for the first few operations. After that, the cache is built and subsequent operations are faster. It's recommended to benchmark on a "warm cache" to get stable measurements.

  2. Yes, due to some requirements, we have to open and close indices. That's why our benchmark has to simulate this scenario. The intention is to simulate the worst case, i.e. the first search immediately after opening an index. The multiple test runs are just to get the average of this "worst case".

  3. From this test, it seems like even after closing the index, the cache is somehow still maintained in memory (subsequent opens and searches on mmapfs are fast). Because of this, we also tried to run searches while the system was heavily indexing. Under this scenario, all searches are equally slow on both mmapfs and niofs (which I assume is because the cache has to be evicted to make room for indexing operations).

  4. Related to the previous point, during heavy indexing we found the following stats (the API call is sketched just after this list):

    "buffer_pools": {"mapped":{"count":1264,"used_in_bytes":114462548303,"total_capacity_in_bytes":114462548303},"direct":{"count":141,"used_in_bytes":539763989,"total_capacity_in_bytes":539763988}},"classes":{"current_loaded_count":11601,"total_loaded_count":11820,"total_unloaded_count":219}}

114 GB is much larger than our 32 GB of RAM. May I know how mmapfs manages this internally? (Via paging? Or is this number not representative of the actual data loaded into RAM?)

  5. Regarding mmapfs vs niofs, I'm trying to figure out whether they are influenced by index size. I read that mmapfs might not be the best option for a big index. How big is considered "big"? An approximate threshold would be very helpful for our use case, which has to close indices after some operations.
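
(The figures in point 4 presumably come from the nodes stats API; a call along these lines returns just that section:)

GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.buffer_pools,nodes.*.jvm.classes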

It's recommended to benchmark using a workload that's as close to your steady-state production workload as possible, to eliminate warm-up effects. It seems unlikely that a steady state production workload involves reboots.

Yes, closing an index doesn't drop anything from the filesystem cache.

It sounds like you did more indexing to try and flush some caches. This reasoning seems misguided. You should be trying to simulate your steady-state production workload accurately. The less accurate your simulation, the less meaningful are your data.

Yes, mmap interacts with the filesystem cache via paging.

Where did you read this? As I said above, the choice depends on a lot of factors and can only really be determined with careful benchmarking.

I also note that if you're closing and reopening your indices in production then it's really easy to switch between niofs and mmapfs at runtime, so you can perform these experiments on a real system instead of this rather artificial simulation.

Agreed on this. I wrongly understood that closing an index and rebooting would have the same effect, but you have clarified that closing an index doesn't drop anything from the filesystem cache.

May I know why it's misguided? Unfortunately, heavy indexing (with multiple workers at the same time) is quite possible on our system, which is very likely to exhaust memory and lead to cache evictions. I think in that case subsequent search operations will need to "warm up" again, resulting in a slow first search.
Or is it advisable for us to redesign our indexing logic and avoid heavy indexing that might lead to cache evictions?

In this case, won't it be expensive if the index size > RAM? (When performing searches or segment merges, it might lead to a lot of paging.)
Or can we safely assume this has been optimized?

Based on this page, it seems like it's only an issue when the data size is in the range of TBs, so please ignore my previous statement.

May I know how to do this? Based on the documentation, the store config can only be set at index creation time or in the config file.

Why do you need to open and close indices?

Sorry, I will clarify. It isn't a good idea to add heavy indexing to your benchmark in order to evict caches, which is what you said was your reasoning. In contrast, it is a good idea to add heavy indexing to your benchmark if that better simulates your production workload, but that wasn't the reason you gave.

If your index doesn't entirely fit in RAM then you will sometimes need to read data from disk, and reading from disk is indeed more expensive than reading from cache. But I don't see what else one could do if the index doesn't entirely fit in RAM. Elasticsearch (really Lucene) tries its best to be sensitive to its effects on the filesystem cache, avoiding as much unnecessary paging as it can.

Yes, that note is about TB-scale indices with update-heavy workloads. That's not to say that this is the only case where one or other is better, just that we found a particular class of use cases where hybridfs was almost always a worthwhile improvement.

You can also set it on a closed index:

POST /i/_close
PUT /i/_settings
{
  "index.store.type": "mmapfs"
}
POST /i/_open
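
Once the index is open again you can confirm the setting took effect, e.g. with (same placeholder index name i):

GET /i/_settings?filter_path=*.settings.index.store.type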

@Christian_Dahlqvist It's due to some business requirements; unfortunately I can't reveal much information here. Sorry.

I see, sorry that I phrased my sentence wrongly.
But yes, the reason we simulate heavy indexing + search is that this situation is possible on our production system, and we want to know the search performance.
Based on our results, search seems to be heavily impacted by cache evictions.
Thanks for the clarifications; we are now in a better position to adjust our indexing logic.

Agree with this statement.

This API works, thanks for the hidden gem!

I think I now have a much better understanding of our problems and system limitations.
Thank you very much for the extensive explanations! 🙂

