Optimizing Elasticsearch 2.x for fast searches on a relatively small index - what are the best practices?

Hi,
I'm running Elasticsearch 1.3.1 in production at the moment and want to migrate all the servers to 2.1.

I'm currently running 2.1 on a test server with SSDs, 24 cores and 64GB RAM.
The index there holds roughly 80M documents, is 54GB in size, and has 4 shards allocated to it. It's not that big, but the search rate is very high and I want to optimize query speed as much as possible.

I'm not 100% confident about switching everything to 2.1 just yet, because I'm seeing some slowdowns when doing searches.

What I'm seeing now is that Elasticsearch generates a lot of segments and that this has a significant impact on search speed: around 24 segments per shard, so roughly 100 segments in total.

When I run _optimize?max_num_segments=1, searches go a lot faster afterwards.
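For reference, the exact call I run manually looks like this (the index name is just a placeholder; as far as I know the same operation is also exposed as _forcemerge on 2.1):

    # force-merge every shard of the index down to a single segment
    curl -XPOST 'http://localhost:9200/my_index/_optimize?max_num_segments=1'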

  • I want to keep the number of segments as low as possible.
  • What are the best "merge" settings for the index, knowing that the merges will be relatively small and the server is running on SSDs? (See the settings sketch after this list.)
  • Having the latest data is less crucial, so everything can accumulate in the translog first and only be merged after a while, as long as searches stay fast.
  • I know that in ES 2.1 everything is now stored as "doc_values" by default; I'm a bit afraid that some of the speed benefits are lost with this approach. Do you think using "fielddata" again for some fields would be beneficial in terms of search speed? (See the mapping sketch after this list.)
  • The store used by Elasticsearch on Linux is now niofs or mmapfs; would explicitly defining index.store.type: "mmapfs" be beneficial in my case? (The answer here was to leave these settings as-is as much as possible - Mmapfs vs niofs?)
  • Lastly - do you see any improvements for the query posted below? The searches we run are mostly built up that way.
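To make the merge/translog questions concrete, this is the kind of change I had in mind - the index name and values are only examples, and I'm not at all sure these are the right knobs (or the right values) on 2.1:

    # sketch only: tiered merge policy + relaxed translog settings, values are illustrative
    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
        "index.merge.policy.segments_per_tier": 5,
        "index.merge.policy.max_merged_segment": "10gb",
        "index.translog.durability": "async",
        "index.translog.sync_interval": "30s"
    }'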
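And for the doc_values/fielddata question, my understanding is that a field would have to be created with doc_values disabled to fall back to in-memory fielddata - it can't be toggled on an existing field, so it would mean a new index. Something like this (the field name is from my query; the type and format are assumptions about my actual mapping):

    # sketch: mapping for a new index where lastlogindate uses fielddata instead of doc_values
    curl -XPUT 'http://localhost:9200/my_new_index' -d '{
        "mappings": {
            "my_type": {
                "properties": {
                    "lastlogindate": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm",
                        "doc_values": false
                    }
                }
            }
        }
    }'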

Thanks a lot in advance!
Jayme

Segments of index:

Settings of index:

Example query + time on subsequent runs:


{"took":1910,"timed_out":false,"_shards":{"total":4,"successful":4,"failed":0},"hits":{"total":9485
{"took":986,"timed_out":false,"_shards":{"total":4,"successful":4,"failed":0},"hits":{"total":9485,
{"took":704,"timed_out":false,"_shards":{"total":4,"successful":4,"failed":0},"hits":{"total":9483,

You haven't really described what the problem is here; you just seem to be asking for recommendations on things to change without providing the full context of what you think is not working and why.

We suggest not touching the merge settings; most of the time it'll work against you.

Nope, that's why we moved to doc values :wink:

I believe mmapfs should be picked by default on most systems, even non-Windows.

Well, my main question is: which merge settings do I need to use so that I don't end up with that many segments in my index? I see that searches are much slower compared to older versions of Elasticsearch, and when I manually run _optimize?max_num_segments=1 everything goes faster again. I could put that in a cron job to run every hour, but it ought to be possible to get the same effect by changing the merge settings so that merges trigger sooner - so which settings do I use?
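(For clarity, the cron workaround I mean is just something like this, with a placeholder index name:)

    # hypothetical hourly cron entry forcing a merge down to one segment per shard
    0 * * * * curl -s -XPOST 'http://localhost:9200/my_index/_optimize?max_num_segments=1' > /dev/null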

For the record I suspect that the main reason why you get better performance on 1-segment indices is that you are using several range queries in your query, which are terms-dictionary intensive queries. So the fewer segments you have, the fewer terms dictionaries the range queries need to resolve matching terms against.

@jpountz, thanks! That's what I saw as well: when I removed the

    {
        "range": {
            "lastlogindate": {
                "gte": "2015-01-13 00:00",
                "lte": "2016-01-13 10:00"
            }
        }
    }

part from my query, performance increased from an average of 1000ms to sub-100ms, so roughly a 10x improvement.

Also, in 2.1 it's stated that only the 256 most active filters are cached. What setting allows me to increase this, and is there any insight into what those 256 filters are?

This can't be configured, but I don't think caching is the problem here, especially given that ranges are cached more aggressively due to their cost.

OK - I've opened an issue for it: https://github.com/elastic/elasticsearch/issues/15994. I can fix my queries by doing the range queries in a post_filter, but it's weird behaviour for sure.
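For anyone hitting the same thing, the rewrite I mean looks roughly like this - the bool/match part is just a stand-in for my real query; only the range part is taken from the example above. Note that a post_filter is applied after aggregations, so aggregation results would no longer be restricted by the date range:

    {
        "query": {
            "bool": {
                "must": [
                    { "match": { "some_field": "some value" } }
                ]
            }
        },
        "post_filter": {
            "range": {
                "lastlogindate": {
                    "gte": "2015-01-13 00:00",
                    "lte": "2016-01-13 10:00"
                }
            }
        }
    }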