ElasticSearch V0.90 Faster Than V7.10.1

Hi everyone,

We are migrating Elasticsearch from version 0.90 to version 7.10.1. I know this is a really big upgrade for a single component.

During this migration we noticed that V7 queries are slower than the V0 ones.

The slowdown on version 7 can be observed even for simple queries.

A simple query that performs better on V0:

  • v0 query:

    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "some_term_value": 1234
              }
            }
          ]
        }
      },
      "from": 0,
      "size": 10
    }
    
  • v7 query:

    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "some_term_value": 1234
              }
            }
          ]
        }
      },
      "from": 0,
      "size": 10
    }
    
  • some_term_value is mapped as long.

Machine specs:

  • Both: 64GB machine RAM, 24GB ElasticSearch JVM RAM
  • Both: same CPU type and SSD disks.
  • Both: standalone machines.
  • Both: using the JDK installed on the machine.
    • V0: 1.8.0_232
    • V7: 11.0.5

JVM options:

## JVM configuration

-Djava.io.tmpdir=...

-Xms24g
-Xmx24g

-Djava.net.preferIPv4Stack=true

## G1GC Configuration
11-:-XX:+UseG1GC
11-:-XX:G1ReservePercent=25
11-:-XX:InitiatingHeapOccupancyPercent=30

# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true

## GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

ElasticSearch Specs:

  • v7 uses G1GC for garbage collection.
    • Before this we tried concurrent mark-sweep (CMS), but G1GC was much better for performance and CPU usage.
  • Request caching is disabled, because it is of little use given how fresh our data has to be.
    • Our index is updated regularly by bulk inserts/deletes at an interval of 500ms. This setting is the same for v0 and v7.
  • 900+ mapping properties.

In our environments these queries took:
- v0: ~20ms - ~30ms
- v7: ~40ms - ~50ms
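To compare the two versions on more than a rough range, the `took` values from repeated runs can be summarized the same way on both sides. The following is only a minimal Python sketch: the helper names `summarize_took` and `slowdown_ratio` are made up for illustration, and the caller is assumed to have collected the `took` field (milliseconds) from each search response.

```python
import statistics


def summarize_took(took_ms, warmup=0):
    """Summarize `took` values (milliseconds) after dropping the first `warmup` runs."""
    samples = sorted(took_ms[warmup:])
    if not samples:
        raise ValueError("no samples left after warm-up")
    return {
        "count": len(samples),
        "min": samples[0],
        "median": statistics.median(samples),
        "max": samples[-1],
    }


def slowdown_ratio(baseline_ms, candidate_ms):
    """Median of candidate divided by median of baseline (>1 means candidate is slower)."""
    return statistics.median(candidate_ms) / statistics.median(baseline_ms)
```

Dropping a few warm-up runs keeps cold caches from skewing the comparison; how many to drop is a judgment call for your own setup.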

A sample of our most expensive queries:

  • v0 query:

    {
        "from": "0",
        "size": "15",
        "fields": ["some_field"],
        "query": {
            "filtered": {
                "query": {
                    "match_all": {}
                },
                "filter": [
                    {
                        "terms": {
                            "some_term_value": [ 1234 ]
                        }
                    },
                    { ... }
                ]
            }
        },
        "filter": { ... },
        "sort": [ ... ]
    }
    
  • v7 query:

    {
        "from": "0",
        "size": "15",
        "_source": ["some_source"],
        "query": {
            "bool": {
                "must": [{
                    "match_all": {}
                }],
                "filter": [{
                        "terms": {
                            "some_term_value": [ 1234 ]
                        }
                    },
                    {
                        "bool": { ... }
                    }
                ]
            }
        },
        "aggs": { ... },
        "post_filter": {
            "bool": { ... }
        },
        "sort": [ ...
         ]
    }
    

My main question: given the huge version gap between V0 and V7, why is version 7 slower than version 0?

If that is expected, do you have any suggestions for improving our query performance?

Or are we doing something wrong in our setup?

Thanks for any advice!
Have a nice day.


Longs, ints etc are assumed to be quantities - things you query in ranges e.g. pages with > 1m likes. The data structures used are optimised for range queries.

If your numbers are unique IDs which you query individually e.g. find productID = 7823672 then keyword fields are a better choice. That said, the performance differences between longs and keyword fields for ID queries don't tend to be noticeable until you're packing many ID queries into a single request.
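As a rough illustration of the keyword suggestion, here is a minimal Python sketch that builds a 7.x-style mapping and a term lookup as plain dicts. The field name `some_term_value` comes from the original post; the helper names `keyword_id_mapping` and `id_term_query` are made up for this example, and nothing is sent to a cluster.

```python
import json


def keyword_id_mapping(field_name):
    """Build a 7.x index-creation body that maps an ID field as keyword."""
    return {"mappings": {"properties": {field_name: {"type": "keyword"}}}}


def id_term_query(field_name, value, size=10):
    """Build an exact-match lookup in filter context (IDs need no scoring)."""
    return {
        "query": {"bool": {"filter": [{"term": {field_name: str(value)}}]}},
        "from": 0,
        "size": size,
    }


if __name__ == "__main__":
    # Print the bodies so they can be pasted into a REST client.
    print(json.dumps(keyword_id_mapping("some_term_value"), indent=2))
    print(json.dumps(id_term_query("some_term_value", 1234), indent=2))
```

Note that the type of an existing field cannot be changed in place, so trying this means creating a new index with the new mapping and reindexing into it.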

Hi Mark,

Thanks for the quick reply.

We will absolutely try your suggestion on our long fields, but our main question was about the time difference between v0 and v7 requests.

Also, in V0 that value is mapped as long as well.

Do you have any idea why v7 would respond slower than v0 under mostly the same circumstances?

So much has changed in both Lucene and Elasticsearch between those versions that it would be hard to pinpoint any single factor. Generally, changes in performance are tracked by nightly benchmarks, which have been running for a number of years and have helped trap several unexpected regressions.
Also, we now have a query profiler, which you can use to help spot where time is going.
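For reference, profiling in 7.x is switched on by adding `"profile": true` to the search body, and the response then carries a `profile.shards[].searches[].query[]` tree with `type`, `description`, `time_in_nanos` and optional `children`. A minimal Python sketch for flattening that tree might look like the following; the helper names `with_profile` and `flatten_query_profile` are made up, and only the query section is walked (collector and aggregation timings are ignored here).

```python
import copy


def with_profile(search_body):
    """Return a copy of a search body with the profile flag enabled."""
    body = copy.deepcopy(search_body)
    body["profile"] = True
    return body


def flatten_query_profile(response):
    """Yield (shard_id, depth, type, description, millis) for every profiled query node."""

    def walk(node, depth, shard_id):
        millis = node["time_in_nanos"] / 1e6
        yield (shard_id, depth, node["type"], node["description"], millis)
        for child in node.get("children", []):
            yield from walk(child, depth + 1, shard_id)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                yield from walk(node, 0, shard["id"])
```

Sorting the flattened rows by the last element gives a quick view of which query component dominates on each shard.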

I changed the mapping to keyword for the appropriate values. Version 0 still performs better than version 7. Almost nothing changed in the query evaluation.

Sadly, only version 7 queries can be profiled, so it is like comparing apples to oranges. If you can suggest any other method, I would love to hear it.

Can you spot any anomaly in the structure of the sample queries I posted?
With such a large version upgrade many features changed, and of course we adapted the queries for the new version.

V0 had no aggs?
Honestly you’d need to distill it down to a full JSON example and ideally minimised to the simplest request that shows clear differences.

V0 has aggregations as well. Sorry for the confusion.
Here are the queries I am using.

  • v0
{
  "from": "0",
  "size": "15",
  "fields": ["id", "labels"],
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [{
            "terms": {
              "t1": [
                "12345"
              ]
            }
          },
          {
            "or": [{
                "term": {
                  "t2": "12345"
                }
              },
              {
                "term": {
                  "t3": "45677"
                }
              }
            ]
          },
          {
            "range": {
              "t4": {
                "from": 1600000000,
                "include_lower": true,
                "to": 2000000000,
                "include_upper": true
              }
            }
          }
        ]
      }
    }
  },
  "filter": {
    "and": [{
        "range": {
          "g1": {
            "from": 0,
            "include_lower": true,
            "to": 10,
            "include_upper": true
          }
        }
      },
      {
        "terms": {
          "g2": [
            "a",
            "b",
            "c"
          ]
        }
      }
    ]
  },
  "facets": {
    "facet_1": {
      "terms": {
        "field": "g3",
        "size": "99"
      },
      "facet_filter": {
        "and": [{
            "range": {
              "g1": {
                "from": 0,
                "include_lower": true,
                "to": 10,
                "include_upper": true
              }
            }
          },
          {
            "terms": {
              "g2": [
                "a",
                "b",
                "c"
              ]
            }
          }
        ]
      }
    },
    "facet_2": {
      "terms": {
        "field": "g4",
        "size": "5"
      },
      "facet_filter": {
        "and": [{
            "range": {
              "g1": {
                "from": 0,
                "include_lower": true,
                "to": 10,
                "include_upper": true
              }
            }
          },
          {
            "terms": {
              "g2": [
                "a",
                "b",
                "c"
              ]
            }
          }
        ]
      }
    },
    "facet_3": {
      "terms": {
        "field": "g5",
        "size": "99"
      },
      "facet_filter": {
        "and": [{
          "range": {
            "f3": {
              "from": 0,
              "include_lower": true,
              "to": 5,
              "include_upper": true
            }
          }
        }]
      }
    }

  },
  "sort": [{
      "s1": "desc"
    },
    {
      "s2": "desc"
    }
  ]
}
  • v7
{
    "track_total_hits": true,
    "from": "0",
    "size": "15",
    "_source": ["id", "labels"],
    "query": {
        "bool": {
            "must": [{
                "match_all": {}
            }],
            "filter": [{
                    "terms": {
                        "t1": [
                            "12345"
                        ]
                    }
                },
                {
                    "bool": {
                        "should": [{
                            "bool": {
                                "should": [{
                                        "term": {
                                            "t2": "12345"
                                        }
                                    },
                                    {
                                        "term": {
                                            "t3": "45677"
                                        }
                                    }
                                ]
                            }
                        }]
                    }
                },
                {
                    "range": {
                        "t4": {
                            "from": 1600000000,
                            "include_lower": true,
                            "to": 2000000000,
                            "include_upper": true
                        }
                    }
                }

            ]
        }
    },
    "aggs": {
        "aggs_1": {
            "filter": {
                "bool": {
                    "must": [{
                            "range": {
                                "g1": {
                                    "from": 0,
                                    "include_lower": true,
                                    "to": 10,
                                    "include_upper": true
                                }
                            }
                        },
                        {
                            "terms": {
                                "g2": [
                                    "a",
                                    "b",
                                    "c"
                                ]
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "facet_1": {
                    "meta": {
                        "type": "terms"
                    },
                    "terms": {
                        "field": "g3",
                        "size": "99",
                        "order": [{
                            "_count": "desc"
                        }, {
                            "_key": "desc"
                        }]
                    }
                },
                "facet_2": {
                    "meta": {
                        "type": "terms"
                    },
                    "terms": {
                        "field": "g4",
                        "size": "5",
                        "order": [{
                            "_count": "desc"
                        }, {
                            "_key": "desc"
                        }]
                    }
                }
            }
        },
        "facet_3": {
            "meta": {
                "type": "terms"
            },
            "filter": {
                "bool": {
                    "must": [{
                        "range": {
                            "f3": {
                                "from": 0,
                                "include_lower": true,
                                "to": 5,
                                "include_upper": true
                            }
                        }
                    }]
                }
            },
            "aggs": {
                "_filtered_sub_aggs": {
                    "meta": {
                        "type": "terms"
                    },
                    "terms": {
                        "field": "g5",
                        "size": "99",
                        "order": [{
                            "_count": "desc"
                        }, {
                            "_key": "desc"
                        }]
                    }
                }
            }
        }
    },
    "post_filter": {
        "bool": {
            "must": [{
                    "range": {
                        "g1": {
                            "from": 0,
                            "include_lower": true,
                            "to": 10,
                            "include_upper": true
                        }
                    }
                },
                {
                    "terms": {
                        "g2": [
                            "a",
                            "b",
                            "c"
                        ]
                    }
                }
            ]
        }
    },
    "sort": [{
            "s1": "desc"
        },
        {
            "s2": "desc"
        }
    ]
}

The most interesting part about these queries:

  • With sorting removed from both queries, v0 is still faster.
  • With aggregations removed from both queries, v0 is still faster.
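The removal experiments above can be made repeatable by generating the stripped-down variants from one request body instead of editing JSON by hand. This is only a sketch under assumptions: the helper names `strip_sections` and `build_variants` are made up, and the section names cover both the 0.90 spelling (`facets`, top-level `filter`) and the 7.x spelling (`aggs`, `post_filter`) used in the queries above.

```python
import copy

# Top-level request sections that can be dropped without touching the main query.
# "facets" and top-level "filter" are the 0.90 names; "aggs" and "post_filter" are 7.x.
OPTIONAL_SECTIONS = {
    "sort": ["sort"],
    "aggs": ["aggs", "aggregations", "facets"],
    "post_filter": ["post_filter", "filter"],
}


def strip_sections(search_body, sections):
    """Return a copy of the request with the named top-level sections removed."""
    body = copy.deepcopy(search_body)
    for section in sections:
        for key in OPTIONAL_SECTIONS[section]:
            body.pop(key, None)
    return body


def build_variants(search_body):
    """Build progressively simpler requests for timing comparisons.

    Dropping the post filter changes which hits come back, so "query_only"
    is only meaningful for timing, not for comparing results.
    """
    return {
        "full": copy.deepcopy(search_body),
        "no_sort": strip_sections(search_body, ["sort"]),
        "no_aggs": strip_sections(search_body, ["aggs"]),
        "query_only": strip_sections(search_body, ["sort", "aggs", "post_filter"]),
    }
```

Running each variant against both clusters and recording `took` narrows down whether the gap comes from the main query, the post filter, the aggregations, or the sort.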