Config/Tuning advice :)


(Paul McMahon) #1

Hi All,

I have an analytic app in development, and I was hoping to get a little advice around configuring elastic for query performance... Here is the scenario:

In all, there are around 42 million docs, in a single index. This number is essentially static; new data may be loaded into it once a year or something, but consider it a fixed data set. Indexing speed is unimportant; query speed is the primary goal...

The data is fairly evenly distributed as follows:

GroupId (an integer, mapped as a keyword) - there are about 3,800 unique values, with a pretty even distribution of docs between them. Every query starts by filtering on a single value.

GroupEx (an integer, mapped as a keyword) - there are anywhere from 10 up to about 50 or 60 of these per GroupId (average is around 30). Used sometimes in terms aggs, or for sorting; never for filtering.

SubGroup (an integer, mapped as a keyword) - there are 360 of these, repeated for each GroupEx. Every query filters on a range (maximum is 60). Some queries will sort on this, some queries will do TERMS aggs on it.

Each distinct combination of GroupId, GroupEx, SubGroup is a single document.

Then, there is a nested structure (with a maximum of 24 objects per 'parent'), consisting of a couple of fields that are ints mapped as keywords (which again may be filtered on, sorted by, and used in terms aggs), and a half dozen 'value' fields, each of which is a half_float, used in min, max, and avg aggs, as well as some other statistical aggs.
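To make the structure concrete, here is a minimal mapping sketch of what is described above. The index name and the nested field names are placeholders; only the field types and the parent/nested split come from the description:

```
PUT /analytics
{
  "mappings": {
    "properties": {
      "GroupId":  { "type": "keyword" },
      "GroupEx":  { "type": "keyword" },
      "SubGroup": { "type": "keyword" },
      "Details": {
        "type": "nested",
        "properties": {
          "Category": { "type": "keyword" },
          "Value1":   { "type": "half_float" }
        }
      }
    }
  }
}
```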

I was running the app locally on my laptop with about a million docs, and it was very quick indeed. I've now put it on azure, with the full load of 42 million, and it is noticeably slower, with some queries timing out.

I've done the basic config, such as setting bootstrap.memory_lock, and allocated half the physical RAM to the ES heap. In terms of indexing, I've just run the whole load of docs in with no special settings at all; just the basic mapping on the index.
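For context, that basic config amounts to something like the following (an 8 GB heap is assumed here, i.e. half of the 16 GB VM):

```
# elasticsearch.yml
bootstrap.memory_lock: true

# jvm.options
-Xms8g
-Xmx8g
```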

Just wondering (first) whether, given the access patterns above, I should be doing anything maybe with global ordinals, or the filesystem cache, or some other settings in the mappings or anywhere?
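For reference, global ordinals can be loaded eagerly (at refresh time rather than on the first aggregation) via a mapping parameter. A sketch for the field every query filters on, assuming the field name from above:

```
"GroupId": {
  "type": "keyword",
  "eager_global_ordinals": true
}
```

On a static index with infrequent refreshes, this mainly smooths out the first-query latency spike after a restart or refresh.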

Also, it is currently running on a single 4-core, 16 GB Ubuntu VM (it is in demo mode, so I am trying to keep costs low). Would I be better off with, say, two or three smaller VMs?

Any ideas greatly appreciated :slight_smile:
Thanks!

Paul


(David Pilato) #2

Maybe you should add x-pack monitoring (available under the basic license) and monitor what is happening?


(Daniel Mitterdorfer) #3

Hi,

First of all, it's always hard to pinpoint performance issues remotely, so I can only give you some pointers.

42 million documents doesn't sound like a whole lot to me, and the fact that this data set is rather static should also help. However, you should compare performance under similar circumstances, i.e. both instances need to hold all 42 million documents. So my starting point would be to establish comparable conditions; everything else comes afterwards.

W.r.t. tuning:

  • As you have a very static data set you could run a force merge after the import to reduce the number of segments.
  • You can play around with the number of shards.
  • You should test all of that with repeatable experiments; you can use JMeter for that, for example. We at Elastic use a tool called Rally for our macrobenchmarks; it's geared specifically towards Elasticsearch.
  • The system should have an SSD for best performance.
  • In general, queries benefit from replicas. So yes, it may make sense to use smaller nodes and define a replica count of 1 or 2. But you need to test whether this is beneficial in your case.
  • You can also use the query profiler to find out where time is spent in your queries.
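The first and last points above can be sketched in console form; the index name and the query are placeholders:

```
# Reduce a static index to a single segment per shard
POST /analytics/_forcemerge?max_num_segments=1

# Profile a query to see where time is spent
GET /analytics/_search
{
  "profile": true,
  "query": { "term": { "GroupId": "1234" } }
}
```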

Daniel


(Paul McMahon) #4

Thanks for the responses gents...

You are right, 40-odd million isn't much really, and the question was pretty generic!

I do now have the entire set of 40+ million docs running locally on a little 4-core NUC running Ubuntu - sweet little machine :slight_smile: Strangely, it seems to perform better than my Azure cluster!

What I have (in the Azure setup) so far is a 3-node cluster, with two being data + master nodes and the third being a coordinating-only node; this third node is what the app connects to.

I also switched to a single shard per data node (i.e. 2 in total), and put custom routing in, so the same 'GroupId' always gets routed to the same shard. The queries the system generates are always for a single GroupId, so that should ensure one node/shard per query...
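For illustration, custom routing like that looks roughly as follows (index name, IDs, and values are placeholders); the same routing value must be supplied at index time and at search time:

```
# Index: route by GroupId
PUT /analytics/_doc/1?routing=1234
{
  "GroupId": "1234",
  "GroupEx": "7",
  "SubGroup": "42"
}

# Search: hits only the shard holding that routing value
GET /analytics/_search?routing=1234
{
  "query": { "term": { "GroupId": "1234" } }
}
```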

Then, I looked at the data types in use; I turned half_floats into scaled_floats, disabled _all, and disabled norms where I could.
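A sketch of the scaled_float change for one of the value fields; the field name and scaling factor are assumptions (the factor depends on the precision the values need):

```
"Value1": {
  "type": "scaled_float",
  "scaling_factor": 100
}
```

scaled_float stores values as scaled longs, which typically compresses better than half_float when the values have limited decimal precision.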

Finally, I took a good look at the queries, from the ground up, and eliminated a certain degree of waste in the way they were formed; also, in the app logic, I realised a couple of bits of functionality could be served by a single query (instead of a query each), so I reduced the overall number of queries by about 20%.

Altogether, these changes did make a fair bit of difference - though it is still a bit slower than I had hoped for. One of the queries in particular can take up to 10 seconds to execute. It is, to be fair, doing some reasonably hefty aggs; there is a nested structure, and I need to do a terms agg (size = 60) with avg, min, max etc. at one level, then another terms agg (size = 100) that does min_bucket/max_bucket etc. as well as extended_stats_bucket over the 'inner' aggs. So it ends up with a max of 6,000 inner buckets altogether, plus the 100 outer.
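The shape of that query is roughly the following sketch; the field names, sizes, and which metric feeds the pipeline agg are all assumptions based on the description above:

```
GET /analytics/_search
{
  "size": 0,
  "query": { "term": { "GroupId": "1234" } },
  "aggs": {
    "by_groupex": {
      "terms": { "field": "GroupEx", "size": 100 },
      "aggs": {
        "by_subgroup": {
          "terms": { "field": "SubGroup", "size": 60 },
          "aggs": {
            "vals": {
              "nested": { "path": "Details" },
              "aggs": {
                "avg_value": { "avg": { "field": "Details.Value1" } }
              }
            }
          }
        },
        "value_stats": {
          "extended_stats_bucket": {
            "buckets_path": "by_subgroup>vals>avg_value"
          }
        }
      }
    }
  }
}
```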

Trouble is most likely in that first, 'inner' terms agg; but with the logic I need, I just can't think of any way to avoid it. Likewise, I have some further logic (which I have not yet implemented) which will probably necessitate keeping these structures as 'nested', rather than just flattening everything out into individual docs.

I thought next I would try making that third node a data + master-eligible node, and reindexing with a third shard. I will then test with the current single replica, and again with 2 replicas. Although a coordinating node is recommended for offloading agg-related work, in my scenario the query phase consists of a few keyword filters and relatively few resulting documents (i.e. 6,000 max) per query, so I suspect it is probably not necessary....


(Daniel Mitterdorfer) #5

Hi,

That can have all sorts of causes. I'd start to look for hardware differences between your local machine and the Azure machine(s). You can also check with the usual suspects (iostat, vmstat, ...) whether you can find any bottlenecks. Also: At least on AWS the performance among instances (of the same type!) can vary significantly. I don't know whether this also happens on Azure.

One shard may be too little; check out the section "Designing for Scale" in the Definitive Guide, and also the following two talks by my colleague Christian Dahlqvist:

(both talks require prior registration)

W.r.t. optimizing the queries, it really may pay off to check the query profiler. Depending on your data access patterns it could make sense to transform the data (e.g. pre-aggregate them) but this is domain specific and you know best whether this makes sense for your data.

Daniel


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.