Ahhh, thanks. We're already looking at upgrading (probably 6.8.3, that was current last I looked), in prep for moving to 7.x.
Question: Do we need 6.7.2 on all nodes, or just the coordinating node?
(Upgrading just the coordinating node doesn't seem to help. It will take me a few days to get the entire cluster upgraded.)
I think it'd be best to upgrade everything before drawing any conclusions.
Sorry for the delay. We finally got everything updated. (It's production; there were issues.)
We're seeing a minor improvement, but doubling our open indices still doubles our search time.
Any ideas?
Thanks for getting back to us.
My best idea for a next step is to look at the timings of the individual messages that make up a search, using the transport tracer. This will help clarify exactly where the time is going. It is sometimes tricky to get a clean trace from a busy production system, and trace logging can slow things down a bit too, so proceed with caution. Ideally you want to use a coordinating node that isn't being used by anything else. I'm not going to be available for the next few days to give more detailed guidance, but I'll be back later next week if nobody else has stepped in.
Thanks. I'll take a look at that.
We do have a coordinating node (two, actually), which makes things a bit more consistent.
Hi @darkmoon,
I tried the following settings on a 6.8.3 cluster:
PUT /_cluster/settings
{
  "transient": {
    "transport.tracer.include": "indices:data/read/search*",
    "logger.org.elasticsearch.transport.TransportService.tracer": "TRACE"
  }
}
This enables logging for the transmission and receipt of every message involved in a search, like this:
[2019-10-09T08:39:31,242][TRACE][o.e.t.T.tracer ] [node-0] [12012][indices:data/read/search[phase/query]] sent to [{node-3}{unhUmLKIRMS55s6dBS2qPQ}{e09vwfZhTuWicQyWzeuk-Q}{127.0.0.1}{127.0.0.1:9303}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}] (timeout: [null])
[2019-10-09T08:39:31,243][TRACE][o.e.t.T.tracer ] [node-3] [12012][indices:data/read/search[phase/query]] received request
[2019-10-09T08:39:31,244][TRACE][o.e.t.T.tracer ] [node-3] [12012][indices:data/read/search[phase/query]] sent response
[2019-10-09T08:39:31,244][TRACE][o.e.t.T.tracer ] [node-0] [12012][indices:data/read/search[phase/query]] received response from [{node-3}{unhUmLKIRMS55s6dBS2qPQ}{e09vwfZhTuWicQyWzeuk-Q}{127.0.0.1}{127.0.0.1:9303}{ml.machine_memory=17179869184, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]
Here node-0 is the coordinating node and node-3 is one of the data nodes. I'd like to look at the timings of these messages for a single problematic search. The best way to isolate a single search is to use a coordinating node that isn't being used for any other searches at the time. Ideally do this when there aren't too many other searches happening elsewhere either, because the data nodes will be logging messages about all ongoing searches. We can pick out the bits we need, but the less noise there is the better.
Once the search has completed you can disable the tracer again:
PUT /_cluster/settings
{
  "transient": {
    "transport.tracer.include": null,
    "logger.org.elasticsearch.transport.TransportService.tracer": null
  }
}
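In case it helps once you've collected the logs, here is a rough sketch (just an illustration, not an official Elasticsearch tool) of how you could line up the tracer lines by request ID and turn them into per-hop timings. The file names are placeholders for wherever each node's log lives, the script only understands the log format shown above, and clocks on different nodes are rarely perfectly in sync, so treat cross-node deltas as approximate.

#!/usr/bin/env python3
# Rough sketch only, not an official Elasticsearch tool: group transport tracer
# log lines by request ID so the per-hop timings of one search are easy to read.
# Usage: python trace_timings.py coordinating.log data-node-1.log ...
# (the file names are placeholders for wherever each node's log lives)
import re
import sys
from collections import defaultdict
from datetime import datetime

# Matches lines like:
# [2019-10-09T08:39:31,242][TRACE][o.e.t.T.tracer ] [node-0] [12012][indices:data/read/search[phase/query]] sent to [...]
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d+)\]"
    r"\[TRACE\s*\]\[o\.e\.t\.T\.tracer\s*\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*\[(?P<req>\d+)\]"
    r"\[(?P<action>.+?)\]\s+"
    r"(?P<event>sent to|received request|sent response|received response)"
)

def parse(paths):
    # request id -> list of (timestamp, node, action, event)
    events = defaultdict(list)
    for path in paths:
        with open(path, errors="replace") as f:
            for line in f:
                m = LINE_RE.search(line)
                if not m or "indices:data/read/search" not in m.group("action"):
                    continue
                ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
                events[m.group("req")].append(
                    (ts, m.group("node"), m.group("action"), m.group("event")))
    return events

def main():
    for req, evs in sorted(parse(sys.argv[1:]).items(), key=lambda kv: int(kv[0])):
        evs.sort(key=lambda e: e[0])
        start = evs[0][0]
        print(f"request {req} ({evs[0][2]}):")
        for ts, node, _action, event in evs:
            delta_ms = (ts - start).total_seconds() * 1000
            print(f"  +{delta_ms:8.1f} ms  {node:<12} {event}")
        print()

if __name__ == "__main__":
    main()

The most meaningful gaps are usually "received request" to "sent response" on a data node (time spent executing that phase there) and "sent to" to "received response" on the coordinating node (the full round trip including the network); comparing those across nodes and phases should show where the time is going.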
Then could you share the log lines matching o.e.t.T.tracer from that search, from the coordinating node and every one of the data nodes?