Completion Suggester indexing speed issues

Hi all,

I've been testing out the new completion suggester in master, since it fits
a use case I have pretty well. I've been having consistent problems with
slow indexing on my cluster, and after a fair amount of tweaking to try to
remedy them I've concluded that the completion suggester is likely what's
causing the slowdowns.

Here's my setup: I'm indexing approximately 2 billion docs, with two
completion fields, each of which has both input and output lengths under
60 characters.
We're using bulk indexing with various batch sizes, with little difference
in the end results. The index has 0 replicas during indexing, 10 shards,
refresh_interval set to -1, and warmers disabled. We saw heavy merge volume,
so we increased segments_per_tier and the index buffer size considerably.
While that did get us some speed improvements, we're still seeing a large
number of slow index operations in the indexing slowlog.
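
For reference, here's roughly what the bulk-load settings described above look like. This is a sketch: the segments_per_tier and index buffer values below are illustrative rather than our exact numbers, and indices.memory.index_buffer_size is a node-level setting that goes in elasticsearch.yml, not into the index settings.

curl -XPUT 'localhost:9200/quizlet' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 10,
      "number_of_replicas" : 0,
      "refresh_interval" : "-1",
      "warmer.enabled" : false,
      "merge.policy.segments_per_tier" : 20
    }
  }
}'

# in elasticsearch.yml on each data node (illustrative value):
indices.memory.index_buffer_size: 30%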

The cluster is 3 master nodes and 10 data nodes, each of which is quad-core
with 16GB of memory. While running our bulk import scripts, we see the
servers remaining at low CPU load and low disk I/O in iostat. We're using
default garbage collector settings (plus -XX:+UseCondCardMark on Java 7),
we're not seeing large stop-the-world collections, and our heap usage sits
well below the occupancy fraction. We're using the latest Java 7 release on
Solaris 10 (Joyent's SmartOS).

In the slowlog, we're seeing a large number of docs that are taking anywhere
from 500ms to 4s to index. A small sample from the slowlog
(non-sequential):

[2013-08-26 22:17:03,626][TRACE][index.indexing.slowlog.index] [qup]
[quizlet][8] took[549ms], took_millis[549], type[words], id[2093528],
routing[], source[{"id":2093528,"lang_detect":"tension:en"}]
[2013-08-26 22:17:39,694][TRACE][index.indexing.slowlog.index] [qup]
[quizlet][8] took[526.3ms], took_millis[526], type[words], id[2076947],
routing[],
source[{"id":2076947,"lang_detect":"Puesto:es","lang_completion":{"input":"es:en:Puesto:poner(to
put/ to place)","output":"es:en:Puesto:poner(to put/ to place)"}}]
[2013-08-26 22:19:43,308][TRACE][index.indexing.slowlog.index] [qup]
[quizlet][8] took[784.7ms], took_millis[784], type[words], id[2162549],
routing[], source[{"id":2162549,"lang_detect":"the
teacher:en","lang_completion":{"input":"en:es:the teacher:la
maestra","output":"en:es:the teacher:la maestra"}}]

Our mapping for this type is as follows:

"words" : {
  "_all" : {
    "enabled" : false
  },
  "_source" : {
    "enabled" : false
  },
  "properties" : {
    "id" : {
      "type" : "long"
    },
    "lang_completion" : {
      "type" : "completion",
      "analyzer" : "foldedlowerkeyword",
      "payloads" : false,
      "preserve_separators" : true,
      "preserve_position_increments" : true
    },
    "lang_detect" : {
      "type" : "completion",
      "analyzer" : "foldedlowerkeyword",
      "payloads" : false,
      "preserve_separators" : true,
      "preserve_position_increments" : true
    }
  }
}

And the matching analyzer, from our index settings:

"index.analysis.analyzer.foldedlowerkeyword.filter.1" : "asciifolding",
"index.analysis.analyzer.foldedlowerkeyword.filter.0" : "lowercase",
"index.analysis.analyzer.foldedlowerkeyword.tokenizer" : "keyword",
"index.analysis.analyzer.foldedlowerkeyword.type" : "custom",

Any tips on where to look further to diagnose this would be appreciated. At this point we've tweaked enough that I'm starting to wonder about the performance profile of the completion suggester itself, since it's a new feature and hasn't seen much testing yet. Is there anything we can provide to help dig into exactly where the slowdowns are happening?

Thanks,

--

Robert Deaton


The only thing I can think of here is that flushing takes a long time and
your index concurrency is too low for the bulk requests.
Maybe you can try setting:

index.index_concurrency = 16

The completion suggester can be heavy on flush and merge, so hitting your
cluster with too many concurrent requests may eventually slow it down. Can
you try this setting and report back?
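
Something like this should apply it (a sketch; if your build doesn't accept it as a live update, it may need to go into the index creation settings or elasticsearch.yml instead):

curl -XPUT 'localhost:9200/quizlet/_settings' -d '{
  "index" : { "index_concurrency" : 16 }
}'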

simon
