Completion Suggester indexing speed issues

Hi all,

I've been testing out the new completion suggester in master, since it fits
a use case I have pretty well. I've been having consistent problems with
slow indexing on my cluster, and after a fair amount of tweaking to try to
remedy them I've concluded that the completion suggester is likely what's
causing the slowdowns.

Here's my setup: I'm indexing approximately 2 billion docs, with two
completion fields, each of which has both input and output lengths under
60 characters.
We're using bulk indexing with various batch sizes, with little difference
in the end results. The index has 0 replicas during indexing, 10 shards,
refresh_interval set to -1, and warmers disabled. We saw heavy merge volume,
so we increased segments_per_tier and the index buffer size considerably.
While that did get us some speed improvements, we're still seeing a large
number of slow index operations in the indexing slowlog.
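
For reference, here's roughly what the bulk-load settings described above look like. This is a sketch: the segments_per_tier and index buffer values below are illustrative rather than our exact numbers, and indices.memory.index_buffer_size is a node-level setting that goes in elasticsearch.yml, not into the index settings.

curl -XPUT 'localhost:9200/quizlet' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 10,
      "number_of_replicas" : 0,
      "refresh_interval" : "-1",
      "warmer.enabled" : false,
      "merge.policy.segments_per_tier" : 20
    }
  }
}'

# in elasticsearch.yml on each data node (illustrative value):
indices.memory.index_buffer_size: 30%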

The cluster is 3 master nodes and 10 data nodes, each of which is quad-core
with 16GB of memory. While running our bulk import scripts, we see the
servers remaining at low CPU load and low disk I/O in iostat. We're using
default garbage collector settings (plus -XX:+UseCondCardMark on Java 7),
we're not seeing large stop-the-world collections, and our heap usage sits
well below the occupancy fraction. We're using the latest Java 7 release on
Solaris 10 (Joyent's SmartOS).

In the slowlog, we're seeing a large number of docs that are taking anywhere
from 500ms to 4s to index. A small sample from the slowlog
(non-sequential):

[2013-08-26 22:17:03,626][TRACE][index.indexing.slowlog.index] [qup]
[quizlet][8] took[549ms], took_millis[549], type[words], id[2093528],
routing[], source[{"id":2093528,"lang_detect":"tension:en"}]
[2013-08-26 22:17:39,694][TRACE][index.indexing.slowlog.index] [qup]
[quizlet][8] took[526.3ms], took_millis[526], type[words], id[2076947],
routing[],
source[{"id":2076947,"lang_detect":"Puesto:es","lang_completion":{"input":"es:en:Puesto:poner(to
put/ to place)","output":"es:en:Puesto:poner(to put/ to place)"}}]
[2013-08-26 22:19:43,308][TRACE][index.indexing.slowlog.index] [qup]
[quizlet][8] took[784.7ms], took_millis[784], type[words], id[2162549],
routing[], source[{"id":2162549,"lang_detect":"the
teacher:en","lang_completion":{"input":"en:es:the teacher:la
maestra","output":"en:es:the teacher:la maestra"}}]

Our mapping for this type is as follows:

"words" : {
  "_all" : {
    "enabled" : false
  },
  "_source" : {
    "enabled" : false
  },
  "properties" : {
    "id" : {
      "type" : "long"
    },
    "lang_completion" : {
      "type" : "completion",
      "analyzer" : "foldedlowerkeyword",
      "payloads" : false,
      "preserve_separators" : true,
      "preserve_position_increments" : true
    },
    "lang_detect" : {
      "type" : "completion",
      "analyzer" : "foldedlowerkeyword",
      "payloads" : false,
      "preserve_separators" : true,
      "preserve_position_increments" : true
    }
  }
}

And the matching analyzer, from our index settings:

"index.analysis.analyzer.foldedlowerkeyword.filter.1" : "asciifolding",
"index.analysis.analyzer.foldedlowerkeyword.filter.0" : "lowercase",
"index.analysis.analyzer.foldedlowerkeyword.tokenizer" : "keyword",
"index.analysis.analyzer.foldedlowerkeyword.type" : "custom",

Any tips on where to look further to diagnose this would be appreciated. At this point we've tweaked enough that I'm starting to wonder about the performance profile of the completion suggester itself, since it's a new feature and hasn't seen much testing yet. Is there anything we can provide to help dig into exactly where the slowdowns are happening?

Thanks,

--

Robert Deaton


The only thing I can think of here is that flushing takes a long time and
your index concurrency is too low for the bulk requests.
Maybe you can try setting:

index.index_concurrency = 16

The completion suggester can be heavy on flush and merge, so hitting your
cluster with too many concurrent requests may eventually slow it down. Can
you try this setting and report back?
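
Something like this should apply it (a sketch; if your build doesn't accept it as a live update, it may need to go into the index creation settings or elasticsearch.yml instead):

curl -XPUT 'localhost:9200/quizlet/_settings' -d '{
  "index" : { "index_concurrency" : 16 }
}'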

simon
