Slow to index

Harry_Waye · August 24, 2011, 2:46pm

Hi, I'm on elasticsearch master with two nodes, one index of 5 shards and 1
replica. I'm rivering in a couchdb database of about 3GB and 400000
documents. It's going very slowly, something like 50 documents a minute. I
can see on the rivering node that CPU usage is maxed out, so it's working
pretty hard.

The river and index are defined as per https://gist.github.com/1154061 and
node config is https://gist.github.com/1156885 .

How would I go about debugging this?

Regards,
Harry

kimchy · August 24, 2011, 5:37pm

Try maybe increasing the bulk timeout? So more documents will get into a
single bulk request. How complex are your docs?

Harry_Waye · August 25, 2011, 9:27am

I just tried increasing the bulk timeout to 1s, to no avail. The documents
vary from about 1k to 150k, with 100 distinct attributes overall, but each
document only includes a small number of these, about 10. I'm going to try a
30s timeout now to see if that makes a difference, and will play around with
the bulk_size. Are there any other settings that I can twiddle?
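
For readers following along, the two settings mentioned here sit in the `index` section of the river definition. Below is a minimal sketch in Python that builds such a definition as a plain dict, assuming the layout documented for the CouchDB river at the time. The host, port, the `my_db` names and the helper `couchdb_river_definition` are placeholders invented for illustration, and the values 500 and 30s are examples rather than recommendations.

```python
import json


def couchdb_river_definition(db, index, bulk_size=100, bulk_timeout="10ms"):
    """Build a CouchDB river definition as a plain dict (hypothetical helper)."""
    return {
        "type": "couchdb",
        # Connection details are placeholders, not values from this thread.
        "couchdb": {"host": "localhost", "port": 5984, "db": db},
        "index": {
            "index": index,
            "type": index,
            # Number of documents collected before a bulk request is sent.
            "bulk_size": str(bulk_size),
            # How long to wait for more documents before sending a partial bulk.
            "bulk_timeout": bulk_timeout,
        },
    }


definition = couchdb_river_definition("my_db", "my_db", bulk_size=500, bulk_timeout="30s")
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of body that would be PUT to the river's `_meta` document. A larger `bulk_size` only helps if the timeout is long enough for that many changes to arrive.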

Clinton_Gormley · August 25, 2011, 10:18am

How much memory have you allocated to ES? And have you made sure that
swap is disabled, either by turning swap off completely, or by using
mlockall?

Swap is your enemy - as soon as any part of the heap is in swap, the JVM
will grind to a halt.

clint
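
One way to check whether swap is actually in play on a Linux host is to compare `SwapTotal` and `SwapFree` in `/proc/meminfo`. The sketch below is an illustration only: the helper name `swap_in_use_kb` and the sample text are made up, and it reports swap use for the whole machine, not specifically for the JVM heap.

```python
def swap_in_use_kb(meminfo_text):
    """Return swap in use (kB) from /proc/meminfo-style text, or None if unknown."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "SwapFree:  948572 kB"; keep the numeric part.
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    if "SwapTotal" not in values or "SwapFree" not in values:
        return None
    return values["SwapTotal"] - values["SwapFree"]


# Invented sample; on a real host use open("/proc/meminfo").read() instead.
sample = """MemTotal:        2048000 kB
SwapTotal:       1048572 kB
SwapFree:         948572 kB
"""
print(swap_in_use_kb(sample))  # 100000
```

A result of 0 means nothing is swapped out at that moment. With swap left enabled, releases of that period exposed mlockall through the `bootstrap.mlockall` setting.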

Harry_Waye · August 25, 2011, 10:27am

Min/max is set to 500/1000. I hadn't turned swap off so will give that a
try now...

Harry_Waye · August 25, 2011, 10:40am

No change, still very slow

kimchy · August 25, 2011, 2:29pm

Wondering, do you see heavy CPU load also on the other node?

Harry_Waye_2 · August 25, 2011, 2:35pm

No, just on the rivering node as I recall. I've disbanded the group so
can't verify easily. I'll have to arrange a reunion later to test.

Harry_Waye_2 · August 25, 2011, 3:01pm

We've noticed that the river is pulling in _attachments as well, is it meant
to be doing that?

dadoonet · August 25, 2011, 3:17pm

Yes.
I will submit a pull request to disable it with a parameter next week.
For now, you have to add a script like ctx._doc.attachement=null

Btw, you will have to add the javascript plugin.

Hope this helps
David
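
What such a script does to each document can be sketched outside the river. The Python below is only an illustration of the transformation, not the river's own scripting API: the helper `strip_attachments` and the sample document are invented, and the field name `_attachments` follows Harry's message (the one-liner above spells it differently, so use whichever name your documents actually carry).

```python
import copy


def strip_attachments(doc, field="_attachments"):
    """Return a copy of doc without the attachment metadata field."""
    cleaned = copy.deepcopy(doc)
    # pop with a default so documents without attachments pass through unchanged.
    cleaned.pop(field, None)
    return cleaned


# Invented sample document with a single attachment stub.
doc = {
    "_id": "a1",
    "title": "Report",
    "_attachments": {
        "f3a9.pdf": {"content_type": "application/pdf", "length": 20480, "stub": True}
    },
}
print(strip_attachments(doc))
```

The original document is left untouched, which makes it easy to compare the two forms while testing.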

Harry_Waye_2 · August 25, 2011, 3:52pm

Thanks David

kimchy · August 26, 2011, 2:10pm

What does that mean, that it pulls attachments? Can we disable pulling
attachments on the _changes stream itself if one does not wish to have them?

Harry_Waye_2 · August 26, 2011, 2:22pm

Not that it pulls in the attachments themselves, just attachment metadata
like content_type, length etc. Each attachment is assigned a hash, so you end
up with many, many fields, several for each attachment. I don't think you
can suppress the field; perhaps there's some way of using a view, but we're
just removing it on the elasticsearch side for now.
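
The field growth described here can be made concrete with a small sketch. It assumes, for illustration only, that each attachment is keyed by a distinct hash-like name; the sample documents and the helper `field_paths` are invented. It counts the distinct dotted field paths across documents, with and without the attachment metadata.

```python
def field_paths(value, prefix=""):
    """Yield the dotted path of every leaf field in a nested dict."""
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict):
            yield from field_paths(child, path)
        else:
            yield path


# Two invented documents, each with one attachment under a unique key.
docs = [
    {"title": "A", "_attachments": {"9f2c": {"content_type": "text/plain", "length": 10}}},
    {"title": "B", "_attachments": {"41ab": {"content_type": "text/plain", "length": 12}}},
]

all_paths = set()
kept_paths = set()
for d in docs:
    for p in field_paths(d):
        all_paths.add(p)
        if not p.startswith("_attachments."):
            kept_paths.add(p)

print(len(all_paths), len(kept_paths))  # 5 1
```

Every new attachment key adds its own set of paths, so the number of distinct fields grows with the number of attachments rather than staying near the ten or so attributes each document really has.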
