Ultra-slow indexing

Alberto_Tostado · January 19, 2012, 1:13pm

Good morning.

I'm an Elasticsearch newbie, and we're trying to use it as the search
engine for a CouchDB database in a large information system for clinical
laboratories.
We are blocked because we need to index a large amount of data and
indexing is very slow (1 doc per minute... and getting worse). See the
table:

Docs indexed    Docs per minute
510             91.67
833             64.6
1111            27.8
1222            22.2
1500            11.12
1572            4

I suppose we are doing something wrong, but we don't have the expertise to
know what is happening. We'd like some guidelines to diagnose and solve
the problem.

Here are the details (thank you for reading).

Our system produces 1000-3000 new docs per day, with 20 to 30 updates during
the first days of each doc's life. After that, the docs remain unmodified
(archived).
Before starting the system, we need to index 10 years of historical
documents:
1500 docs * 30 days * 12 months * 10 years = 5,400,000 docs that must be
pre-indexed and searchable.
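To put these numbers in perspective, here is a back-of-the-envelope sketch (Python, purely illustrative) of how long that backfill would take at the indexing rates reported in this thread. It assumes a constant rate, which is optimistic given that the observed rate kept dropping; the function names are made up for this example.

```python
def backfill_docs(docs_per_day: int, days_per_month: int,
                  months_per_year: int, years: int) -> int:
    """Total historical docs, using the same formula as the post above."""
    return docs_per_day * days_per_month * months_per_year * years


def backfill_days(total_docs: int, docs_per_minute: float) -> float:
    """Days of non-stop indexing at a constant rate (an optimistic assumption)."""
    minutes = total_docs / docs_per_minute
    return minutes / (60 * 24)


if __name__ == "__main__":
    total = backfill_docs(1500, 30, 12, 10)
    print(f"total docs: {total:,}")
    # docs/min figures reported in this thread
    for rate in (100, 35, 4):
        print(f"{rate:>4} docs/min -> {backfill_days(total, rate):,.1f} days")
```

With this formula, 5,400,000 docs at 100 docs/min take 37.5 days of non-stop indexing, and at 4 docs/min about 937.5 days.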

I attach a couple of sample JSON documents. They are complex, but they are
suitable for our needs.

We configured Elasticsearch and the CouchDB river out of the box: a single
instance, with CouchDB and Elasticsearch running side by side on one computer.

RAM: 6 GB
CPU: Xeon W3530 2.8 GHz
OS: Windows 7

The river is started with this command:

%CURL% -XPUT "http://127.0.0.1:9200/_river/hm/_meta" -d "{\"type\":\"couchdb\",\"couchdb\":{\"host\":\"127.0.0.1\",\"port\":5984,\"db\":\"hm\",\"filter\":null}}"

Indexing is done automagically by the CouchDB river, but we get the same
timings when indexing by hand with curl:

%CURL% -XPOST "http://127.0.0.1:9200/hm/order/" -d @file.txt

Thank you.
Alberto Tostado.
Spain.

Berkay_Mollamustafao · January 19, 2012, 1:24pm

How much memory is assigned to the Elasticsearch JVM?

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

dadoonet · January 19, 2012, 1:46pm

Do you see "update mapping" in the logs?
If so, create the right mapping before indexing, as updating the mapping often seems to have a real cost.
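A minimal sketch of what "create the right mapping before indexing" can look like: a small Python helper that builds a mapping body in the pre-1.0 style ({"order": {"properties": {...}}}) that the put mapping API of these ES versions accepts. The field names and types in FIELDS are hypothetical, since the real ones live in the attached sample documents; core type names such as "string" match the ES versions discussed here but were changed in later releases.

```python
import json

# Hypothetical fields: the real names and types come from the sample documents
# attached to the original post, which are not reproduced here.
FIELDS = {
    "order_date": "date",
    "status": "string",
    "patient": {              # nested JSON object
        "id": "string",
        "birth_date": "date",
    },
}


def to_properties(fields: dict) -> dict:
    """Turn {name: type or nested dict} into a mapping 'properties' section."""
    props = {}
    for name, spec in fields.items():
        if isinstance(spec, dict):
            # Object field: declare its sub-fields under its own 'properties'.
            props[name] = {"type": "object", "properties": to_properties(spec)}
        else:
            props[name] = {"type": spec}
    return props


def build_mapping(type_name: str, fields: dict) -> dict:
    """Wrap the properties in the {type_name: {...}} envelope (pre-1.0 style)."""
    return {type_name: {"properties": to_properties(fields)}}


if __name__ == "__main__":
    print(json.dumps(build_mapping("order", FIELDS), indent=2))
```

Saving the printed JSON to a file (order_mapping.json is a hypothetical name) and running curl -XPUT http://127.0.0.1:9200/hm/order/_mapping -d @order_mapping.json before starting the river would register the listed fields up front; the hm index has to exist first.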

HTH
David
@dadoonet

Alberto_Tostado · January 19, 2012, 3:59pm

About "memory" and "update_mapping"...

The server had the default memory settings (256m min, 1g max), but I've
changed them to 3g min / 3g max, following suggestions in the documentation.
With that change the rate starts better (100 docs/min),
but again it decreases gradually (35 docs/min after 4000 docs).
Better than before, but...

Yes, I see a lot of "update_mapping (dynamic)" messages in the console/log,
3 or 4 every minute. Sorry for the question, but... is "update_mapping" a
problem? I'm afraid I'm not doing this right.
I'm new to Elasticsearch and my concepts aren't clear yet.

Also, I see the following in the console (roughly once per minute), for example:
[2012-01-19 16:49:11,766][INFO ][monitor.jvm ] [Eddie Brock] [gc][ConcurrentMarkSweep][270] took [6.7s]/[4.3s], reclaimed [671.6mb], leaving [1.6gb] used, max [3.1gb]
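To track lines like this over a long indexing run (for example, to see whether pauses grow as the rate falls), a small parser can pull the numbers out. This Python sketch is written against the single sample line above, so the regex is an assumption about the format; the thread does not say what the second duration in took [..]/[..] means, so it is kept as a plain second number.

```python
import re

# Regex written against the single monitor.jvm sample line above; other ES
# versions may log GC activity in a different format.
GC_RE = re.compile(
    r"\[gc\]\[(?P<collector>[^\]]+)\]\[(?P<count>\d+)\]\s+"
    r"took \[(?P<took1>[\d.]+)s\]/\[(?P<took2>[\d.]+)s\],\s+"
    r"reclaimed \[(?P<reclaimed>[\d.]+)(?P<reclaimed_unit>kb|mb|gb)\],\s+"
    r"leaving \[(?P<used>[\d.]+)(?P<used_unit>kb|mb|gb)\] used,\s+"
    r"max \[(?P<max>[\d.]+)(?P<max_unit>kb|mb|gb)\]"
)

MB_PER_UNIT = {"kb": 1 / 1024, "mb": 1.0, "gb": 1024.0}


def to_mb(value: str, unit: str) -> float:
    return float(value) * MB_PER_UNIT[unit]


def parse_gc_line(line: str):
    """Return the numbers from a GC log line as a dict, or None if it does not match."""
    m = GC_RE.search(line)
    if m is None:
        return None
    used_mb = to_mb(m["used"], m["used_unit"])
    max_mb = to_mb(m["max"], m["max_unit"])
    return {
        "collector": m["collector"],
        "count": int(m["count"]),
        "took_s": float(m["took1"]),        # first duration in took [..]/[..]
        "took2_s": float(m["took2"]),       # second duration; meaning not stated in the thread
        "reclaimed_mb": to_mb(m["reclaimed"], m["reclaimed_unit"]),
        "used_mb": used_mb,
        "max_mb": max_mb,
        "used_fraction": used_mb / max_mb,  # heap still in use after this collection
    }


SAMPLE = ("[2012-01-19 16:49:11,766][INFO ][monitor.jvm ] [Eddie Brock] "
          "[gc][ConcurrentMarkSweep][270] took [6.7s]/[4.3s], reclaimed [671.6mb], "
          "leaving [1.6gb] used, max [3.1gb]")

if __name__ == "__main__":
    print(parse_gc_line(SAMPLE))
```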

Thank you.
Alberto Tostado
Spain

dadoonet · January 19, 2012, 4:37pm

I had the same problem when I started with ES.
The problem is this: you send 1000 docs with one field.
Then you send 1000 docs (same type) with a new field, so ES has to update
the mapping because there is a new field to manage.
Then you send another 1000 docs with yet another new field: a new update...

And while ES is updating the mapping, you keep sending documents. That's a
lot of work.

That's why I recommend defining the full mapping for your documents
before you start injecting docs.
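The effect described above can be modelled with a toy simulation (Python, hypothetical helper names): walk the documents in order, collect each leaf field's dotted path, and count how many documents introduce a field not seen before, i.e. how many would trigger a dynamic mapping update. This is a simplification: real dynamic mapping also infers types and handles arrays specially, which this sketch ignores (lists are treated as leaf values).

```python
from typing import Iterable, Iterator, Set, Tuple


def field_paths(doc: dict, prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf field in a JSON-like dict.

    Lists and other non-dict values are treated as leaves (a simplification).
    """
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from field_paths(value, path + ".")
        else:
            yield path


def count_mapping_updates(docs: Iterable[dict],
                          known_fields: Iterable[str] = ()) -> Tuple[int, Set[str]]:
    """Count docs that bring at least one field not seen before.

    In this toy model each such doc stands for one dynamic mapping update;
    known_fields plays the role of a mapping defined before indexing.
    """
    known = set(known_fields)
    updates = 0
    for doc in docs:
        new = set(field_paths(doc)) - known
        if new:
            updates += 1
            known |= new
    return updates, known


if __name__ == "__main__":
    docs = [{"a": 1}, {"a": 2}, {"a": 1, "b": {"c": 3}}, {"d": 4}]
    print(count_mapping_updates(docs))                     # no mapping defined up front
    print(count_mapping_updates(docs, {"a", "b.c", "d"}))  # full mapping defined up front
```

Passing the full field set as known_fields, which is what a predefined mapping amounts to, drops the update count to zero.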

How to do this?
Here is the way I do it.

I inject, let's say, 30000 docs into myindex/mytype and then ask for the
mapping:
curl -XGET 'http://localhost:9200/myindex/mytype/_mapping?pretty=true'

Then I drop mytype:
curl -XDELETE http://localhost:9200/myindex/mytype

I recreate mytype with the mapping I just downloaded:
curl -XPUT http://localhost:9200/myindex/mytype/_mapping -d @$ESJSONFILE

Then I reinject 50000 docs. It should run faster.
Then I ask for the mapping again (in case it was updated) and repeat everything
until I no longer see the update mapping message in the log.
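The "repeat until the update mapping message disappears" loop can also be checked by comparing two mapping snapshots instead of watching the log. A minimal Python sketch, assuming each snapshot has been loaded with json.load from the output of the GET _mapping call and has the shape {"mytype": {"properties": {...}}}; the exact response shape varies between ES versions, so check yours before relying on it. The helper names are hypothetical.

```python
from typing import Set


def mapping_fields(properties: dict, prefix: str = "") -> Set[str]:
    """Collect dotted field paths from a mapping 'properties' section."""
    fields = set()
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: its sub-fields live in a nested 'properties' section.
            fields |= mapping_fields(spec["properties"], path + ".")
        else:
            fields.add(path)
    return fields


def new_fields(old_mapping: dict, new_mapping: dict, type_name: str) -> Set[str]:
    """Fields present in new_mapping but not in old_mapping for one type.

    Assumes both snapshots look like {type_name: {"properties": {...}}}.
    """
    old = mapping_fields(old_mapping[type_name].get("properties", {}))
    new = mapping_fields(new_mapping[type_name].get("properties", {}))
    return new - old


if __name__ == "__main__":
    before = {"mytype": {"properties": {"a": {"type": "string"}}}}
    after = {"mytype": {"properties": {
        "a": {"type": "string"},
        "b": {"properties": {"c": {"type": "long"}}},
    }}}
    print(sorted(new_fields(before, after, "mytype")))  # non-empty: run another round
```

An empty result means the last round added no fields, so the downloaded mapping can be kept as the final one.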

HTH
David.

kimchy · January 19, 2012, 7:36pm

Update mapping happens when a document is indexed with new JSON fields. It's
not that heavy, at least not heavy enough to explain the problem you have.

Yes, increasing the memory will help. I wonder, though: the logging you
pointed out is only enabled in old versions (sadly, that important logging
information no longer appears in newer versions because of a bug in the API
used to get that data). Which version are you using?

I suggest you use the bigdesk plugin (see the plugins page for how to install
it: http://www.elasticsearch.org/guide/reference/modules/plugins.html), and
make sure to use the latest ES version (it has many improvements, including
better memory control in the CouchDB river).

dadoonet · January 19, 2012, 7:45pm

Hi Shay,

Sorry to disagree, but I had exactly the same issue:
At the start, the indexing rate was 50 docs/sec.
Then it decreased to 1 doc/sec.

When I put a proper mapping in place before indexing, indexing ran at a constant rate.

I did not change the memory settings.

BTW, that was some months ago, and I think I was using 0.16.x or 0.15.x with default settings (a single node with 5 shards).

So I strongly believe that updating the mapping very often under a heavy indexing load slows indexing down.

David
@dadoonet

kimchy · January 19, 2012, 7:53pm

David, were you using the index API or the bulk API? The index API will, by
default, wait for the mapping to be applied (which may mean waiting on the
master of the cluster), but bulk will not (and bulk is what the CouchDB river
uses). Also, this whole process is much faster in newer versions; I wonder
whether you would see the same behavior there.
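For reference, a bulk request body is newline-delimited JSON: one action line followed by one source line per document, ending with a newline. The Python sketch below builds such a body for the hm index and order type; the helper names, the sample documents and the batch size are illustrative assumptions, not values from this thread.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def bulk_body(index: str, doc_type: str, docs: Iterable[Tuple[str, dict]]) -> str:
    """Build a bulk request body: an action line and a source line per doc.

    json.dumps without indent keeps each document on a single line, and the
    body ends with the trailing newline the bulk format expects.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def chunks(items: List, size: int) -> Iterator[List]:
    """Split items into consecutive batches of at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical documents; a batch size of 2 is only for the demo.
    docs = [(str(i), {"status": "archived", "n": i}) for i in range(5)]
    for batch in chunks(docs, 2):
        print(bulk_body("hm", "order", batch), end="")
```

With a batch redirected to a file (bulk.json is a hypothetical name), it could be sent with curl -XPOST http://127.0.0.1:9200/_bulk --data-binary @bulk.json. Note --data-binary rather than -d: -d @file strips newlines from the file contents, which breaks the line-oriented bulk format.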

dadoonet · January 19, 2012, 8:01pm

You're right. I was using the index API, not the bulk API.

David
@dadoonet

Alberto_Tostado · January 19, 2012, 8:37pm

Thank you for your comments.

The version may be outdated (0.16.2), but I tried a newer version (??)
a month ago and got the same impression.
Tomorrow I'll try a newer version (the latest?) and will apply David's
idea about the mapping. I'll also look at the bigdesk plugin.

Best regards.
Alberto
Spain

On Thu, Jan 19, 2012 at 9:01 PM, David Pilato david@pilato.fr wrote:

You're right. I was using index API and not bulk API.

David
@dadoonet

On 19 Jan 2012, at 20:53, Shay Banon kimchy@gmail.com wrote:

David, were you using the index API or the bulk API? The index API will, by
default, wait for the mapping to be applied (which might involve talking to the
master of the cluster), but bulk will not (and bulk is what the CouchDB river
uses). Also, this whole process is much faster in newer versions; I wonder
whether you would see the same behavior there.

On Thu, Jan 19, 2012 at 9:45 PM, David Pilato david@pilato.fr wrote:

Hi Shay,

Sorry to disagree, but I had exactly the same issue:
At the start, the indexing rate was 50 docs/sec.
Then it started to decrease, down to 1 doc/sec.

When I put a good mapping in place before indexing, the indexing rate stays constant.

I did not change the memory settings.

BTW, this was some months ago, and I think I was using 0.16.x or 0.15.x with
default settings (a single node with 5 shards).

So I strongly suspect that updating the mapping very often under a
heavy indexing load slows indexing down.

David
@dadoonet
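One way to act on this is to derive the mapping from a few representative documents and register it before indexing starts, so the type already knows its fields. A rough sketch, assuming the 0.x mapping type names (string, long, double, boolean, object) and that the sample documents cover every field; date fields stored as strings come out as string and would have to be fixed by hand. The helper names are hypothetical.

```python
# Sketch: derive an explicit mapping from sample documents so that every field
# is known before indexing starts, instead of being added by dynamic
# "update_mapping" while the river is running.
# Assumes 0.x-era type names (string/long/double/boolean/object); date fields
# stored as strings come out as "string" and need to be fixed by hand.
import json
from typing import Any, Dict, Iterable, List


def scalar_type(value: Any) -> str:
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_properties(docs: Iterable[dict]) -> Dict[str, Any]:
    props: Dict[str, Any] = {}
    nested: Dict[str, List[dict]] = {}  # object fields, collected across all docs
    for doc in docs:
        for name, value in doc.items():
            # Arrays are mapped by the type of their elements, so look inside.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict):
                    nested.setdefault(name, []).append(item)
                elif item is not None and not isinstance(item, list) and name not in props:
                    props[name] = {"type": scalar_type(item)}  # first type seen wins
    for name, objects in nested.items():
        props[name] = {"type": "object", "properties": infer_properties(objects)}
    return props


def mapping_body(docs: Iterable[dict], doc_type: str = "order") -> str:
    """JSON body for a put-mapping request on the type (e.g. /hm/order/_mapping)."""
    return json.dumps({doc_type: {"properties": infer_properties(docs)}}, indent=2)


if __name__ == "__main__":
    sample = {"name": "example", "count": 3, "items": [{"code": "a"}]}  # made-up doc
    print(mapping_body([sample]))
```

The output could be saved as mapping.json and registered after creating the index (%CURL% -XPUT "http://127.0.0.1:9200/hm/") and before starting the river, for example with %CURL% -XPUT "http://127.0.0.1:9200/hm/order/_mapping" -d @mapping.json.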

On 19 Jan 2012, at 20:36, Shay Banon kimchy@gmail.com wrote:

Update mapping happens when a document is indexed with new JSON fields.
It's not that heavy, at least not heavy enough to explain the problem you have.
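Following that description, a document triggers a dynamic mapping update when it introduces fields the mapping has not seen yet. The sketch below, meant to be run over a sample of the CouchDB docs, lists the field paths each document introduces for the first time. It is an approximation: null values are skipped (dynamic mapping has no type to infer from them) and type changes on an existing field are not detected. The function names are hypothetical.

```python
# Sketch: report which field paths each document introduces for the first time.
# Under dynamic mapping, a document that brings new fields is what triggers an
# "update_mapping (dynamic)" event, so this hints at how often that happens.
# Approximation: null values are skipped and type changes are not detected.
from typing import Iterable, Iterator, List, Set


def field_paths(doc: dict, prefix: str = "") -> Iterator[str]:
    for name, value in doc.items():
        if value is None:
            continue
        path = prefix + name
        yield path
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                yield from field_paths(item, prefix=path + ".")


def new_fields_per_doc(docs: Iterable[dict]) -> List[List[str]]:
    seen: Set[str] = set()
    report: List[List[str]] = []
    for doc in docs:
        fresh = []
        for path in field_paths(doc):
            if path not in seen:
                seen.add(path)
                fresh.append(path)
        report.append(fresh)
    return report


if __name__ == "__main__":
    docs = [{"a": 1, "b": {"c": 2}}, {"a": 3, "b": {"d": [{"e": 1}]}}, {"a": 4}]
    for i, fresh in enumerate(new_fields_per_doc(docs)):
        print(i, fresh)
```

If documents deep into the sample still introduce new paths, frequent update_mapping events would be expected.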

Yes, increasing the memory will help. I do wonder, though. The logging you
pointed out is only enabled in old versions (sadly, that important logging
no longer happens in newer versions because of a bug in the API used to
get that data). Which version are you using?

I suggest you use the bigdesk plugin (see the plugins page for how to
install it: http://www.elasticsearch.org/guide/reference/modules/plugins.html), and
make sure to use the latest ES version (it has many improvements, including
better memory control in the CouchDB river).

On Thu, Jan 19, 2012 at 5:59 PM, Alberto Tostado atostado@gmail.com wrote:

About "memory" and "update_mapping"...

The server was configured with the defaults (256m minimum, 1g maximum heap), but I've
changed it to 3g/3g (following the suggestions in the documentation).
With that change the rate started out better (100 docs/min),
but again it decreases gradually (after 4000 docs, 35 docs/min).
Better than before, but...

Yes, I see a lot of "update_mapping (dynamic)" in the console/log, 3 or 4
every minute. Sorry for the question, but... is "update_mapping" a
problem? I'm afraid I'm not doing this right.
I'm just starting with Elasticsearch and my concepts aren't clear yet.

Also, I see the following in the console (about once per minute):
Example:
[2012-01-19 16:49:11,766][INFO ][monitor.jvm ] [Eddie Brock] [gc][ConcurrentMarkSweep][270] took [6.7s]/[4.3s], reclaimed [671.6mb], leaving [1.6gb] used, max [3.1gb]
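The useful numbers are in that line: a ConcurrentMarkSweep (old-generation) collection took 6.7s, and 1.6gb of the 3.1gb heap was still in use afterwards. If such lines show up about once a minute, it may be worth tracking them over time. A minimal parser, assuming the exact layout of this sample line (other versions log differently); the second duration is unlabeled in the log, so it is captured but not interpreted. Function and field names are made up for this sketch.

```python
# Sketch: extract the numbers from a monitor.jvm GC log line like the one above,
# e.g. to track how much heap is still used after each collection.
# Assumes this exact line layout; other Elasticsearch versions log differently.
# The second duration is unlabeled in the log, so it is kept uninterpreted.
import re
from typing import Any, Dict, Optional

SIZE = r"([\d.]+)(b|kb|mb|gb|tb)"
TIME = r"([\d.]+)(ms|s|m)"
GC_LINE = re.compile(
    r"\[gc\]\[(\w+)\]\[(\d+)\]\s*took \[" + TIME + r"\]/\[" + TIME + r"\], "
    r"reclaimed \[" + SIZE + r"\], leaving \[" + SIZE + r"\] used, max \[" + SIZE + r"\]"
)
MB_PER_UNIT = {"b": 1 / 1024 ** 2, "kb": 1 / 1024, "mb": 1.0, "gb": 1024.0, "tb": 1024.0 ** 2}
SECONDS_PER_UNIT = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_gc_line(line: str) -> Optional[Dict[str, Any]]:
    match = GC_LINE.search(line)
    if match is None:
        return None
    (collector, count, took, took_unit, took2, took2_unit,
     reclaimed, reclaimed_unit, used, used_unit, heap_max, max_unit) = match.groups()
    used_mb = float(used) * MB_PER_UNIT[used_unit]
    max_mb = float(heap_max) * MB_PER_UNIT[max_unit]
    return {
        "collector": collector,
        "count": int(count),
        "took_s": float(took) * SECONDS_PER_UNIT[took_unit],
        "second_duration_s": float(took2) * SECONDS_PER_UNIT[took2_unit],
        "reclaimed_mb": float(reclaimed) * MB_PER_UNIT[reclaimed_unit],
        "used_mb": used_mb,
        "max_mb": max_mb,
        "used_fraction": used_mb / max_mb,
    }


if __name__ == "__main__":
    sample = ("[2012-01-19 16:49:11,766][INFO ][monitor.jvm ] [Eddie Brock] "
              "[gc][ConcurrentMarkSweep][270] took [6.7s]/[4.3s], reclaimed [671.6mb], "
              "leaving [1.6gb] used, max [3.1gb]")
    print(parse_gc_line(sample))
```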

Thank you.
Alberto Tostado
Spain

On Thu, Jan 19, 2012 at 2:46 PM, David Pilato david@pilato.fr wrote:

Do you see "update mapping" in the logs?
If so, create the right mapping before indexing, as updating the
mapping often seems to have a real cost.

HTH
David
@dadoonet

kayngee · November 9, 2012, 10:49am

Were you able to solve this problem?

kayngee · November 14, 2012, 11:54am

Did you manage to solve this problem?
