I did not change memory settings.
BTW it was some months ago and I think I was using 0.16.x or 0.15.x with default settings (only one node with 5 shards)
So I strongly think that updating the mapping very often under a heavy indexing load slows indexing down.
An update mapping happens when a document is indexed with new JSON fields. It's not that heavy, at least not heavy enough to explain the problem you have.
Yes, increasing the memory will help. I wonder, though: the logging you pointed out is enabled only on old versions (sadly, that important logging information no longer appears in newer versions because of a bug in the API that is used to get that data). Which version are you using?
I suggest you use the bigdesk plugin (check the plugins page on how to install it), and make sure to use the latest ES version (it has many improvements, including better memory control in the couchdb river).
On Thu, Jan 19, 2012 at 5:59 PM, Alberto Tostado atostado@gmail.com wrote:
About "memory" and "update_mapping"...
The server was configured with the defaults (256m>>1g), but I've changed it
to 3g>>3g (following suggestions in the documentation).
With the change, the timings started out better (100 docs/min),
but the rate again decreases gradually (after 4000 docs, 35 docs/min).
Better than before, but...
Yes, I see a lot of "update_mapping (dynamic)" in the console/log, 3 or 4
every minute. Sorry for the question, but... is the "update_mapping"
a problem? I'm afraid I'm not doing things right.
I'm just starting with Elasticsearch and the concepts are not clear to me yet.
Also, I see the following in the console (about once per minute):
Example:
[2012-01-19 16:49:11,766][INFO ][monitor.jvm ] [Eddie Brock] [gc][ConcurrentMarkSweep][270] took [6.7s]/[4.3s], reclaimed [671.6mb], leaving [1.6gb] used, max [3.1gb]
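For anyone who wants to follow these GC lines over time, here is a minimal sketch that pulls the numbers out of a line like the one above. The regular expression is an assumption based only on this single sample (durations in seconds, sizes in b/kb/mb/gb), and the function and field names are made up for illustration. The two "took" values are returned as-is without interpreting them.

```python
import re

# Pattern derived from the single sample line in this thread (an assumption,
# not an official description of the log format).
GC_LINE = re.compile(
    r"\[gc\]\[(?P<collector>[^\]]+)\]\[\s*(?P<count>\d+)\] "
    r"took \[(?P<took_a>[\d.]+)s\]/\[(?P<took_b>[\d.]+)s\], "
    r"reclaimed \[(?P<reclaimed>[\d.]+)(?P<reclaimed_unit>[kmg]?b)\], "
    r"leaving \[(?P<used>[\d.]+)(?P<used_unit>[kmg]?b)\] used, "
    r"max \[(?P<max>[\d.]+)(?P<max_unit>[kmg]?b)\]"
)

# Conversion factors from the logged unit to megabytes.
UNIT_TO_MB = {"b": 1 / (1024 * 1024), "kb": 1 / 1024, "mb": 1.0, "gb": 1024.0}


def parse_gc_line(line):
    """Return the GC figures from one log line, or None if it does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None

    def to_mb(name):
        return float(m.group(name)) * UNIT_TO_MB[m.group(name + "_unit")]

    return {
        "collector": m.group("collector"),
        "count": int(m.group("count")),
        # Both durations from "took [..]/[..]", kept in log order.
        "took_s": (float(m.group("took_a")), float(m.group("took_b"))),
        "reclaimed_mb": to_mb("reclaimed"),
        "used_mb": to_mb("used"),
        "max_mb": to_mb("max"),
    }
```

For the sample line, `used_mb / max_mb` comes out at roughly 0.52, so about half of the 3.1gb heap is still in use right after the collection.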
Thank you.
Alberto Tostado
Spain
On Thu, Jan 19, 2012 at 2:46 PM, David Pilato david@pilato.fr wrote:
Do you see "update mapping" in the logs?
If that's the case, create the right mapping before indexing, as it seems that updating the mapping often has a real cost.
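A minimal sketch of what creating the mapping up front could look like, assuming the `hm` index and `order` type from the original post and the put mapping endpoint (`PUT /{index}/{type}/_mapping`) of the ES versions of that time. The field names are hypothetical placeholders, since the sample documents are not shown in the thread, and the helper names are made up for illustration.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # host and port used in the thread


def build_order_mapping():
    """Return an explicit mapping body for the `order` type.

    The fields listed here are hypothetical; replace them with the
    fields that actually occur in the CouchDB documents.
    """
    return {
        "order": {
            "properties": {
                "patient_id": {"type": "string", "index": "not_analyzed"},
                "created_at": {"type": "date"},
                "comment": {"type": "string"},
            }
        }
    }


def put_mapping(index, doc_type, mapping, base_url=ES_URL):
    """PUT the mapping to /{index}/{type}/_mapping and return the response text."""
    request = urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/_mapping",
        data=json.dumps(mapping).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `put_mapping("hm", "order", build_order_mapping())` once, after the index has been created and before the river starts pushing documents, should mean that fields already listed in the mapping no longer trigger a dynamic update.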
HTH
David
@dadoonet
On Thu, Jan 19, 2012 at 2:24 PM, Berkay Mollamustafaoglu mberkay@gmail.com wrote:
How much memory is assigned to the Elasticsearch JVM?
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
On Thu, Jan 19, 2012 at 8:13 AM, Alberto Tostado atostado@gmail.com wrote:
Good morning.
I'm an Elasticsearch newbie and we're trying to use it as a search engine for a CouchDB database in a large information system for clinical laboratories.
We are blocked because we need to index a large amount of data and indexing is very slow (1 doc per minute... and getting worse). See the table:
Docs indexed    Docs per minute
510             91.6666667
833             64.6
1111            27.8
1222            22.2
1500            11.12
1572            4
I suppose we are doing something wrong, but we don't have the expertise to know what is happening. We'd like to receive some guidelines to diagnose and solve the problem.
Here are the details (thank you for reading).
Our system produces 1000-3000 new docs per day with 20/30 updates in the first days of the life of the doc. Later, the docs remain unmodified (archived).
Before starting the system, we need to index the historical documents (10 years):
1500 docs * 30 days * 12 months * 10 years = 5,400,000 docs preindexed and searchable.
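As a rough check on that figure, and on what the rates in the table above would mean for the backlog, here is a small sketch. The 30-day month and the constant indexing rate are simplifying assumptions, and the function names are made up for illustration.

```python
def backlog_docs(docs_per_day=1500, days_per_month=30, months_per_year=12, years=10):
    """Number of historical documents to index before go-live."""
    return docs_per_day * days_per_month * months_per_year * years


def days_to_index(total_docs, docs_per_minute):
    """Days needed to index total_docs at a constant rate."""
    minutes = total_docs / docs_per_minute
    return minutes / (60 * 24)


total = backlog_docs()
print(total)                            # backlog size
print(days_to_index(total, 4))          # at the last measured rate
print(days_to_index(total, 91.6666667)) # at the first measured rate
```

At the last measured rate of 4 docs per minute the backlog would take about 937.5 days; even at the initial 91.67 docs per minute it would take about 41 days.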
I attach a couple of sample JSON documents. They are complex, but they are suitable for our needs.
We configured Elasticsearch and the CouchDB river with the out-of-the-box settings. Only one instance, one computer running CouchDB and Elasticsearch side by side.
RAM: 6 GB
CPU: Xeon W3530 2.8 GHz
OS: Windows 7
The river is started with this command:
%CURL% -XPUT "http://127.0.0.1:9200/_river/hm/_meta" -d "{\"type\":\"couchdb\",\"couchdb\":{\"host\":\"127.0.0.1\",\"port\":5984,\"db\":\"hm\",\"filter\":null}}"
The indexing is done automagically by the CouchDB river, but we get the same timings when indexing by hand with curl:
%CURL% -XPOST "http://127.0.0.1:9200/hm/order/" -d @file.txt
Thank you.
Alberto Tostado.
Spain.