I am using pyes... My script walks a dir tree looking for XML docs and, for each one:
- parses it using a Python lib (lxml.objectify)
- indexes a JSON dump of the object.
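For reference, the per-document pipeline looks roughly like this — a minimal sketch, with stdlib xml.etree standing in for lxml.objectify and a generic `index_fn` callback standing in for the pyes indexing call, so treat the names as assumptions:

```python
import json
import os
import xml.etree.ElementTree as ET  # stand-in for lxml.objectify


def xml_to_dict(path):
    """Parse an XML doc and flatten its top-level children into a dict."""
    root = ET.parse(path).getroot()
    return {child.tag: child.text for child in root}


def walk_and_index(top, index_fn):
    """Walk a dir tree, parse each .xml file, and hand its JSON dump to index_fn."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith(".xml"):
                doc = xml_to_dict(os.path.join(dirpath, name))
                index_fn(json.dumps(doc))  # commented out for the parse-only run
                count += 1
    return count
```

Passing a no-op as `index_fn` reproduces the parse-only run mentioned below.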
I ran my script with the indexing step commented out — i.e. just walking
the tree and parsing the docs — and it took 22 seconds!
I also noticed, stopping my script after 1000 docs, that using 1, 2, 3, 4, or
5 nodes does not change the total time much!
My documents have half a dozen attributes, one of which is a decent size
I am using the default 5 shards and 1 replica.
I am very, very confused.
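Regarding the bulk question in the quoted message: batching documents through the `_bulk` endpoint instead of one request per doc is the usual fix for this kind of flat throughput. A sketch of building the bulk body — the index/type names here are made up, and pyes has its own bulk support, but this shows what goes over the wire:

```python
import json


def bulk_body(docs, index="docs", doc_type="doc"):
    """Build the NDJSON body for an ES _bulk request: an action line
    followed by the document source, one pair per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```

POSTing this to `/_bulk` in batches of a few hundred docs replaces one HTTP round trip per document with one per batch.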
On Monday, October 8, 2012 3:23:15 PM UTC-4, David Pilato wrote:
Where are you losing time? Is it when you fetch and build your docs, or
when you send them?
How do you send them to ES? Are you using bulk requests? What size?
What do your documents look like?
It's best if you can provide more details about what you are doing. A
curl recreation is perfect.
wrote:
I indexed 20K documents using a 5-node ES setup (RHEL 6.x)
with everything at its default values. It took 15 minutes.
I then doubled the vCPUs on the VMs from 4 to 8, and the RAM from 4 to 8 GB.
I reran the indexing, which took 16 minutes!
I then installed the service wrapper on all nodes and added these lines
at the top of elasticsearch.conf:
I reran my indexing and it took exactly 15 minutes again!
What am I doing wrong? What is my bottleneck here?
Thanks a lot,
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs