Hi,
I've been seeing extremely slow indexing performance over HTTP (from a variety of clients). I've written a quick test case to demonstrate:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json,httplib,glob,time
i=0
for file in glob.glob("*.json"):
    # Each file holds a JSON array of product documents
    products=json.loads(open(file,"r").read())
    starttime=time.time()
    for product in products:
        # A new connection is opened (and closed) for every document
        conn=httplib.HTTPConnection('localhost:9200')
        conn.request("POST","/product/%d"%i,json.dumps(product))
        conn.close()
        i+=1
        # Report the time taken for each batch of 1000 documents
        if i%1000==0:
            timetaken=time.time()-starttime
            print "Indexed 1000 docs in %fs"%timetaken
            starttime=time.time()
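For comparison, here is a rough sketch of the same loop with a single connection that is reused and whose responses are read before the next request is sent. The test case above opens a fresh connection per document and closes it without reading the reply, so this variant might help separate connection overhead from indexing time. It is only a sketch under assumptions: it is written for Python 3 (`http.client` instead of `httplib`), the names `index_batch`, `make_keepalive_sender`, `report_every` and `clock` are made up for illustration, and the `Content-Type` header is an addition that the original script does not send.

```python
import http.client
import json
import time


def index_batch(products, send, start_id=0, report_every=1000, clock=time.time):
    """Pass each product to send(doc_id, body); return (next_id, timings).

    timings holds the seconds taken for each completed batch of
    report_every documents, mirroring the print in the original script.
    """
    timings = []
    doc_id = start_id
    started = clock()
    for product in products:
        send(doc_id, json.dumps(product))
        doc_id += 1
        if doc_id % report_every == 0:
            timings.append(clock() - started)
            started = clock()
    return doc_id, timings


def make_keepalive_sender(host="localhost:9200", path_template="/product/%d"):
    """Return a send() that reuses one HTTP connection for every request."""
    # The connection is created lazily; nothing is sent until request().
    conn = http.client.HTTPConnection(host)

    def send(doc_id, body):
        conn.request("POST", path_template % doc_id, body.encode("utf-8"),
                     {"Content-Type": "application/json"})
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        return response.status

    return send
```

With a node listening on localhost:9200, something like `index_batch(products, make_keepalive_sender())` would index one file's worth of products and return the per-batch timings.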
These are the sort of figures I am getting:
Indexed 1000 docs in 18.856016s
Indexed 1000 docs in 15.699048s
Indexed 1000 docs in 18.557784s
Indexed 1000 docs in 3.327371s
Indexed 1000 docs in 15.613446s
Indexed 1000 docs in 15.774722s
Indexed 1000 docs in 24.579973s
Indexed 1000 docs in 33.737681s
Indexed 1000 docs in 27.596674s
Indexed 1000 docs in 27.645701s
SIGSEGV
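To put those figures in perspective, a small sketch (Python 3) that converts them into documents per second. The timings are copied from the output above; the names `batch_seconds` and `docs_per_second` are just illustrative. Overall it works out to around 50 docs/s.

```python
# Seconds taken per batch of 1000 documents, copied from the output above.
batch_seconds = [
    18.856016, 15.699048, 18.557784, 3.327371, 15.613446,
    15.774722, 24.579973, 33.737681, 27.596674, 27.645701,
]
DOCS_PER_BATCH = 1000


def docs_per_second(seconds, docs=DOCS_PER_BATCH):
    """Throughput for one batch that took `seconds` to index."""
    return docs / seconds


rates = [docs_per_second(s) for s in batch_seconds]
# Overall throughput: total documents divided by total elapsed time.
overall = DOCS_PER_BATCH * len(batch_seconds) / sum(batch_seconds)

for seconds, rate in zip(batch_seconds, rates):
    print("%.3fs per 1000 docs = %.1f docs/s" % (seconds, rate))
print("overall: %.1f docs/s" % overall)
```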
A typical document looks like this:
{"category": "mens shirts", "designer": "hollander & lexer", "name": "Marcus wool and cotton shirt", "url": "http://www.matchesfashion.com/fcp/product/Matches-Fashion/Shirts/hollander-%26-lexer-hol-y-marcus-woolcot-shirts-GREY/44578", "price": "235.00", "site_id": "44578", "image_urls": ["http://www.matchesfashion.com/pws/images/catalogue/products/hol-y-marcus-woolcot_gry/xlarge/hol-y-marcus-woolcot_gry_1.jpg", "http://www.matchesfashion.com/pws/images/catalogue/products/hol-y-marcus-woolcot_gry/small/hol-y-marcus-woolcot_gry_2.jpg", "http://www.matchesfashion.com/pws/images/catalogue/products/hol-y-marcus-woolcot_gry/small/hol-y-marcus-woolcot_gry_3.jpg", "http://www.matchesfashion.com/pws/images/catalogue/products/hol-y-marcus-woolcot_gry/small/hol-y-marcus-woolcot_gry_4.jpg", "http://www.matchesfashion.com/pws/images/catalogue/products/hol-y-marcus-woolcot_gry/small/hol-y-marcus-woolcot_gry_5.jpg"], "delivery_days": " 5 working days for your UK orders to be delivered and up to 7 working days for any International orders (customs dependent).", "unavailable_sizes": ["L"], "available_sizes": ["M", "XL"], "currency_code": "GBP", "description": "Grey round neck shirt. Buttoned down the front. Long sleeves. Buttoned cuffs. Creased effect shirt. Shoulder to hem measures 28in/71cm. The mannequin is 6ft 1in and wearing a medium. 85% cotton, 15% wool. Dry clean."}
I'm running Elasticsearch 0.12.1 on a quad-core i5 with 8GB RAM. Heap is 1G/6G.
java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.1) (6b20-1.9.1-1ubuntu3)
OpenJDK 64-Bit Server VM (build 17.0-b16, mixed mode).
The CPU is not maxing out, but I am seeing a segfault after a few thousand documents:
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00007f448eb7765a, pid=1185, tid=139925551892240
JRE version: 6.0_20-b20
Java VM: OpenJDK 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
Derivative: IcedTea6 1.9.1
Distribution: Ubuntu 10.10, package 6b20-1.9.1-1ubuntu3
Problematic frame:
V [libjvm.so+0x2ea65a]
An error report file with more information is saved as:
/home/paul/search/elasticsearch/elasticsearch-0.12.1/hs_err_pid1185.log
If you would like to submit a bug report, please include
instructions how to reproduce the bug and visit:
https://bugs.launchpad.net/ubuntu/+source/openjdk-6/
Aborted
Am I missing something?
Thanks,
Paul.