I'm struggling with a strange ElasticSearch insert problem (tried both the 0.10 and git master versions). Inserting a small ~800-character document takes 0.015s (http://tinypaste.com/6f186), but on bigger ~4000-character documents the insert time jumps to over 2 seconds: http://tinypaste.com/7422fb
I have one local ES instance and I've tried different numbers of shards and store options. I've also tried tweaking 'direct', 'buffer_size', 'warm_cache', etc., and starting elasticsearch with different -Des.index.gateway.snapshot_interval and -Des.index.engine.robin.refresh_interval arguments. Nothing seems to help.
I hope you can point me in the right direction so I can index all of my 200,000 documents (both small and big).
It's curl acting up. It takes about 2-8 milliseconds on my laptop to index the same document using a different HTTP client (I used the Mac one called HTTPClient). How do you plan to load the docs?
I just tried HTTPClient and it does indeed get my big document through to ES fast! Thanks a million, mate; I've been banging my head against the wall all day today.
My original plan was to use the curl PHP extension in a PHP script (it turns out to be as slow as command-line curl) and iterate through my MongoDB documents. I will now look into a different PHP client...
The PHP client does indeed use curl, so I think I'll just rewrite the import script to use a raw socket (fsockopen()). That will cover my needs for now.
Thrift, on the other hand, looks really promising; I'll look into it when the system I am prototyping goes into production.
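For what it's worth, the raw-socket request is simple to hand-roll. Here is a sketch in Python rather than PHP (the fsockopen() version is analogous); the index and document names are made up. The point is that a hand-built request contains exactly the headers you write, and nothing curl might add on its own:

```python
import json
import socket

def build_put(host, path, doc):
    """Hand-rolled HTTP/1.1 PUT request. No Expect: header is sent,
    so the server is never given a chance to stall the upload."""
    body = json.dumps(doc).encode("utf-8")
    head = ("PUT {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Content-Type: application/json\r\n"
            "Content-Length: {}\r\n"
            "Connection: keep-alive\r\n"
            "\r\n").format(path, host, len(body))
    return head.encode("ascii") + body

req = build_put("localhost:9200", "/myindex/doc/1", {"title": "hello"})
# To send it: open a socket -- socket.create_connection(("localhost", 9200))
# in Python, fsockopen('localhost', 9200) in PHP -- write req, read the reply.
```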
Interesting, I will see if I can add support for that. Forcing HTTP 1.0 makes keep-alive problematic, and you really get that boost when the client keeps reusing the same socket, though I am not sure that's applicable in PHP if it execs curl...
-shay.banon
On Tue, Sep 28, 2010 at 5:20 PM, Ludovic Levesque <luddic@gmail.com> wrote:
Hi all,
we faced the same problem recently (indexing slow via curl, fast via another library) and found this entry in the curl FAQ:
libcurl makes all POST and PUT requests (except for POST requests with a very tiny request body) use the "Expect: 100-continue" header. This header allows the server to deny the operation early so that libcurl can bail out before having to send any data. This is useful in authentication cases and others.
However, many servers don't implement the Expect: stuff properly, and if the server doesn't respond (positively) within 1 second libcurl will continue and send off the data anyway.
You can disable libcurl's use of the Expect: header the same way you disable any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.
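The one-second stall that FAQ entry describes can be reproduced end to end with a toy client and server. This is a sketch in Python, with a deliberately silent server standing in for one that ignores Expect; all names and paths are illustrative, not any real API:

```python
import socket
import threading
import time

def silent_server(srv):
    """Accept one request and answer 200 OK -- but never send the
    '100 Continue' interim response, like servers that ignore Expect."""
    conn, _ = srv.accept()
    conn.settimeout(5)
    data = b""
    while b"\r\n\r\n" not in data:
        data += conn.recv(4096)
    head, _, body = data.partition(b"\r\n\r\n")
    length = int(head.split(b"Content-Length: ")[1].split(b"\r\n")[0])
    while len(body) < length:            # wait for the (possibly delayed) body
        body += conn.recv(4096)
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    conn.close()

def timed_put(port, body, use_expect):
    """PUT a document; if use_expect, mimic libcurl by waiting up to
    1 second for '100 Continue' before sending the body."""
    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
        head = ("PUT /idx/doc/1 HTTP/1.1\r\nHost: 127.0.0.1\r\n"
                "Content-Length: {}\r\n".format(len(body)))
        if use_expect:
            head += "Expect: 100-continue\r\n"
        c.sendall(head.encode("ascii") + b"\r\n")
        if use_expect:
            c.settimeout(1.0)
            try:
                c.recv(4096)             # hope for an interim 100 Continue
            except socket.timeout:
                pass                     # server stayed silent: send anyway
            c.settimeout(5)
        c.sendall(body)
        resp = c.recv(4096)
    return time.monotonic() - start, resp

def run(use_expect):
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    t = threading.Thread(target=silent_server, args=(srv,))
    t.start()
    took, resp = timed_put(srv.getsockname()[1], b'{"f": "x"}', use_expect)
    t.join()
    srv.close()
    return took, resp

slow, _ = run(use_expect=True)    # stalls about a second on 100-continue
fast, _ = run(use_expect=False)   # completes in a few milliseconds
```

The difference between the two runs is the whole story of the thread: same document, same server, and the only change is whether the Expect: header is sent.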