Slow inserts on bigger documents

Eugene_2 · September 28, 2010, 11:20am

Hey guys,

I'm struggling with a strange ElasticSearch insert problem (tried both
0.10 and git master). Timing the requests with curl, a small
(~800-character) document inserts in 0.015s (http://tinypaste.com/6f186),
but a bigger (~4000-character) document takes over 2 seconds
(http://tinypaste.com/7422fb)!

I have one local ES instance and I've tried different numbers of
shards and different store options. I've also tried tweaking 'direct',
'buffer_size', 'warm_cache' etc. Plus I've tried starting
elasticsearch with different -Des.index.gateway.snapshot_interval and
-Des.index.engine.robin.refresh_interval arguments. Nothing seems to
help.

I hope you can point me in the right direction so I can index all
200,000 of my documents (both small and big).

kimchy · September 28, 2010, 12:11pm

Hey,

It's curl acting up. It takes about 2-8 milliseconds on my laptop to index
it using a different HTTP client (I used the Mac one called HTTPClient). How
do you plan to load the docs?

-shay.banon

Eugene_2 · September 28, 2010, 12:26pm

Whoa,

I just tried with HTTPClient and it does indeed get my big
document through to ES fast! Thanks a million mate, I've been banging
my head against the wall all day today.

My original plan was to use the curl PHP extension in a PHP script (turns
out it is as slow as the command-line curl) and iterate through my
MongoDB documents. I will now look into a different PHP client...

kimchy · September 28, 2010, 12:38pm

Another option is the new REST Thrift client at master (see the
elastic/elasticsearch issues on GitHub). There is also a PHP client in
the works, maybe add it there? It is nervetattoo/elasticsearch on GitHub,
a simple PHP client for ElasticSearch (I am not sure if it uses the curl
PHP extension)...

-shay.banon

Eugene_2 · September 28, 2010, 1:20pm

The PHP client does indeed use curl, so I think I'll just rewrite the
import script to use a raw socket (fsockopen()). This will cover my
needs so far.
Thrift on the other hand looks really promising, I'll look into it when
the system I am prototyping goes into production.

-- Eugene
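
The raw-socket approach mentioned above could look roughly like the sketch below. It is written in Python rather than PHP purely for illustration, and it is not the script from this thread: the function names build_put_request and put_document, the Content-Type header, and the choice of "Connection: close" are all assumptions. The point it shows is that a hand-built request simply never contains an Expect header.

```python
import json
import socket


def build_put_request(host, port, path, doc):
    """Serialize doc as JSON and wrap it in a minimal HTTP/1.1 PUT request.

    Hypothetical helper: no Expect header is added, so the server is never
    asked for a "100 Continue" before the body is sent.
    """
    body = json.dumps(doc).encode("utf-8")
    head = (
        f"PUT {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/json\r\n"  # assumption, not from the thread
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"  # lets us read the reply until EOF
        "\r\n"
    ).encode("ascii")
    return head + body


def put_document(host, port, path, doc, timeout=5.0):
    """Send one document over a fresh socket and return the raw response bytes."""
    request = build_put_request(host, port, path, doc)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection: response complete
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Against a local node this would be called as put_document("localhost", 9200, "/users/user/1", {"name": "example"}), where the document ID and body are placeholders. Opening a new socket per document keeps the sketch short but gives up connection reuse.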

Ludovic_Levesque · September 28, 2010, 3:20pm

Hi all,

We faced the same problem recently (indexing slow via curl, fast via
another library), and found this entry in the curl FAQ:
http://curl.haxx.se/docs/faq.html#My_HTTP_POST_or_PUT_requests_are

libcurl makes all POST and PUT requests (except for POST requests with a
very tiny request body) use the "Expect: 100-continue" header. This header
allows the server to deny the operation early so that libcurl can bail out
already before having to send any data. This is useful in authentication
cases and others.

However, many servers don't implement the Expect: stuff properly and if the
server doesn't respond (positively) within 1 second libcurl will continue
and send off the data anyway.

You can disable libcurl's use of the Expect: header the same way you disable
any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.

Faster now with curl

Ludo
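
For an import script that shells out to curl, the first option from the FAQ excerpt above could be sketched as follows. This is an illustration in Python, not code from the thread: the helper name curl_put_argv, the document URL, and the file name big_doc.json are hypothetical. The one detail taken from the curl FAQ is passing an empty "Expect:" header with -H, which makes curl leave the header out.

```python
import shlex


def curl_put_argv(url, body_path, disable_expect=True):
    """Return the argument list for a curl PUT of the file at body_path to url."""
    argv = ["curl", "-s", "-XPUT", url, "--data-binary", "@" + body_path]
    if disable_expect:
        # A header name with nothing after the colon tells curl to drop that
        # header, so no "Expect: 100-continue" is sent with the request.
        argv += ["-H", "Expect:"]
    return argv


# Print the command line instead of running it, so the sketch needs no server.
print(shlex.join(curl_put_argv("http://localhost:9200/users/user/1", "big_doc.json")))
```

Passing the returned list to subprocess.run() would actually send the request. In PHP's curl extension the equivalent would go through CURLOPT_HTTPHEADER, as the FAQ says.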

kimchy · September 28, 2010, 6:11pm

Hey,

Interesting, I will see if I can add support for that. Note that forcing
HTTP 1.0 makes keep-alive problematic, and you really get the boost when the
client keeps reusing the same socket, though I am not sure that's applicable
in PHP if it execs curl...

-shay.banon
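
The socket-reuse point above can be illustrated with a short Python sketch using the standard http.client module, which speaks HTTP/1.1, keeps the connection open between requests when the server allows it, and does not add an Expect header on its own. The function name index_documents, the index and type names, and the Content-Type header are assumptions made for the example.

```python
import http.client
import json


def index_documents(conn, index, doc_type, docs):
    """PUT each (doc_id, doc) pair over one connection; return the HTTP statuses.

    conn is expected to behave like http.client.HTTPConnection.
    """
    statuses = []
    for doc_id, doc in docs:
        body = json.dumps(doc).encode("utf-8")
        conn.request(
            "PUT",
            f"/{index}/{doc_type}/{doc_id}",
            body=body,
            headers={"Content-Type": "application/json"},  # assumption
        )
        response = conn.getresponse()
        response.read()  # drain the reply so the same socket can be reused
        statuses.append(response.status)
    return statuses
```

A caller would create conn = http.client.HTTPConnection("localhost", 9200) once, pass it to index_documents(conn, "users", "user", docs) for the whole batch, and close it at the end.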

