Poor performance indexing documents

Mark_Woon · October 28, 2013, 11:04pm

I've installed ES 0.90.5 on a RHEL 6 server using the RPM. Using the
default config, I start indexing my data. After about 200 documents or so,
I'm getting 503's every 5 documents. The documents aren't large (e.g.
https://gist.github.com/markwoon/7206263). It goes away after a couple
seconds, but I consistently get a 503 after 5 more documents.

I'm new to ES, so any pointers would be helpful on how I can improve
indexing speed.

I've looked at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html#setup-configuration-memory

I've changed ES_HEAP_SIZE to 8GB.
I've checked that max_file_descriptors is up to 64K.
I've tried setting bootstrap.mlockall to true.

None has made much of a difference.

I'm not even sure what kind of logging or testing I can do to figure out
what my problem is. Any ideas?

Thanks,
-Mark

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Carofe · October 28, 2013, 11:13pm

I was able to increase the the indexing speed significantly doing the
following (this is not exactly good for real-time indexing/searching
requirements):

before the indexing:

number_of_shards:3 (something more than 1, it depends on your needs and
resources)
number_of_replicas: 0
refresh_interval:-1
merge.policy.max_merged_segment: 1gb

Do indexing

after the indexing:
set number_of_replicas: 1 (or whatever is good for your system)
refresh manually with the API

On Monday, October 28, 2013 4:04:51 PM UTC-7, Mark Woon wrote:

I've installed ES 0.90.5 on a RHEL 6 server using the RPM. Using the
default config, I start indexing my data. After about 200 documents or so,
I'm getting 503's every 5 documents. The documents aren't large (e.g.
Sample JSON for MEDLINE data (PMID:12336567). · GitHub). It goes away after a couple
seconds, but I consistently get a 503 after 5 more documents.

I'm new to ES, so any pointers would be helpful on how I can improve
indexing speed.

I've looked at
Elasticsearch Platform — Find real-time answers at scale | Elastic

I've changed ES_HEAP_SIZE to 8GB.
I've checked that max_file_descriptors is up to 64K.
I've tried setting bootstrap.mlockall to true.

None has made much of a difference.

I'm not even sure what kind of logging or testing I can do to figure out
what my problem is. Any ideas?

Thanks,
-Mark

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · October 29, 2013, 9:07am

How do you index the data: What language is the client? Which API do you
use?

Can you show a stack trace of the 503 error, in your client, and maybe
messages in the cluster log file?

You do not need 8 GB heap, 64k file descriptors, or mlockall to index such
a tiny amount of 200 docs.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Carofe · October 29, 2013, 4:07pm

On Tuesday, October 29, 2013 2:07:09 AM UTC-7, Jörg Prante wrote:

How do you index the data: What language is the client? Which API do you
use?

Java API for everything.

Can you show a stack trace of the 503 error, in your client, and maybe
messages in the cluster log file?

I don't actually have that error but my indexing was very slow (many
millions of documents to index)

You do not need 8 GB heap, 64k file descriptors, or mlockall to index such
a tiny amount of 200 docs.

I'm using 4gb per data node. That's more than enough for me. I think 1gb
may be enough for you.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Mark_Woon · October 30, 2013, 1:01am

Solved my problem, and it had nothing to do with ES.

There was a proxy in front of the ES server that was rate limiting it...

-Mark

On Monday, October 28, 2013 4:04:51 PM UTC-7, Mark Woon wrote:

I've installed ES 0.90.5 on a RHEL 6 server using the RPM. Using the
default config, I start indexing my data. After about 200 documents or so,
I'm getting 503's every 5 documents. The documents aren't large (e.g.
Sample JSON for MEDLINE data (PMID:12336567). · GitHub). It goes away after a couple
seconds, but I consistently get a 503 after 5 more documents.

I'm new to ES, so any pointers would be helpful on how I can improve
indexing speed.

I've looked at
Elasticsearch Platform — Find real-time answers at scale | Elastic

I've changed ES_HEAP_SIZE to 8GB.
I've checked that max_file_descriptors is up to 64K.
I've tried setting bootstrap.mlockall to true.

None has made much of a difference.

I'm not even sure what kind of logging or testing I can do to figure out
what my problem is. Any ideas?

Thanks,
-Mark

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Topic		Replies	Views
Elasticsearch Indexing slows down after having indexed 1000 Documents Elasticsearch	1	387	July 6, 2017
Slow Indexing Speed Elasticsearch	5	7243	July 6, 2017
Memory and index size dependency - performance problems Elasticsearch	10	1100	July 6, 2017
Index speed degradation Elasticsearch	7	477	July 6, 2017
Confused about tuning Elasticsearch	7	739	July 6, 2017

Poor performance indexing documents

Related topics