Poor performance indexing documents

I've installed ES 0.90.5 on a RHEL 6 server using the RPM. Using the
default config, I start indexing my data. After about 200 documents,
I start getting a 503 every 5 documents or so. The documents aren't large (e.g.
https://gist.github.com/markwoon/7206263). The error goes away after a couple
of seconds, but I consistently get another 503 after 5 more documents.

I'm new to ES, so any pointers would be helpful on how I can improve
indexing speed.

I've looked at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html#setup-configuration-memory

I've changed ES_HEAP_SIZE to 8GB.
I've checked that max_file_descriptors is up to 64K.
I've tried setting bootstrap.mlockall to true.
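In case it matters, here's where I made those changes (paths from the RPM layout, so they may differ on other installs):

```
# /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=8g
MAX_OPEN_FILES=65535

# /etc/elasticsearch/elasticsearch.yml
bootstrap.mlockall: true
```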

None of these changes has made much of a difference.

I'm not even sure what kind of logging or testing I can do to figure out
what my problem is. Any ideas?

Thanks,
-Mark

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I was able to increase indexing speed significantly by doing the
following (note this is not ideal if you have real-time indexing/search
requirements):

before the indexing:

number_of_shards: 3 (something more than 1; it depends on your needs and resources)
number_of_replicas: 0
refresh_interval: -1
merge.policy.max_merged_segment: 1gb

Do the indexing.

after the indexing:
set number_of_replicas: 1 (or whatever is good for your system)
refresh manually with the API
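Since you're on the Java API, a sketch of the before/after steps above might look like this ("myindex" and the exact values are placeholders; adjust to your setup):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

public class BulkLoadSettings {

    // Before the bulk load: create the index with bulk-friendly settings.
    static void beforeIndexing(Client client) {
        client.admin().indices().prepareCreate("myindex")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("number_of_shards", 3)
                        .put("number_of_replicas", 0)   // no replicas during the load
                        .put("refresh_interval", -1))   // disable automatic refresh
                .execute().actionGet();
    }

    // After the bulk load: restore replicas and refresh, then refresh once manually.
    static void afterIndexing(Client client) {
        client.admin().indices().prepareUpdateSettings("myindex")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("number_of_replicas", 1)
                        .put("refresh_interval", "1s"))
                .execute().actionGet();
        client.admin().indices().prepareRefresh("myindex").execute().actionGet();
    }
}
```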

On Monday, October 28, 2013 4:04:51 PM UTC-7, Mark Woon wrote:


How do you index the data: What language is the client? Which API do you
use?

Can you show a stack trace of the 503 error, in your client, and maybe
messages in the cluster log file?

You do not need an 8 GB heap, 64k file descriptors, or mlockall to index such
a tiny amount of data (200 docs).

Jörg


On Tuesday, October 29, 2013 2:07:09 AM UTC-7, Jörg Prante wrote:

How do you index the data: What language is the client? Which API do you
use?

Java API for everything.
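For what it's worth, here's the shape of a minimal bulk request with the 0.90 Java API (the index/type names are placeholders); batching documents this way is usually much faster than one index call per document:

```java
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

import java.util.List;

public class BulkIndexer {

    // Index a batch of JSON documents in a single round trip.
    static void indexBatch(Client client, List<String> jsonDocs) {
        BulkRequestBuilder bulk = client.prepareBulk();
        for (String json : jsonDocs) {
            // "medline"/"citation" are placeholder index/type names
            bulk.add(client.prepareIndex("medline", "citation").setSource(json));
        }
        BulkResponse response = bulk.execute().actionGet();
        if (response.hasFailures()) {
            System.err.println(response.buildFailureMessage());
        }
    }
}
```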

Can you show a stack trace of the 503 error, in your client, and maybe
messages in the cluster log file?

I don't actually have that error, but my indexing was very slow (I have many
millions of documents to index).

You do not need an 8 GB heap, 64k file descriptors, or mlockall to index such
a tiny amount of data (200 docs).

I'm using 4 GB per data node, and that's more than enough for me. I think 1 GB
may be enough for you.



Solved my problem, and it had nothing to do with ES.

There was a proxy in front of the ES server that was rate limiting it...

-Mark

On Monday, October 28, 2013 4:04:51 PM UTC-7, Mark Woon wrote:
