Cluster optimization (indexing/query performance)

Hi All,
I'm building a 2-node cluster (default 5 shards / 1 replica) with 114 GB of RAM on each node (20 GB allocated for heap).
*Each node is an EC2 hs1.8xlarge with a 500 GB volume, IOPS: 4000
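For reference, this is roughly how the index is set up - a minimal sketch in Python using the requests library, with the node address and index name made up since they aren't in the thread; the shard/replica numbers are just the defaults mentioned above:

import requests

ES = "http://localhost:9200"   # hypothetical node address
INDEX = "docs"                 # hypothetical index name

# Create the index with the shard/replica layout stated explicitly instead of
# relying on the defaults, so it is easy to change later.
settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    }
}
resp = requests.put(f"{ES}/{INDEX}", json=settings)
print(resp.json())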

I noticed that after ~80M docs, indexing time increases dramatically. It took about 5 hours to index
~50M documents, but now it takes an entire night to get from 105M to 109M.
I'm aware that the more data I have indexed, the more time it takes to index new documents into the shards, but is there anything I can do to make it faster?
Is there anything I should do differently when dealing with that amount of data? More shards/replicas/nodes/heap size etc.?
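As an illustration of the kind of settings changes I'm asking about - a sketch using Python and requests, with a hypothetical index name; I don't know yet whether any of it actually helps:

import requests

ES = "http://localhost:9200"   # hypothetical node address
INDEX = "docs"                 # hypothetical index name

# Relax per-index settings for the duration of a big bulk load ...
bulk_settings = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
requests.put(f"{ES}/{INDEX}/_settings", json=bulk_settings)

# ... bulk indexing happens here ...

# ... and restore them once the load is finished.
restore = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}
requests.put(f"{ES}/{INDEX}/_settings", json=restore)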

Also on the same matter, after ~80M docs the query time reaches a level I can't be comfortable with - about 8 seconds per query (a nested query containing a bool with several clauses - range/must/must_not).
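To give a concrete idea of the shape of the query - the field names below are made up for illustration, the real mapping is different:

import requests

ES = "http://localhost:9200"   # hypothetical node address
INDEX = "docs"                 # hypothetical index name

# A nested query wrapping a bool with must / must_not / range clauses,
# similar in structure to the slow query described above.
query = {
    "query": {
        "nested": {
            "path": "events",  # hypothetical nested field
            "query": {
                "bool": {
                    "must": [
                        {"term": {"events.type": "click"}},
                        {"range": {"events.timestamp": {"gte": "2013-01-01"}}},
                    ],
                    "must_not": [
                        {"term": {"events.source": "bot"}},
                    ],
                }
            },
        }
    }
}
resp = requests.post(f"{ES}/{INDEX}/_search", json=query)
print(resp.json()["took"], "ms")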

Does anyone have an idea how I can get this under 1 second (more shards/replicas/nodes/heap size etc.)? Is that possible at all with an index of this size?

How much of a performance improvement would I get from switching to SSDs? Could that be the solution for both indexing and query performance?

I would be really glad if some of you could share your experience with large-scale Elasticsearch environments.

Thanks in advance,

Oren

Without knowing your situation exactly: I would generally recommend
tuning from a small setting up to a high setting. But you have selected
very high resource settings from the beginning, which makes it hard to
analyze resource shortcomings in a reasonable time.

The question is, what is the cause of your ES nodes' trouble? If you
haven't found it, you can't improve. Check whether you can take some
resource measurements of your data, your clients, and your nodes: heap,
memory over time, CPU load, network ... there are a lot of monitoring
tools around. I seriously doubt you can get more performance by just
turning knobs higher without knowing exactly what is going on. Tuning
can't help when you are fighting bottlenecks.
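If it helps, the nodes stats API already gives most of these numbers without extra tools - a minimal sketch with Python and requests; the exact fields and URL flags vary a bit between ES versions:

import requests

ES = "http://localhost:9200"   # any node in the cluster

# Snapshot of JVM heap usage per node; run it periodically while indexing
# or querying to see how the numbers move over time.
stats = requests.get(f"{ES}/_nodes/stats").json()
for node_id, node in stats.get("nodes", {}).items():
    jvm_mem = node.get("jvm", {}).get("mem", {})
    print(node.get("name"),
          "heap used:", jvm_mem.get("heap_used_in_bytes"),
          "heap max:", jvm_mem.get("heap_max_in_bytes"))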

Just a very general comment: ES scales horizontally over the number of
nodes. The more nodes, the better. Sure - you can also exchange or
expand the node hardware. But software problems - if there are any -
will not go away just because you throw more resources at them.

Jörg

On 12.07.13 16:49, oreno wrote:

Does anyone have an idea how I can get this under 1 second (more
shards/replicas/nodes/heap size etc.)?


Hi Jörg, thanks for helping out.
I actually started with a 2-core CPU / 44 GB RAM machine, but I got the same
behavior when reaching ~80M documents.
I figured that if I moved it to a much bigger machine, I could tell whether it was
the hardware slowing me down or the configuration (including the cluster
architecture).
So I moved from a single node on a relatively weak machine to a 2-node
cluster on massive machines, and I didn't get the performance difference I
expected, although there is a difference.
I'm using BigDesk to monitor both operations and I don't see
any bottlenecks that might cause this, except 'non-heap memory', which is at
its maximum (there is only 44b
allocated for that, and I read somewhere that this can't be changed?)
during indexing.
I don't have any errors or exceptions, and I'm starting to think that this is
just the limit Elasticsearch hits when dealing with an index of this size on
only 1 or 2 nodes (without SSD-type storage).

For the moment I've changed the index's store type to 'mmapfs' to check whether
memory-mapping the files will improve my query response time.
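As far as I understand, the store type has to be set when the index is created (or node-wide in elasticsearch.yml), so the change looks roughly like this - a sketch with a hypothetical index name:

import requests

ES = "http://localhost:9200"     # hypothetical node address
INDEX = "docs_mmapfs"            # hypothetical: a fresh index created with mmapfs

body = {
    "settings": {
        "index": {
            "store": {"type": "mmapfs"}   # memory-mapped Lucene store
        }
    }
}
requests.put(f"{ES}/{INDEX}", json=body)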

I'm looking for some basic rules about when you should scale out your
cluster. Maybe I'm wasting my time trying to improve performance with
massive machines
when I should have just started with a 5-node cluster of medium-sized
machines. And I'm beginning to think that this is the case...

Any thoughts?

Thanks in advance,

Oren

Another thing:
I don't see any difference in query time between having 2 nodes running in the cluster and removing one of them.

When they are both up, I can see both machines' CPUs spike at request time, meaning they both participated in processing the request, but it's as if they didn't really add any value to one another.

I should mention that both nodes are configured as data nodes without a dedicated HTTP node, so I'm just sending the request to one of them. I don't think that should be a problem, but who knows...
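Right now the client always points at the same node; if it matters, spreading the requests over both data nodes would look something like this (node addresses and index name are made up):

import itertools
import requests

# Hypothetical node addresses; alternate search requests between the two
# data nodes so the same node doesn't always coordinate the request.
NODES = itertools.cycle(["http://node1:9200", "http://node2:9200"])
INDEX = "docs"   # hypothetical index name

def search(query):
    node = next(NODES)
    return requests.post(f"{node}/{INDEX}/_search", json=query).json()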

Thanks,