[Scaling elastic server] How much load can Elasticsearch handle?

Hi,

I have a single AWS EC2 large instance with 7.5 GB of RAM, a 100 GB hard
drive, and a dual-core 2.6 GHz CPU. My Elasticsearch instance holds around
10,000,000 records on average. I use somewhat complex queries. I call the
Elasticsearch APIs from my PHP code, which is exposed as a REST service (I
need to do some post-processing of the data). My questions are: how much load
can my server handle, and at what point should I move to a multi-node
architecture? What effect do the number of shards and replication have on
performance? With my current system configuration, how many queries per
second (QPS) can my Elasticsearch instance handle?

Thanks and Regards,
Abrar.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/49f1dbbe-42e6-445e-ba6e-b9d358d8908f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello Abrar,

The answer to your questions depends a lot on what your data and queries
look like, how often you run them, and how often new data is indexed. You
could paste those details here, but I don't think anyone could give you a
definite answer; at best you would get a guesstimate based on experience
with similar patterns.

The best way to find out is to install a performance monitoring tool
(there are many out there; you can find one by clicking the link in my
signature) and start running tests with production-like data and queries.
Then you'll see how much your machine can handle and where the
bottlenecks are.
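
As a rough illustration of such a test, here is a minimal sketch of a load-test harness in Python (standard library only). It is not a replacement for a proper monitoring or benchmarking tool. The function names, the default request count, and the concurrency level are all assumptions made for this example; the URL you pass in would be your own search endpoint.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_query, total_requests=200, concurrency=8):
    """Call send_query() total_requests times from `concurrency` threads
    and report throughput plus client-side latency figures."""

    def timed_call(_):
        start = time.perf_counter()
        send_query()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    wall_seconds = time.perf_counter() - wall_start

    def percentile_ms(p):
        # Nearest-rank style lookup in the sorted latency list.
        index = min(len(latencies) - 1, int(len(latencies) * p / 100))
        return latencies[index] * 1000

    return {
        "requests": total_requests,
        "qps": total_requests / wall_seconds,
        "p50_ms": percentile_ms(50),
        "p95_ms": percentile_ms(95),
        "max_ms": latencies[-1] * 1000,
    }


def make_search_sender(url, query_body):
    """Build a send_query() callable that POSTs query_body (a dict) to url.
    The url is whatever search endpoint you are testing (an assumption here)."""
    payload = json.dumps(query_body).encode("utf-8")

    def send_query():
        request = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            response.read()

    return send_query
```

Raising the concurrency step by step while watching when p95 latency starts to climb gives a first estimate of the sustainable QPS. The numbers only mean something if the queries and the data resemble production.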

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

The data I am storing is tweets from Twitter, which flow in via a streaming
API, so inserts are fairly frequent (roughly 50-80 inserts/sec).

The query looks something like this:
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { // a range slider in the UI selects tweets by rank range
            "range": {
              "rank": {
                "from": "500",
                "to": "1000",
                "include_upper": true
              }
            }
          },
          { // applied if custom query terms are provided
            "query": {
              "terms": {
                "entities": ["searchString"],
                "minimum_should_match": 1 // set to the number of search terms
              }
            }
          },
          {
            "missing": {
              "field": "in_reply_to_status_id",
              "existence": true,
              "null_value": true
            }
          },
          { // country field query
            "query": {
              "term": {
                "user.country_new": "in"
              }
            }
          }
        ]
      }
    }
  },
  "from": "0", // paging
  "size": "50",
  "sort": [
    { // sorting
      "created_at": {
        "order": "desc"
      }
    }
  ],
  "aggregations": {
    "hashtags_freq": { // hashtag frequency in the result set
      "terms": {
        "field": "entities",
        "include": "(#[a-zA-Z][a-zA-Z0-9_-]+)",
        "size": 0
      },
      "aggregations": {
        "unique_users": {
          "terms": {
            "field": "user.screen_name",
            "size": 0
          }
        }
      }
    },
    "user_tweet_frequency": { // number of tweets per user in the result set
      "terms": {
        "field": "user.screen_name",
        "size": 0
      }
    }
  }
}

This query is fired by every user who comes to our site, and it is called
with varying values for the rank range.
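
Because the rank range varies per request, a test run needs a way to generate the request body with different values. The following is a minimal sketch in Python that mirrors the filter part of the query as posted; the function name and its parameters are hypothetical, and the aggregations are left out only to keep the example short.

```python
def build_search_body(rank_from, rank_to, search_terms=None, country=None,
                      page=0, page_size=50):
    """Build the filter/sort/paging part of the posted query as a dict.
    Function and parameter names are illustrative, not from the post."""
    filters = [
        # Rank range chosen with the UI slider.
        {"range": {"rank": {"from": rank_from, "to": rank_to,
                            "include_upper": True}}},
        # Same "missing" filter as in the posted query.
        {"missing": {"field": "in_reply_to_status_id",
                     "existence": True, "null_value": True}},
    ]
    if search_terms:
        # All supplied terms must match, as in the post
        # (minimum_should_match = number of search terms).
        filters.append({"query": {"terms": {
            "entities": list(search_terms),
            "minimum_should_match": len(search_terms),
        }}})
    if country:
        filters.append({"query": {"term": {"user.country_new": country}}})

    return {
        "query": {"filtered": {"filter": {"and": filters}}},
        "from": page * page_size,
        "size": page_size,
        "sort": [{"created_at": {"order": "desc"}}],
    }
```

For example, `build_search_body(500, 1000, ["searchString"], "in")` reproduces the filters shown above; calling it with randomized rank ranges gives a more realistic mix of requests than replaying one fixed query.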

--
Abrar Sheikh

I would suggest that you install something like Bigdesk or Marvel to check
your usage, in particular heap, threads, and file descriptors.

Every shard is a Lucene index, so the more shards you have, the more
searches you can run in parallel, but every shard also needs memory and
file descriptors.
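
To make the shard arithmetic concrete, here is a small Python sketch. It assumes the usual allocation rule that a replica is never placed on the same node as its primary and ignores any other allocation settings; the function names are made up for this example. `number_of_shards` and `number_of_replicas` are the index settings that control these counts.

```python
def index_settings(primaries, replicas):
    """Request body for creating an index with the given shard layout."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


def shard_copies(primaries, replicas, nodes):
    """Count shard copies (each one a Lucene index) for a cluster of `nodes`.

    Assumption: a replica cannot live on the same node as its primary,
    so each shard can have at most `nodes` copies assigned.
    """
    total = primaries * (1 + replicas)
    assigned = primaries * min(1 + replicas, nodes)
    return {"total": total, "assigned": assigned,
            "unassigned": total - assigned}
```

With 5 primaries and 1 replica on a single node, `shard_copies(5, 1, 1)` gives 10 copies in total, of which 5 are assigned and 5 stay unassigned. In other words, replicas only start adding search capacity once there is a second node to hold them.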

I don't believe anyone could predict with any certainty how quick your
searches will be, as there are many variables. Try running your queries with
one of the query browsers like Sense and you will see how long the search
took; it is in the took field of the reply.
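
If you collect several replies, the took values (reported in milliseconds) can be summarized with a few lines of Python. This is only a sketch: the function name is made up, and it assumes each reply is the raw JSON text of a search response.

```python
import json
import statistics


def summarize_took(response_bodies):
    """Summarize the `took` field (milliseconds) of raw search replies."""
    took_values = [json.loads(body)["took"] for body in response_bodies]
    return {
        "count": len(took_values),
        "min_ms": min(took_values),
        "median_ms": statistics.median(took_values),
        "max_ms": max(took_values),
    }
```

Keep in mind that took covers the time spent inside Elasticsearch. The network hop and any post-processing in the PHP layer come on top of it, so the latency users see will be higher.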
