Scaling ES, Please HELP


(Fabien Guiraud) #1

Hi,

Here's my configuration:

  • OS : Ubuntu 12.04.3

  • JVM : oracle-7-jdk

  • 6 nodes

  • I have 6 servers (1 server = 1 node), all m2.2xlarge on EC2, so each
    server has 34.2 GB of RAM. I use only ephemeral disks for storage.

  • My cluster health status : https://gist.github.com/zywx/6612106

  • I have 7 indexes; the problem is with the big one, about 15 million
    documents.

  • 6 indexes with 5 shards and 5 replicas, and the big one with 5 shards and
    1 replica.

  • The mapping of the big one is quite simple: a lot of fields, 2 of them
    set to not_analyzed.
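
To put the shard layout above in numbers, here is a small sketch. It assumes that "N replicas" means N replica copies per primary shard, so each shard exists 1 + N times in the cluster; the function name is only illustrative.

```python
# Shard-copy count for the layout described above.
# Assumption: "N replicas" means N replica copies per primary shard,
# so every shard exists (1 + N) times in the cluster.

def shard_copies(indexes, primaries, replicas):
    """Total shard copies (primaries plus replicas) for a group of indexes."""
    return indexes * primaries * (1 + replicas)

small = shard_copies(indexes=6, primaries=5, replicas=5)  # the six small indexes
big = shard_copies(indexes=1, primaries=5, replicas=1)    # the big index
total = small + big
nodes = 6

print("small indexes:", small)
print("big index:", big)
print("total:", total)
print("average per node:", round(total / nodes, 1))
```

Under that assumption the six small indexes account for 180 shard copies and the big index for 10, so the 6 nodes share 190 shard copies in total.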

My problem is that querying ES with facets such as terms_stats is really slow,
about 4-5 seconds. None of the fields used in terms_stats are not_analyzed.

I can provide whatever anyone needs to better understand my problem.

Thanks a lot,

Fabien

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Fabien Guiraud) #2

Additional information: the status of one node, as shown in Paramedic:

IP: inet[/10.0.1.242:9200]

HOST: ip-10-0-1-242.ec2.internal

LOAD: 0.860

SIZE: 9gb

DOCS: 11 494 592

HEAP: 7.3gb /20gb



(Eric Rodriguez) #3

I've never had performance issues myself, but I would look into:

  • allowed memory settings for the JVM
  • the JVM itself: version and/or vendor
  • I/O bottlenecks, maybe IOPS on Amazon
  • network speed

But that's only generic advice; I'm not sure it will help you.

Eric



(Fabien Guiraud) #4

OK, yes, I have already looked at that. Everything seems normal. But I have 20gb per node, and the total docs size on each node is << 20gb, yet heap usage seems to be stuck around 5gb. Is that a problem, or a lead?



(Matt Weber) #5

You should not facet on a field that is analyzed. Did you allow the fielddata
cache to warm up, or are you testing only the first couple of queries? What do
your queries look like? How many facets? Are you sorting? Did you change
any of your fielddata settings?
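
As a rough illustration of the first point, a string field can be mapped so that it is indexed as a single untokenized term. This is only a sketch: the field name `category` is made up (the thread does not name the fields), and the fragment uses the `"index": "not_analyzed"` string mapping of that Elasticsearch generation.

```python
import json

# Hypothetical field name; the thread does not say which fields are faceted on.
FACET_FIELD = "category"

def not_analyzed_string(field):
    """Mapping fragment for a string field indexed as one untokenized term."""
    return {field: {"type": "string", "index": "not_analyzed"}}

# Mapping body for one document type, containing only the faceted field.
mapping = {"properties": not_analyzed_string(FACET_FIELD)}
print(json.dumps(mapping, indent=2))
```

A facet on such a field sees one term per document value instead of one term per token, which is usually what a facet is meant to count.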



(Fabien Guiraud) #6

Hi,

OK, so I need to set all my fields to not_analyzed.

One of the most problematic queries has 5 statistical facets (shares_count,
likes_count, dislikes_count, etc.; only counts), with a query that filters on
a field. Neither the counts nor the filter field are set to not_analyzed.
The other one is a terms_stats query with a value_script,
"doc['shares_count'].value + doc['likes_count'].value +
doc['dislikes_count'].value"
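
For readers following along, the two requests described above might look roughly like this as facet request bodies. It is a sketch under assumptions: the filter field, its value, and the terms_stats key field are not given in the thread, so `source`, `example`, and `author` are placeholders, and only the three count fields that are actually named are included.

```python
import json

# Count fields named in the post; the remaining two are not given.
COUNT_FIELDS = ["shares_count", "likes_count", "dislikes_count"]

# Hypothetical placeholders: the thread names neither the filter field,
# its value, nor the key field of the terms_stats facet.
FILTER_FIELD = "source"
FILTER_VALUE = "example"
KEY_FIELD = "author"

def value_script(fields):
    """Script that sums the given numeric fields for each document."""
    return " + ".join("doc['%s'].value" % f for f in fields)

def statistical_request(fields, filter_field, filter_value):
    """One statistical facet per count field, restricted by a term query."""
    return {
        "query": {"term": {filter_field: filter_value}},
        "facets": {f + "_stats": {"statistical": {"field": f}} for f in fields},
    }

def terms_stats_request(key_field, fields):
    """terms_stats facet whose value is computed by a script per document."""
    return {
        "facets": {
            "engagement": {
                "terms_stats": {
                    "key_field": key_field,
                    "value_script": value_script(fields),
                }
            }
        }
    }

stat_req = statistical_request(COUNT_FIELDS, FILTER_FIELD, FILTER_VALUE)
ts_req = terms_stats_request(KEY_FIELD, COUNT_FIELDS)
print(json.dumps(stat_req, indent=2))
print(json.dumps(ts_req, indent=2))
```

The script is evaluated for every matching document, so the second request does more work per document than a plain field-based facet.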

I didn't change the fielddata settings.

If it helps, here is a screenshot of the shards (Paramedic view):

Thanks for your answer,



(Fabien Guiraud) #7

Hi,

My fields are now all not_analyzed, but it is still very slow: ~2-3 sec per query.

I don't know what more to do. I optimised my queries with facet_filter, and I have only 15 million docs, with 1 replica and 5 shards across 6 big servers.
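
For context, a facet_filter restricts the documents a single facet is computed over without changing the main query. A minimal sketch of attaching one to a statistical facet follows; the filter field `source` and value `example` are hypothetical, since the actual queries are not shown in the thread.

```python
import json

def with_facet_filter(facet_body, filter_body):
    """Return a copy of a facet definition with a facet_filter attached."""
    facet = dict(facet_body)
    facet["facet_filter"] = filter_body
    return facet

facet = with_facet_filter(
    {"statistical": {"field": "likes_count"}},
    {"term": {"source": "example"}},  # hypothetical filter field and value
)

# The main query still matches everything; only this facet is narrowed.
request = {"query": {"match_all": {}}, "facets": {"likes_stats": facet}}
print(json.dumps(request, indent=2))
```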

Is ElasticSearch truly a good fit for analytics? I'm not so sure right now.

What can I do?

Thanks

Fabien


(system) #8