Scaling ES, Please HELP

Fabien_Guiraud · September 18, 2013, 5:07pm

Hi,

Here's my configuration:

OS: Ubuntu 12.04.3
JVM: oracle-7-jdk
Nodes: 6
I have 6 servers (1 server = 1 node), all m2.2xlarge instances on EC2,
so each server has 34.2 GB of RAM, and I use only ephemeral disks for
storage.
My cluster health status: https://gist.github.com/zywx/6612106
I have 7 indexes; the problem is with the big one, which has about 15 million
documents.
6 of the indexes have 5 shards and 5 replicas, and the big one has 5 shards and
1 replica.
The big one's mapping is quite simple: a lot of fields, 2 of them
not_analyzed.
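For reference, the shard layout above can be counted with a few lines of Python. This is a rough sketch that assumes every replica is actually assigned (the linked health gist is not reproduced here). With 5 replicas on a 6-node cluster, each node ends up holding a copy of every shard of the six smaller indexes, while the big index has only 10 shard copies spread across the 6 nodes.

```python
# Back-of-the-envelope shard count for the cluster layout above.
# Assumes every replica is allocated; each primary shard is stored
# once, plus once more per replica.

def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies one index places on the cluster."""
    return primaries * (1 + replicas)


NODES = 6

big = shard_copies(5, 1)          # the ~15M-document index
others = 6 * shard_copies(5, 5)   # the six smaller indexes
total = big + others

print(f"big index: {big} copies")            # big index: 10 copies
print(f"other indexes: {others} copies")     # other indexes: 180 copies
print(f"total: {total} copies, {total / NODES:.1f} per node on average")
```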

My problem is that querying ES with facets such as terms_stats is really slow:
about 4 to 5 seconds. None of the fields used in terms_stats are not_analyzed.
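A terms_stats facet request in the 0.90-era facet syntax might look like the sketch below (facets were later superseded by aggregations, so this only applies to older releases). The field names category and likes_count and the facet name stats_by_key are placeholders, not taken from the actual mapping. The helper also reads the took value from a search response, which is the time Elasticsearch reports spending on the search itself, in milliseconds; comparing it with the latency seen by the client helps separate cluster time from network or client overhead.

```python
import json


def terms_stats_request(key_field: str, value_field: str, size: int = 10) -> dict:
    """Search body with one terms_stats facet (placeholder field names)."""
    return {
        "query": {"match_all": {}},
        "size": 0,  # facet results only, no hits needed
        "facets": {
            "stats_by_key": {  # arbitrary facet name
                "terms_stats": {
                    "key_field": key_field,      # field whose terms become buckets
                    "value_field": value_field,  # numeric field to compute stats on
                    "size": size,                # number of terms to return
                }
            }
        },
    }


def server_time_ms(response: dict) -> int:
    """'took' is the server-side search time in milliseconds."""
    return response["took"]


body = terms_stats_request("category", "likes_count")
print(json.dumps(body, indent=2))
```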

I can provide whatever anyone needs to better understand my problem.

Thanks a lot,

Fabien

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Fabien_Guiraud · September 18, 2013, 5:25pm

Additional information: the status of one node, as shown in Paramedic:

IP: inet[/10.0.1.242:9200]
HOST: ip-10-0-1-242.ec2.internal
LOAD: 0.860
SIZE: 9 GB
DOCS: 11,494,592
HEAP: 7.3 GB / 20 GB


Eric_Rodriguez · September 18, 2013, 6:30pm

I've never had a performance issue myself, but I would look into:

the JVM's allowed memory settings
the JVM itself: version and/or vendor
I/O bottlenecks, maybe IOPS on Amazon
network speed
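To check the first point, the heap each node actually got can be read from the node stats API. The sketch below assumes the response shape of GET /_nodes/stats/jvm in Elasticsearch 1.x (nodes, then per-node jvm.mem.heap_used_in_bytes and heap_max_in_bytes); the 0.90 endpoint and field names may differ, so treat the key names as assumptions to verify against your version. The sample is hand-made, not real cluster output.

```python
GB = 1024 ** 3


def heap_usage(node_stats: dict) -> dict:
    """Map node name -> (heap used GB, heap max GB, percent used).

    Assumes the 1.x-style node stats shape described in the lead-in.
    """
    summary = {}
    for node_id, node in node_stats["nodes"].items():
        mem = node["jvm"]["mem"]
        used = mem["heap_used_in_bytes"]
        limit = mem["heap_max_in_bytes"]
        summary[node.get("name", node_id)] = (
            round(used / GB, 1),
            round(limit / GB, 1),
            round(100 * used / limit, 1),
        )
    return summary


# Hand-made sample shaped like the assumed response, using the
# 7.3 GB / 20 GB figures from the Paramedic node status in this thread.
sample = {
    "nodes": {
        "node-1": {
            "name": "ip-10-0-1-242",
            "jvm": {"mem": {"heap_used_in_bytes": int(7.3 * GB),
                            "heap_max_in_bytes": 20 * GB}},
        }
    }
}
print(heap_usage(sample))
```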

But that's only generic advice; I'm not sure it will help you.

Eric


Fabien_Guiraud · September 18, 2013, 6:35pm

OK, yes, I have already looked at that. Everything seems normal. But I have a 20 GB heap per node, each node's total document size is well under 20 GB, and yet heap usage seems to be stuck around 5 GB. Is that a problem, or a lead?


mattweber · September 19, 2013, 1:09am

You should not facet on a field that is analyzed. Did you let the fielddata
cache warm up, or are you testing the first couple of queries? What do
your queries look like? How many facets? Are you sorting? Did you change
any of your fielddata settings?
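The mapping change suggested here looks roughly like this in the 0.90 mapping syntax, where a string field is analyzed unless index is set to not_analyzed, and numeric fields are never tokenized. The type name post and the field names are hypothetical. The helper flags facet fields that are still analyzed strings. Note that an existing field's index setting generally cannot be changed in place, so the data has to be reindexed after updating the mapping.

```python
# 0.90-style mapping fragment; the type "post" and field names are hypothetical.
MAPPING = {
    "post": {
        "properties": {
            "category": {"type": "string", "index": "not_analyzed"},  # kept as one term
            "title": {"type": "string"},        # analyzed (tokenized) by default
            "likes_count": {"type": "long"},    # numbers are not tokenized
        }
    }
}


def analyzed_facet_fields(mapping: dict, doc_type: str, facet_fields: list) -> list:
    """Return facet fields that are analyzed strings, i.e. poor facet targets."""
    props = mapping[doc_type]["properties"]
    flagged = []
    for name in facet_fields:
        field = props.get(name, {})  # unmapped fields are skipped
        if field.get("type") == "string" and field.get("index", "analyzed") == "analyzed":
            flagged.append(name)
    return flagged


print(analyzed_facet_fields(MAPPING, "post", ["category", "title", "likes_count"]))
# -> ['title']
```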


Fabien_Guiraud · September 19, 2013, 9:14am

Hi,

OK, so I need to switch all my fields to not_analyzed.

One of the most problematic queries has 5 statistical facets (shares_count,
likes_count, dislikes_count, etc., all of them counts), with a query that filters on
a field. None of the count fields, nor the filter field, are not_analyzed.
The other one is a terms_stats query with a value_script:
"doc['shares_count'].value + doc['likes_count'].value +
doc['dislikes_count'].value"
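For reference, a terms_stats facet with that value_script could be written as follows in the 0.90 facet syntax. The key field source and the facet name interactions are placeholders. Because the script is evaluated for every matching document and needs fielddata for all three count fields, it is likely to cost noticeably more than a plain value_field on one field; that is an expectation worth measuring, not a guarantee.

```python
# Per-document sum of the three counters named in the post.
SCRIPT = (
    "doc['shares_count'].value"
    " + doc['likes_count'].value"
    " + doc['dislikes_count'].value"
)


def scripted_terms_stats(key_field: str) -> dict:
    """terms_stats facet whose value is computed by a script (placeholder names)."""
    return {
        "query": {"match_all": {}},
        "size": 0,
        "facets": {
            "interactions": {
                "terms_stats": {
                    "key_field": key_field,   # hypothetical grouping field
                    "value_script": SCRIPT,   # evaluated once per matching doc
                }
            }
        },
    }


body = scripted_terms_stats("source")
```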

I didn't change the fielddata settings.

If it helps, here is a screenshot of the shards (Paramedic view):

Thanks for your answer,


Fabien_Guiraud · September 23, 2013, 9:59am

Hi,

My fields are now all not_analyzed, but it is still very slow: about 2 to 3 seconds per query.

I don't know what else to do. I optimised my queries with facet_filter, and I have only 15 million docs, with 1 replica and 5 shards across 6 big servers.
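A query with five statistical facets sharing the same facet_filter might be built like this in the 0.90 facet syntax. Only shares_count, likes_count and dislikes_count come from this thread; comments_count, views_count, the filter field source and its value twitter are hypothetical placeholders. A facet_filter narrows only what each facet sees, on top of the main query; when every facet shares the same filter and the hits themselves are not needed, putting that filter in a filtered query instead is an alternative worth benchmarking.

```python
# Placeholder names: only the first three count fields appear in the thread.
COUNT_FIELDS = ["shares_count", "likes_count", "dislikes_count",
                "comments_count", "views_count"]


def statistical_facets(filter_field: str, filter_value: str) -> dict:
    """One statistical facet per count field, all restricted by the same term filter."""
    facet_filter = {"term": {filter_field: filter_value}}
    return {
        "query": {"match_all": {}},
        "size": 0,
        "facets": {
            field: {
                "statistical": {"field": field},  # count, min, max, mean, etc.
                "facet_filter": facet_filter,
            }
            for field in COUNT_FIELDS
        },
    }


body = statistical_facets("source", "twitter")
```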

Is Elasticsearch truly a good fit for analytics? I'm not so sure right now...

What can I do?

Thanks

Fabien

