Aggregations-only use case - performance tuning via config possible?


(Ben-2) #1

Hi there,

I am using ES to calculate aggregations on a dataset of sales data
(about 50,000,000 docs or 10 GB of data). As an example, I am using the date
histogram aggregation with terms / sum sub-aggregations to get the sales sum
per day and product. Each document has a product_id, a date field, and a
quantity field, among others.

This use case has no live indexing (!). I bulk-index the new sales data
once a day, shortly after midnight for the previous day only - during the
rest of the day, no new data is added. I also do not use any result sets
other than the aggregation results, so my result size is always set to 0
(zero) in queries.
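
For reference, the kind of request described above might look roughly like the following sketch. It only builds and prints the search body in Python; nothing is sent to a cluster. The field name `date`, the aggregation names (`sales_per_day`, `per_product`, `quantity_sum`) and the helper `build_sales_aggregation` are illustrative assumptions, and the `interval` parameter follows the 1.x-era date histogram syntax (newer releases use `calendar_interval` instead).

```python
import json


def build_sales_aggregation(date_field="date",
                            product_field="product_id",
                            quantity_field="quantity"):
    """Build a search body: sum of quantity per day and per product.

    Field and aggregation names are assumptions for illustration only.
    """
    return {
        # No hits are needed, only the aggregation results.
        "size": 0,
        "aggs": {
            "sales_per_day": {
                # One bucket per calendar day.
                "date_histogram": {"field": date_field, "interval": "day"},
                "aggs": {
                    "per_product": {
                        # One sub-bucket per product within each day.
                        "terms": {"field": product_field},
                        "aggs": {
                            # Total quantity sold for that product and day.
                            "quantity_sum": {"sum": {"field": quantity_field}}
                        },
                    }
                },
            }
        },
    }


body = build_sales_aggregation()
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a `_search` request against the sales index.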

My machine has 128 GB RAM (about 75 GB reserved for ES via ES_MIN_MEM /
ES_MAX_MEM), 12 cores, and SSD disks.
I am using a config of 1 shard and 0 replicas (no cluster - this is a
single, isolated machine).

My aim is to make the aggregation calculations perform as fast as possible.
Are there any recommended config settings for ES or the indexes?

Another question: is there a way to silence the bulk indexing logs (I
am using Jörg Prante's JDBC plugin) down to zero output? I was unable to find
the right setting to do that.

Thank you!
Ben

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a327d680-8917-41f2-83e3-ad013c94788a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

The logger name for bulk indexing in JDBC plugin is "NodeClient"

So if you want to mute the log messages, use something like this
in config/logging.yml:

logger:
  NodeClient: OFF

maybe with "log4j", I don't know:

logger:
  log4j:
    NodeClient: OFF

or if you use log4j, in log4j.properties:

log4j.logger.NodeClient = OFF

or if you use log4j2, then in log4j2.xml

<?xml version="1.0" encoding="UTF-8"?>

Jörg

On Mon, Jul 28, 2014 at 11:06 AM, Ben tonkatsufan@gmail.com wrote:



(Ben-2) #3

Hello Jörg,

Thank you. I wasn't aware that the correct key is called "NodeClient". It
works.

Kind regards
Ben

On Monday, July 28, 2014 11:57:44 AM UTC+2, Jörg Prante wrote:


