I am using ES to calculate aggregations on a dataset of sales data
(about 50,000,000 docs, or 10GB of data). As an example, I am using the date
histogram aggregation with terms / sum sub-aggregations to get the sales sum
per day and product. Each doc has a product_id, a date field, and a quantity
field, among others.
This use case has no live indexing (!). I bulk-index the new sales data
once a day, shortly after midnight, for the previous day only. During the
rest of the day, no new data is added. I also do not use any results
other than the aggregation results, so the result size is always set to 0
(zero) in my queries.
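For reference, here is a minimal sketch of the kind of request body described above, built as a plain Python dict so it can be inspected without a running node. The field names (date, product_id, quantity) follow the description; the aggregation names (sales_per_day, per_product, quantity_sum) and the helper function are made up for illustration. The "interval" key is what older ES releases accept for date_histogram; newer releases use "calendar_interval" instead, so adjust for your version.

```python
import json


def build_daily_sales_query(date_field="date",
                            product_field="product_id",
                            quantity_field="quantity"):
    """Build a search body: date histogram -> terms -> sum, with no hits returned.

    The aggregation names are hypothetical labels chosen for this sketch.
    """
    return {
        # size 0: only aggregation results are wanted, no documents
        "size": 0,
        "aggs": {
            "sales_per_day": {
                # one bucket per day; newer ES versions expect
                # "calendar_interval" in place of "interval"
                "date_histogram": {"field": date_field, "interval": "day"},
                "aggs": {
                    "per_product": {
                        # one bucket per product within each day
                        "terms": {"field": product_field},
                        "aggs": {
                            # summed quantity per day and product
                            "quantity_sum": {"sum": {"field": quantity_field}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the _search endpoint
    print(json.dumps(build_daily_sales_query(), indent=2))
```

Note that a terms aggregation only returns the top buckets by default, so a "size" would have to be set on it to get every product per day.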
My machine has 128GB RAM (about 75GB reserved for ES via ES_MIN_MEM /
ES_MAX_MEM), 12 cores, and SSD disks.
I am using a config of 1 shard and 0 replica (no cluster - this is a
single, isolated machine).
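The shard / replica setup above corresponds roughly to the following index settings, sketched here as a Python dict that could be sent as the body of an index-creation request. The helper name is hypothetical; number_of_shards and number_of_replicas are the standard index settings.

```python
import json


def build_index_settings(shards=1, replicas=0):
    """Return an index-creation body with the given shard and replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,      # single primary shard
                "number_of_replicas": replicas,  # no replicas on a single node
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body for creating the index
    print(json.dumps(build_index_settings(), indent=2))
```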
My aim is to make the aggregation calculations run as fast as possible.
Are there any recommended config settings for ES or the indices?
Another question: is there a way to silence the bulk indexing logs (I
am using Jörg Prante's JDBC plugin) to zero output? I was unable to find
the right setting to do that.