Evaluating ElasticSearch: Is it possible to run multiple value aggregations on ~100M records?


(Mex) #1

I am currently evaluating a set of technologies we hope to use to
drive real-time analytic queries on a large dataset. I came across
ElasticSearch and was wondering whether it is appropriate for our use cases.

Consider a dataset of ~100 million records, where each record is
represented as follows:

{
"Country": "Canada",
"City": "Toronto",
"Business Type": "Retailer",
"Store Name": "Costco",
"Owners Name": "John",
"Owners Age": 57,
"Store Opened": "06/05/1999",
"Store Shut": "10/06/2013",
"nEmployees": 70,
"Avg Daily Customers": "1,300",
"Avg Daily Sales": "252,000"
}
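
A note on the sample record: the two "Avg Daily" values are shown as strings with thousands separators, and a sum aggregation needs numeric values. Below is a minimal Python sketch of cleaning a record before indexing. The helper names (to_number, normalize) are hypothetical, and it assumes the values are always whole numbers written with "," as the thousands separator.

```python
def to_number(value):
    """Convert strings such as "1,300" to int; pass numbers through unchanged."""
    if isinstance(value, str):
        return int(value.replace(",", ""))
    return value


def normalize(record):
    """Return a copy of the record with the two aggregated fields made numeric."""
    cleaned = dict(record)
    for field in ("Avg Daily Customers", "Avg Daily Sales"):
        cleaned[field] = to_number(record[field])
    return cleaned


# A trimmed copy of the sample record from this post.
sample = {
    "Country": "Canada",
    "City": "Toronto",
    "nEmployees": 70,
    "Avg Daily Customers": "1,300",
    "Avg Daily Sales": "252,000",
}
normalized = normalize(sample)
print(normalized["Avg Daily Customers"], normalized["Avg Daily Sales"])
```

The date fields would similarly need a format the index mapping understands; that part is left out here because the sample does not say whether the dates are day/month or month/day.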

The values of 'Avg Daily Customers' and 'Avg Daily Sales' are
the ones that all queries need to aggregate. It might be useful to
share some of the flexible queries our product is expected to answer
in near real time (response times under 2 seconds):

  • Calculate the total sum of Daily Customers and Sales for any store in
    Toronto which has fewer than 80 Employees and which Opened after 01/01/2000
  • Calculate the total sum of Daily Customers and Sales grouped by
    Business Type for any store in Canada, Mexico, USA
  • Calculate the total sum of Daily Customers and Sales for any store
    in Toronto grouped by nEmployees and sorted by Owners Age

This is all new territory for me and the members of our team, and we have
been banging our heads against various technologies trying to get these
queries to perform. Based on your practical experience with ElasticSearch,
is this sort of query something you consider the technology to be a
good fit for?

It would be fantastic if you could also share some assumptions, such as the
type of hardware I might need to achieve such response times. While I understand
ElasticSearch scales horizontally, having 100 server
instances might not be the right move for us, so please keep that in mind :)

Thanks!

-Mex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b56c6727-dbc3-4066-890c-3528aa909844%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mex) #2

Any suggestions?



(Mex) #3

Any ideas, advice, or suggestions?



(Binh Ly) #4

You'll probably want to look at ES 1.0 and aggregations:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html

About your questions (roughly in pseudocode):

  1. Bool filter (Employees lt 80, Opened gt 1/1/2000), Sum Aggregation
    (Script=doc["Daily Customers"].value + doc["Sales"].value)

  2. Terms Filter (Country in [Canada, Mexico, USA]), Terms Aggregation (Field =
    Business Type), Nest Sum Aggregation (Script=doc["Daily Customers"].value +
    doc["Sales"].value)

  3. Term Filter (City = Toronto), Terms (or maybe Range, depending on what
    you want) Aggregation (Field = nEmployees) and Order (by some statistic
    computed on Age per Term/Bucket), Nest Sum Aggregation
    (Script=doc["Daily Customers"].value + doc["Sales"].value)
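
To make the pseudocode above a bit more concrete, here is a rough sketch of the three request bodies, built as plain Python dicts and printed as JSON. Several things are assumptions rather than anything confirmed in this thread: the field names are taken from the sample record and are assumed to be indexed as numeric, date, and not-analyzed string fields; the filters use the bool/filter form of more recent Elasticsearch releases (on 1.x the same filters would sit inside a filtered query); the date literal assumes the mapping accepts ISO dates; and it uses two separate sum aggregations instead of one scripted sum, on the guess that customers and sales should be totalled separately. Ordering by average owner age is just one possible "statistic computed on Age".

```python
import json


def sums():
    """Two separate sum aggregations, one per measured field."""
    return {
        "total_customers": {"sum": {"field": "Avg Daily Customers"}},
        "total_sales": {"sum": {"field": "Avg Daily Sales"}},
    }


# 1. Stores in Toronto with fewer than 80 employees, opened after 2000-01-01.
query_1 = {
    "size": 0,  # only the aggregation results are needed, not the hits
    "query": {"bool": {"filter": [
        {"term": {"City": "Toronto"}},
        {"range": {"nEmployees": {"lt": 80}}},
        {"range": {"Store Opened": {"gt": "2000-01-01"}}},
    ]}},
    "aggs": sums(),
}

# 2. Stores in Canada, Mexico or USA, grouped by business type.
query_2 = {
    "size": 0,
    "query": {"bool": {"filter": [
        {"terms": {"Country": ["Canada", "Mexico", "USA"]}},
    ]}},
    "aggs": {"by_business_type": {
        "terms": {"field": "Business Type"},
        "aggs": sums(),  # nested sums, computed per bucket
    }},
}

# 3. Stores in Toronto, grouped by nEmployees, buckets ordered by the
#    average owner age in each bucket (an assumed choice of statistic).
query_3 = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"City": "Toronto"}}]}},
    "aggs": {"by_employees": {
        "terms": {"field": "nEmployees", "order": {"avg_owner_age": "asc"}},
        "aggs": dict(sums(), avg_owner_age={"avg": {"field": "Owners Age"}}),
    }},
}

for body in (query_1, query_2, query_3):
    print(json.dumps(body, indent=2))
```

Each body would be sent to the _search endpoint of the index. Treat this as a starting point to test against your own mapping, not as a tuned query.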

Your documents seem fairly small, so storage should not be an issue. You'll
probably want to get as much RAM as you can, because aggregations will
need it. I'd suggest running some tests on a smaller set to get a feel
for how much RAM you'd eventually need in production.
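
For a trial on a smaller set, one way to load the sample is the bulk API, which takes a newline-delimited body of action and document lines. The sketch below only builds that body. The bulk_body helper and the two sample records are made up for illustration, and it assumes the target index is given in the request URL, so the action line can stay empty.

```python
import json


def bulk_body(records):
    """Build a newline-delimited bulk request body that indexes each record."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {}}))  # action line; target comes from the URL
        lines.append(json.dumps(record))         # document source line
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


# Hypothetical sample data, already converted to numeric values.
sample = [
    {"City": "Toronto", "nEmployees": 70, "Avg Daily Customers": 1300},
    {"City": "Ottawa", "nEmployees": 12, "Avg Daily Customers": 240},
]
body = bulk_body(sample)
print(body, end="")
```

Loading, say, 1% of the data this way and watching memory use while the real queries run should give a first data point for sizing.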



(Mex) #5

Thanks a lot for the insights Binh Ly

Quick clarification: when you say "as much RAM as you can", what are the
typical requirements / recommended settings for ElasticSearch?



(Binh Ly) #6

You can get some details by watching this video (but exact RAM requirements
vary by the nature of your data and queries):

http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/?watch=1

A general guideline is to give ES (ES_MAX_MEM) half of your total RAM.
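
As a worked example of that guideline, here is a small Python sketch. The suggested_heap_gb helper is hypothetical, and the cap_gb default is not from this thread: it reflects the commonly cited advice to keep the JVM heap below roughly 32 GB so that compressed object pointers stay enabled, so treat it as an assumption.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=31):
    """Half of total RAM, limited by an assumed upper bound on heap size."""
    return min(total_ram_gb / 2, cap_gb)


# Print the suggestion for a few example machine sizes.
for ram in (16, 64, 128):
    print(ram, "GB RAM ->", suggested_heap_gb(ram), "GB heap")
```

The other half of the RAM is not wasted: it is left to the operating system, which uses it for the filesystem cache.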



(Mex) #7

Thanks Binh


