Prototype use of ElasticSearch Twitter River

Mozilla Metrics has created a five node ElasticSearch cluster on some
test machines that is using the Twitter River functionality to
automatically retrieve a filtered set of documents from Twitter's
streaming API and index them.

I've just created this page to document the first sample query:
https://wiki.mozilla.org/Army_of_Awesome/ElasticSearch_Prototype

I would love to get some feedback from the ES community on the points and
questions raised on that page, and any improvements or ideas for the query.

Hi,

Just wanted to post some answers to the questions raised in the wiki that
we talked about on IRC:

  1. The key field of the date histogram facet is returned as milliseconds
    since the epoch. This is done for two reasons: the first is performance /
    memory usage, and the second is that the format of the date is really user
    defined. With the milliseconds since the epoch, you can easily create a
    Date object around the value and format it how you want.

  2. total and mean are aggregations of the value field in the histogram facet.
    If no value field is specified, then the key values are used. The value can
    be a script as well. This does not make a lot of sense for tweets, but
    imagine a tweet had a price: you could get the total and average price per
    day that way.

  3. Usually, it is better to keep the filters separate. For example, compose
    the filter as an "and" filter that contains a range filter and a prefix
    filter.
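
A rough sketch of the three points above, in Python. It is not taken from
the wiki page. The field names (created_at, text, price), the facet name
per_day, and the sample entry are all hypothetical, and the request / response
shapes assume the facet and filter syntax of ES releases from this period
("filtered" query, "and" filter, "date_histogram" facet whose entries carry a
"time" key). Check them against the version you run.

```python
from datetime import datetime, timezone


def build_query(prefix, start, end):
    """Build a search body whose filter is an "and" of a range and a prefix."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "and": [
                        # Hypothetical date field on the tweet document.
                        {"range": {"created_at": {"from": start, "to": end}}},
                        # Hypothetical text field holding the tweet body.
                        {"prefix": {"text": prefix}},
                    ]
                },
            }
        },
        "facets": {
            "per_day": {
                "date_histogram": {
                    "key_field": "created_at",
                    # total / mean aggregate this field; "price" is made up.
                    "value_field": "price",
                    "interval": "day",
                }
            }
        },
    }


def format_entries(entries, fmt="%Y-%m-%d"):
    """Turn epoch-millisecond facet keys into formatted UTC date strings."""
    rows = []
    for entry in entries:
        # Assumes each entry exposes its key as "time" in milliseconds.
        day = datetime.fromtimestamp(entry["time"] / 1000, tz=timezone.utc)
        rows.append((day.strftime(fmt), entry.get("total"), entry.get("mean")))
    return rows


# Illustrative entry only, not real cluster output.
sample_entries = [
    {"time": 1289347200000, "count": 2, "total": 30.0, "mean": 15.0},
]

if __name__ == "__main__":
    print(build_query("firefox", "2010-11-01", "2010-11-10"))
    print(format_entries(sample_entries))
```

Keeping the range and prefix as separate entries under "and" matches point 3,
and the date formatting stays entirely on the client side, as in point 1.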

-shay.banon

On Wed, Nov 10, 2010 at 4:59 PM, Daniel E deinspanjer@gmail.com wrote:
