Mozilla Metrics has created a five-node ElasticSearch cluster on some
test machines. It uses the Twitter River functionality to
automatically retrieve a filtered set of documents from Twitter's
streaming API and index them.
Just wanted to post some answers to the questions raised in the wiki that
we talked about on IRC:
The key field of the date histogram facet is returned as milliseconds since
the epoch. This is done for two reasons: the first is performance and memory
usage, and the second is that the date format is really user defined. You
get the milliseconds since the epoch, and can easily create a Date object
around that value and format it however you want.
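
As a rough illustration of the point above, here is a minimal Python sketch that turns a millisecond key into a formatted date. The function name and the sample key value are made up for the example, and it assumes the key is in UTC.

```python
from datetime import datetime, timezone


def format_histogram_key(key_ms, fmt="%Y-%m-%d"):
    """Format a histogram key given in milliseconds since the epoch (UTC)."""
    # Python timestamps are in seconds, so scale the millisecond key down.
    moment = datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc)
    return moment.strftime(fmt)


# Hypothetical key value, not taken from a real response.
print(format_histogram_key(1288569600000))  # 2010-11-01
print(format_histogram_key(1288569600000, "%Y-%m-%d %H:%M"))
```

Since the client gets the raw number, the same key can be rendered per day, per hour, or in a local time zone without asking the server for a different format.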
total and mean are aggregations of the value field in the histogram facet.
If no value field is specified, then the key values are used. The value can
come from a script as well. It does not make a lot of sense for tweets, but
imagine a tweet had a price: you could get the total and average price per
day that way.
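
To make the total and mean semantics concrete, here is a small Python sketch that imitates the aggregation locally. It is not ElasticSearch code, only a model of the behavior described above. The field names `created_at` and `price` and the sample documents are assumptions for the example.

```python
from collections import defaultdict

MS_PER_DAY = 24 * 60 * 60 * 1000


def daily_total_and_mean(docs, key_field, value_field=None):
    """Bucket docs per UTC day and compute count, total and mean per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        key = doc[key_field]
        # Round the millisecond key down to the start of its day.
        bucket = key - (key % MS_PER_DAY)
        # With no value field, fall back to the key itself.
        value = doc[value_field] if value_field else key
        buckets[bucket].append(value)
    return {
        bucket: {
            "count": len(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
        }
        for bucket, values in sorted(buckets.items())
    }


# Hypothetical tweets that carry a price.
tweets = [
    {"created_at": 1288569600000, "price": 10.0},
    {"created_at": 1288573200000, "price": 20.0},
    {"created_at": 1288656000000, "price": 5.0},
]
result = daily_total_and_mean(tweets, "created_at", "price")
print(result[1288569600000])  # {'count': 2, 'total': 30.0, 'mean': 15.0}
```

Leaving `value_field` out aggregates the timestamps themselves, which is why total and mean look meaningless for plain tweets.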
Usually, it is better to keep the filters separate. For example, the filter
can be composed of an "and" filter that contains a range filter and a prefix
filter.
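
A sketch of what such a composed filter could look like as a request body, built as a plain Python dict. This follows the older filter DSL ("filtered" query with "and", "range" and "prefix" filters) as I understand it, so treat the exact shape as an assumption and check it against the version you run. The helper name, the field names and the values are hypothetical.

```python
import json


def build_and_filter_query(range_field, start_ms, end_ms, prefix_field, prefix):
    """Build a search body whose filter is an "and" of a range and a prefix filter."""
    # Each filter is kept as its own separate piece.
    range_filter = {"range": {range_field: {"from": start_ms, "to": end_ms}}}
    prefix_filter = {"prefix": {prefix_field: prefix}}
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"and": [range_filter, prefix_filter]},
            }
        }
    }


# Hypothetical field names and values.
body = build_and_filter_query(
    "created_at", 1288569600000, 1288656000000, "user.screen_name", "moz"
)
print(json.dumps(body, indent=2))
```

Keeping the range and the prefix as two small filters means either one can be changed or reused on its own.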