I'm working with the Twitter river data and trying to figure out how to
construct a query that would let me generate a count of all new users
within a given time period, where in this case "new" means "user has not had
a post captured before the start of this query window".
So basically, I want to get a facet result of user.screen_name where a name
will be dropped from the facet entirely if an instance of it occurs outside
a specified time range.
I've got no idea where to start. Anyone have any pointers?
You can't really do that without some pre-calculation, such as creating a
user document for every user who has a tweet in the system, or something
similar.
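To make the pre-calculation idea concrete, here is a rough sketch in plain Python of what I mean. It is only an illustration under assumptions, not something the river does for you: the `first_seen` field, the helper names, and the sample tweets are all made up, and I'm assuming `created_at` is already parsed into a datetime. The idea is to keep one document per user holding the earliest tweet timestamp, so "new users in a window" becomes a simple range query on that one field.

```python
from datetime import datetime


def build_user_docs(tweets):
    """Return {screen_name: first_seen}, keeping the earliest timestamp per user."""
    first_seen = {}
    for tweet in tweets:
        name = tweet["user"]["screen_name"]
        ts = tweet["created_at"]
        if name not in first_seen or ts < first_seen[name]:
            first_seen[name] = ts
    return first_seen


def new_users(first_seen, start, end):
    """Users whose earliest captured tweet falls inside [start, end)."""
    return sorted(name for name, ts in first_seen.items() if start <= ts < end)


def new_user_range_query(start, end):
    """Range query body against the hypothetical `first_seen` field of the user docs."""
    return {
        "query": {
            "range": {
                "first_seen": {"gte": start.isoformat(), "lt": end.isoformat()}
            }
        }
    }


# Made-up sample data: alice tweeted before the window, so she is not "new".
tweets = [
    {"user": {"screen_name": "alice"}, "created_at": datetime(2014, 1, 5)},
    {"user": {"screen_name": "alice"}, "created_at": datetime(2014, 1, 20)},
    {"user": {"screen_name": "bob"}, "created_at": datetime(2014, 1, 18)},
    {"user": {"screen_name": "carol"}, "created_at": datetime(2014, 1, 22)},
]

window_start = datetime(2014, 1, 15)
window_end = datetime(2014, 1, 25)

user_docs = build_user_docs(tweets)
result = new_users(user_docs, window_start, window_end)
query_body = new_user_range_query(window_start, window_end)
```

With the user documents indexed separately, the count you are after would be the total number of hits for `query_body` against that user index, and you would update a user's document only when an earlier tweet for that user turns up.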
(In reply to the post above by Josh Harrison hijakk@gmail.com, Fri, Jan 24, 2014 at 2:09 AM.)