Hi All,
I am very new to Elasticsearch and could not find any related discussion on this.
Currently, I am using Elasticsearch to perform analytics on our data. I migrated the data from our RDBMS to Elasticsearch via Logstash.
My data looks like the example below. It is stored in the index userLogs with type log, and _id is the id from the RDBMS.
Index : userLogs/logs/{id}
[
  { action : 'click', page : 'product_page', date : '2016-02-02', userId : 'userId1', user : { name : 'Jason', age : 20, country : 'Singapore', ethnicity : 'Chinese' } },
  { action : 'click', date : '2016-02-03', userId : 'userId2', user : { name : 'James', age : 23, country : 'Australia', ethnicity : 'Indian' } },
  { action : 'click', date : '2016-02-02', userId : 'userId1', user : { name : 'Jason', age : 20, country : 'Singapore', ethnicity : 'Chinese' } }
]
The data above contains duplicates because it is a log of user activity. In a single day there can be multiple records for the same user, one for each activity.
Assume the data has accumulated for some time and we now need to do user demographic research across it. Say I want to look at the month of February and find how many distinct users were active in that period, then collect the demographics of those distinct users. For example, we might have 5,000 activity records for that month but only 500 distinct users among them; I want the demographics of those 500 users. I know that if I run a terms aggregation with a filter, the counts come from all 5,000 records rather than from the 500 distinct users, so the result is not distinct.
In short, I need to reduce the 5,000 records to the 500 distinct users, each with their user data, and then aggregate over those 500 users only, counting them by demographic fields such as age, country, and ethnicity.
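To make the expected result concrete, here is a rough sketch in plain Python of the computation I am after. It does not use Elasticsearch at all; the function name distinct_user_demographics and the in-memory logs list are only illustrative, and the records are copied from my example data.

```python
from collections import Counter

# Illustrative in-memory records shaped like the example data (not real data).
logs = [
    {"action": "click", "page": "product_page", "date": "2016-02-02", "userId": "userId1",
     "user": {"name": "Jason", "age": 20, "country": "Singapore", "ethnicity": "Chinese"}},
    {"action": "click", "date": "2016-02-03", "userId": "userId2",
     "user": {"name": "James", "age": 23, "country": "Australia", "ethnicity": "Indian"}},
    {"action": "click", "date": "2016-02-02", "userId": "userId1",
     "user": {"name": "Jason", "age": 20, "country": "Singapore", "ethnicity": "Chinese"}},
]


def distinct_user_demographics(records, month_prefix):
    """Hypothetical helper: keep one record per userId for the given month,
    then count the distinct users per (age, country, ethnicity)."""
    distinct = {}
    for rec in records:
        if rec["date"].startswith(month_prefix):
            # Only the first record seen for each userId is kept.
            distinct.setdefault(rec["userId"], rec["user"])
    counts = Counter(
        (user["age"], user["country"], user["ethnicity"])
        for user in distinct.values()
    )
    return len(distinct), counts


n_users, demographics = distinct_user_demographics(logs, "2016-02")
print("distinct users:", n_users)
for key, count in demographics.items():
    print(key, count)
```

With the three sample records, Jason appears twice but should be counted once, so the demographics are computed over two users rather than three records.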
In an RDBMS, I could use a CTE (common table expression) to collect the distinct user rows into one result set and then aggregate over it, for example: select age, ethnicity, country, count(*) from CTE_distinct_users group by age, ethnicity, country
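For reference, the CTE approach can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table user_logs and its column names are made up for this example and are not our real schema, and the rows are the same three sample records.

```python
import sqlite3

# In-memory database with a hypothetical flattened table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_logs (
        id INTEGER PRIMARY KEY,
        action TEXT, date TEXT, user_id TEXT,
        name TEXT, age INTEGER, country TEXT, ethnicity TEXT
    )
    """
)
rows = [
    ("click", "2016-02-02", "userId1", "Jason", 20, "Singapore", "Chinese"),
    ("click", "2016-02-03", "userId2", "James", 23, "Australia", "Indian"),
    ("click", "2016-02-02", "userId1", "Jason", 20, "Singapore", "Chinese"),
]
conn.executemany(
    "INSERT INTO user_logs (action, date, user_id, name, age, country, ethnicity) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# The CTE first reduces February's log rows to one row per distinct user,
# then the outer query counts users per demographic combination.
query = """
WITH cte_distinct_users AS (
    SELECT DISTINCT user_id, age, ethnicity, country
    FROM user_logs
    WHERE date >= '2016-02-01' AND date < '2016-03-01'
)
SELECT age, ethnicity, country, COUNT(*) AS user_count
FROM cte_distinct_users
GROUP BY age, ethnicity, country
ORDER BY age
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Each output row is (age, ethnicity, country, user_count), counted over distinct users rather than over log rows. This is the shape of result I would like to get from Elasticsearch.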
Is there any way to achieve this result in Elasticsearch? Please help.