Using ElasticSearch for exploratory data analysis


(Nick C) #1

We are building an app to help us keep track of user behaviour on site,
similar to Google Analytics. We are logging the data currently, but are
looking for a good way to enable exploratory data analysis, i.e.
dynamically querying the data in real time to gain insight. I'm wondering
whether elasticsearch might be a good fit for this.

Our data structure is likely to look something like:

user
  session
    event
      properties (key-value pairs)
    event
      properties
    event
      properties
  session
    event
      properties
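
To make the outline above concrete, here is a rough sketch of one user record as a nested Python dictionary, with a small helper that walks every event. All field names (`user_id`, `sessions`, `session_id`, `events`, `name`, `properties`) and all values are hypothetical and chosen only for illustration; the real schema may well differ.

```python
from typing import Iterator, Tuple

# One user with two sessions, mirroring the outline: three events in the
# first session and one in the second. Every name and value is made up.
user = {
    "user_id": "u1",
    "sessions": [
        {
            "session_id": "s1",
            "events": [
                {"name": "X", "properties": {"page": "/home"}},
                {"name": "Y", "properties": {"page": "/cart"}},
                {"name": "X", "properties": {"page": "/home"}},
            ],
        },
        {
            "session_id": "s2",
            "events": [{"name": "Z", "properties": {"page": "/help"}}],
        },
    ],
}


def iter_events(user_doc: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (session_id, event) pairs in the order they are stored."""
    for session in user_doc["sessions"]:
        for event in session["events"]:
            yield session["session_id"], event


event_names = [event["name"] for _, event in iter_events(user)]
print(event_names)
```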

The types of queries we would like to perform are either aggregations
(count/avg/min/max etc.) or finding a list of matching users or events,
neither of which sounds like a problem.

The twist is that in addition to simple queries where we query or aggregate
based on user or event properties, it's likely that we'll also want to
perform set-based operations to filter out data. This would allow us to
express things such as 'Find all users who have performed event Y after
performing event X' or 'How many users have performed event X but haven't
performed event Y'.

Does elasticsearch have support for this type of query? Would it be a good
match for this type of application?

Thanks,
Nick

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/85540e20-b896-4a8e-a9aa-72121e9cf5ce%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi Nick,

'How many users have performed event X but haven't performed event Y'

This question looks answerable to me by using a boolean filter with a MUST
clause to return users who performed X and a MUST_NOT clause to filter out
users who have performed Y.
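
As a minimal sketch of what such a request body might look like, the snippet below builds a bool clause in Python. It assumes one document per user with a hypothetical `events` field holding the names of the events that user has performed; that field name and the helper `performed_x_not_y` are illustrative, and the exact wrapping of the bool clause may differ between Elasticsearch versions.

```python
import json


def performed_x_not_y(event_field: str, x: str, y: str) -> dict:
    """Build a bool body matching documents that list x but do not list y."""
    return {
        "query": {
            "bool": {
                # MUST: the user document lists event x.
                "must": [{"term": {event_field: x}}],
                # MUST_NOT: exclude any user document that lists event y.
                "must_not": [{"term": {event_field: y}}],
            }
        }
    }


# "events" is an assumed field name, not something from the thread.
body = performed_x_not_y("events", "X", "Y")
print(json.dumps(body, indent=2))
```

Counting the documents that match this body would then answer the "how many users" part of the question.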

'Find all users who have performed event Y after performing event X'

However, I can't think of a way to answer this kind of question. The only
idea that comes to mind would be to search for users who performed both
X and Y and then filter on the client side to keep only those who did Y
after X, but this would only work well when the number of returned
documents is small.
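
The client-side step could be sketched roughly as follows. This assumes each returned user carries a list of events with a hypothetical `name` and a comparable `timestamp`; those field names and both helper functions are illustrative only. A user qualifies when at least one Y happened later than at least one X, which is the same as the earliest X preceding the latest Y.

```python
from typing import Iterable, List


def did_y_after_x(events: Iterable[dict], x: str, y: str) -> bool:
    """Return True if some event y has a later timestamp than some event x."""
    events = list(events)
    x_times = [e["timestamp"] for e in events if e["name"] == x]
    y_times = [e["timestamp"] for e in events if e["name"] == y]
    if not x_times or not y_times:
        return False
    # Some y follows some x exactly when the earliest x precedes the latest y.
    return min(x_times) < max(y_times)


def users_with_y_after_x(users: Iterable[dict], x: str, y: str) -> List[str]:
    """Keep only the ids of users who performed y after performing x."""
    return [u["user_id"] for u in users if did_y_after_x(u["events"], x, y)]


# Made-up search results: every value here is illustrative.
hits = [
    {"user_id": "a", "events": [{"name": "X", "timestamp": 1},
                                {"name": "Y", "timestamp": 2}]},
    {"user_id": "b", "events": [{"name": "Y", "timestamp": 1},
                                {"name": "X", "timestamp": 2}]},
    {"user_id": "c", "events": [{"name": "X", "timestamp": 1}]},
]
print(users_with_y_after_x(hits, "X", "Y"))  # prints ['a']
```

Every candidate user has to be pulled back to the client for this check, which is why it only suits small result sets.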

--
Adrien Grand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6ZcRSe7x1CpgTeDebaukC51CL5cTWKU3-cU9rPJK38iQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nick C) #3

Adrien,

Thanks for getting back - that's very helpful.

Cheers,
Nick


(Raed Marji) #4

Hello Nick,

I would love to hear how you went forward, whether elasticsearch was the
right tool for the job and, if not, why.

Thanks

--
Please update your bookmarks! We have moved to https://discuss.elastic.co/

