How to do sequence matching


(Smitha Gowda) #1

Hello Elasticsearch experts,

I have recently started looking at Elasticsearch to answer my analytics
questions. I am fairly new to ES queries, so please excuse me if this is an
obvious question.

I have session data with events, like the JSON below. The number of events and
their order can vary in each document. I would like to get all sessions that
contain a specific sequence of events. In the example below, the sequence of
events could be represented as "ABC". This document should match any
regular-expression-style query such as *, A, BC, or ABC. Queries that should
not match this data include D and AC.
How should I represent the Sequence property below so that it is indexed
properly for this search? Or am I thinking about this data representation in
totally the wrong way?
How do I build a query in ES for this?

Any pointers will be helpful. Thanks.

Session :
{
  StartTime:"20130101T01:00"
  EndTime:"20130101T04:00"
  Sequence:
  Events: [
    {
      Name: "A"
      StartTime:"20130101T01:00"
      EndTime:"20130101T02:00"
    },
    {
      Name: "B"
      StartTime:"20130101T02:30"
      EndTime:"20130101T03:00"
    },
    {
      Name: "C"
      StartTime:"20130101T03:30"
      EndTime:"20130101T04:00"
    }
  ]
}
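
One possible way to fill the Sequence property is to derive it from Events before indexing. The sketch below is only an illustration and not something confirmed in this thread: the helper name build_sequence and its sep parameter are hypothetical, and it assumes Events is already stored in chronological order. With multi-character event names, an empty separator would be ambiguous, so a separator such as a space may be safer.

```python
def build_sequence(session: dict, sep: str = "") -> str:
    """Join the event names, in stored order, into a single string."""
    return sep.join(event["Name"] for event in session["Events"])


# The example session from the post, written as a Python dict.
session = {
    "StartTime": "20130101T01:00",
    "EndTime": "20130101T04:00",
    "Events": [
        {"Name": "A", "StartTime": "20130101T01:00", "EndTime": "20130101T02:00"},
        {"Name": "B", "StartTime": "20130101T02:30", "EndTime": "20130101T03:00"},
        {"Name": "C", "StartTime": "20130101T03:30", "EndTime": "20130101T04:00"},
    ],
}

# Store the derived string on the document before sending it to the index.
session["Sequence"] = build_sequence(session)
print(session["Sequence"])
```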

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/391e80ca-c774-4bf5-990a-39ac0b5e3c45%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hello Smitha,

You can try the wildcard query:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html#query-dsl-wildcard-query
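
As a rough sketch of what such a request could look like, the snippet below builds a wildcard query body as a Python dict. It assumes the Sequence field holds the whole sequence as one un-analyzed term (a not_analyzed string in older releases, a keyword field in later ones); the helper names are hypothetical. The matches_locally function is only a local approximation of the matching using Python's fnmatch, not Elasticsearch itself. Patterns with a leading * tend to be slow on large indices.

```python
from fnmatch import fnmatchcase


def sequence_wildcard_query(pattern: str, field: str = "Sequence") -> dict:
    """Build a search body that matches `pattern` anywhere in `field`."""
    # Surrounding the pattern with * turns it into a substring match.
    return {"query": {"wildcard": {field: f"*{pattern}*"}}}


def matches_locally(sequence: str, pattern: str) -> bool:
    """Approximate the wildcard match on a single term, without a cluster."""
    return fnmatchcase(sequence, f"*{pattern}*")


print(sequence_wildcard_query("BC"))
for pattern in ["A", "BC", "ABC", "D", "AC"]:
    print(pattern, matches_locally("ABC", pattern))
```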

Thanks
Vineeth

On Thu, Aug 21, 2014 at 8:51 AM, Smitha Gowda smithagowda@gmail.com wrote:



(Smitha Gowda) #3

Thanks, that will work.

I have one more question, about visualizing this data in Kibana.

For a query that matches the sequence "AB", once I have all the matching
documents, I want to plot a bar chart with:
x-axis: Session StartTime (day granularity)
y-axis: Mean of (LastEvent.EndTime (B in this example) -
FirstEvent.StartTime (A in this example)) for the given day

Any pointers on how to aggregate on other properties of the matched
documents?

Thanks in advance!



(vineeth mohan-2) #4

Hello Smitha,

Please elaborate.
What is the sequence AB, what is an event here, and what are the first and
last events?

Thanks
Vineeth

On Fri, Aug 22, 2014 at 8:16 AM, Smitha Gowda smithagowda@gmail.com wrote:




(Smitha Gowda) #5

Sure.
Going back to my original example:

This is a session document containing a sequence of events, each occurring in
a specific interval.

Session :
{
  "StartTime": "20130101T01:00",
  "EndTime": "20130101T04:00",
  "Sequence": "A B C",
  "Events": [
    {
      "Name": "A",
      "StartTime": "20130101T01:00",
      "EndTime": "20130101T02:00"
    },
    {
      "Name": "B",
      "StartTime": "20130101T02:30",
      "EndTime": "20130101T03:00"
    },
    {
      "Name": "C",
      "StartTime": "20130101T03:30",
      "EndTime": "20130101T04:00"
    }
  ]
}

What I want:

  1. Match all the documents having a specific sequence of events, say "B C".
  2. On the result, bucket the documents by day on Session.StartTime
    (date_histogram).
  3. For each bucket, find the average time elapsed, in seconds, across the
    searched sequence. Here the sequence is "B C", so this is
    session.Events[indexOfC].EndTime - session.Events[indexOfB].StartTime
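
To make the intended numbers concrete, the three steps can be reproduced locally in plain Python. This is only a sketch of the arithmetic, not an Elasticsearch aggregation: the function names are hypothetical, the sequence is assumed to be a contiguous run of event names, and the timestamps are assumed to follow the "20130101T01:00" layout shown above.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y%m%dT%H:%M"  # layout of values such as "20130101T01:00"


def elapsed_seconds(session, names):
    """Seconds from the first matched event's StartTime to the last one's EndTime.

    Returns None when `names` does not occur as a contiguous run in Events.
    """
    if not names:
        return None
    events = session["Events"]
    for i in range(len(events) - len(names) + 1):
        window = events[i:i + len(names)]
        if [event["Name"] for event in window] == names:
            start = datetime.strptime(window[0]["StartTime"], TIME_FORMAT)
            end = datetime.strptime(window[-1]["EndTime"], TIME_FORMAT)
            return (end - start).total_seconds()
    return None


def mean_elapsed_per_day(sessions, names):
    """Bucket matching sessions by the day of StartTime and average the elapsed time."""
    buckets = defaultdict(list)
    for session in sessions:
        seconds = elapsed_seconds(session, names)
        if seconds is not None:
            day = datetime.strptime(session["StartTime"], TIME_FORMAT).date()
            buckets[day.isoformat()].append(seconds)
    return {day: sum(values) / len(values) for day, values in buckets.items()}


session = {
    "StartTime": "20130101T01:00",
    "EndTime": "20130101T04:00",
    "Events": [
        {"Name": "A", "StartTime": "20130101T01:00", "EndTime": "20130101T02:00"},
        {"Name": "B", "StartTime": "20130101T02:30", "EndTime": "20130101T03:00"},
        {"Name": "C", "StartTime": "20130101T03:30", "EndTime": "20130101T04:00"},
    ],
}

# B starts at 02:30 and C ends at 04:00, so the elapsed time is 5400 seconds.
print(mean_elapsed_per_day([session], ["B", "C"]))
```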

I tried a filter bucket aggregation for #1, and it seems to be working.
I tried date_histogram for #2, but it is not working. *I am not sure how to
consume the result of #1 in #2.*
I have not gotten to #3 yet because #2 is not working, but I think I
need an avg aggregation with a script value.

Can you help with the syntax, or give a pointer on the highlighted part? I am
also interested in how to feed this into a Kibana chart.
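
On consuming the result of #1 in #2: in a search request, aggregations are computed over the documents matched by the query, so one option is to send both in a single body. The sketch below builds such a body as a Python dict. It assumes Sequence is indexed as one un-analyzed term and StartTime is mapped as a date; the function name and the aggregation name sessions_per_day are hypothetical. Older releases name the bucket-size parameter "interval", while newer ones use "calendar_interval", so the key is left configurable. The avg step for #3 is not shown.

```python
import json


def sequence_histogram_body(pattern: str, interval_key: str = "interval") -> dict:
    """Search body: wildcard match on Sequence, then daily buckets on StartTime."""
    return {
        "size": 0,  # return only the aggregation results, not the hits
        "query": {"wildcard": {"Sequence": f"*{pattern}*"}},
        "aggs": {
            "sessions_per_day": {
                "date_histogram": {"field": "StartTime", interval_key: "day"}
            }
        },
    }


print(json.dumps(sequence_histogram_body("B C"), indent=2))
```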

Thanks in advance.

On Fri, Aug 22, 2014 at 2:26 AM, vineeth mohan vm.vineethmohan@gmail.com
wrote:



