Need some ideas: Getting visits from hits out of logstash index


(Stefan Hasenstab) #1

Problem:

I have aggregated accesslog data from different webservers in a large
logstash index. My goal is to get the page visits out of the accesslog
hits.

A visit is defined as follows: one or more hits from a single IP address
within a specific time frame. Because the webservers host different
products, each domain should be considered separately.

My questions are:

  • Can this problem already be solved with built-in Elasticsearch
    features? If yes, how?
  • If no:
    • What kind of plugin would you suggest?

My own ideas range from building a custom filter that retrieves just the
data I need to building a plugin that analyses the accesslog index and
puts the visit data into a new index.

Maybe someone can help me? I appreciate every answer. Thank you for your
time!
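
To make the visit definition above concrete, here is a minimal Python
sketch of the counting logic, independent of Elasticsearch. It assumes each
hit is a (domain, ip, timestamp) tuple and reads "specific time frame" as an
inactivity timeout: a hit opens a new visit when the previous hit from the
same IP on the same domain is older than the timeout. The 30-minute default
and the function name are illustrative assumptions, not part of the original
question.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_visits(hits, timeout=timedelta(minutes=30)):
    """Count visits per domain from (domain, ip, timestamp) hits.

    Assumption: a new visit starts when the gap since the previous hit
    from the same (domain, ip) pair is larger than `timeout`.
    """
    last_seen = {}             # (domain, ip) -> timestamp of the latest hit
    visits = defaultdict(int)  # domain -> number of visits

    # Process hits in chronological order so gaps are measured correctly.
    for domain, ip, ts in sorted(hits, key=lambda hit: hit[2]):
        key = (domain, ip)
        previous = last_seen.get(key)
        if previous is None or ts - previous > timeout:
            visits[domain] += 1
        last_seen[key] = ts

    return dict(visits)


if __name__ == "__main__":
    sample = [
        ("a.example", "1.1.1.1", datetime(2014, 7, 6, 10, 0)),
        ("a.example", "1.1.1.1", datetime(2014, 7, 6, 10, 10)),
        ("a.example", "1.1.1.1", datetime(2014, 7, 6, 11, 0)),
        ("a.example", "2.2.2.2", datetime(2014, 7, 6, 10, 5)),
        ("b.example", "1.1.1.1", datetime(2014, 7, 6, 10, 1)),
    ]
    print(count_visits(sample))
```

Keying on the (domain, ip) pair is what keeps the domains separate: the
same IP address hitting two domains counts as one visit on each.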

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1abed157-cdc2-4e0f-b314-a954c20b89f2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

Are you using Kibana? If you are, you should be able to extract this pretty
simply; if not, check it out.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Stefan Hasenstab) #3

Yes, I'm using Kibana as well. I can extract this data manually in Kibana,
but the problem is that an SQL-like "group by domain, ip" is not really
doable on a large index. As far as I know, anything that involves grouping
is done internally with facets, which don't respect any kind of time
filter.
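
For reference, a rough sketch of what a time-restricted "group by domain,
ip" could look like as an aggregation request body, built here as a Python
dict. Aggregations are computed over the documents matched by the query, so
a range query limits the grouping to a time window. The field names
(`domain`, `clientip`, `@timestamp`), the bucket size and the function name
are assumptions for illustration; they depend on how the logstash index is
mapped.

```python
import json


def group_by_domain_ip_body(start, end, domain_field="domain",
                            ip_field="clientip", time_field="@timestamp",
                            size=10):
    """Build a search body: time filter plus nested terms aggregations."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            # Restrict the documents that feed the aggregations.
            "range": {time_field: {"gte": start, "lt": end}}
        },
        "aggs": {
            "by_domain": {
                "terms": {"field": domain_field, "size": size},
                "aggs": {
                    # Sub-aggregation: top IPs inside each domain bucket.
                    "by_ip": {"terms": {"field": ip_field, "size": size}}
                },
            }
        },
    }


if __name__ == "__main__":
    body = group_by_domain_ip_body("2014-07-06T00:00:00Z",
                                   "2014-07-07T00:00:00Z")
    print(json.dumps(body, indent=2))
```

Note that a terms aggregation returns only the top `size` buckets per
level, so this is not an exhaustive group-by when there are many distinct
IP addresses.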

On Sunday, 6 July 2014 11:28:22 UTC+2, Mark Walkom wrote:

Are you using Kibana? If you are, you should be able to extract this pretty
simply; if not, check it out.



(Antonio Augusto Santos) #4

The DateHistogram aggregation can generate buckets by time frame:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

You probably want to aggregate by page and later by time, or the
opposite, whichever best suits your needs.
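
A possible shape for such a request, sketched as a Python dict: bucket by
domain, then by time, and count distinct IP addresses in each time bucket
with a cardinality aggregation. This treats "visits" as unique IPs per fixed
time bucket, which is one reading of the definition in the first post. The
field names (`domain`, `clientip`, `@timestamp`) and the 30-minute interval
are assumptions. The interval parameter name is also version-dependent:
older releases use `interval`, while newer ones expect `fixed_interval` or
`calendar_interval`, so it is left configurable here.

```python
import json


def visits_per_interval_body(interval="30m", interval_key="interval",
                             domain_field="domain", ip_field="clientip",
                             time_field="@timestamp"):
    """Build a body: terms(domain) > date_histogram > cardinality(ip)."""
    return {
        "size": 0,  # skip the hits, return aggregation results only
        "aggs": {
            "by_domain": {
                "terms": {"field": domain_field},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": time_field,
                            interval_key: interval,
                        },
                        "aggs": {
                            # Approximate distinct count of client IPs.
                            "unique_ips": {
                                "cardinality": {"field": ip_field}
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(visits_per_interval_body(), indent=2))
```

The cardinality aggregation returns an approximate count, so the numbers
may deviate slightly from an exact distinct count when there are very many
unique IP addresses.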

On Sunday, July 6, 2014 9:08:03 AM UTC-3, Stefan wrote:

Yes, I'm using Kibana as well. I can extract this data manually in Kibana,
but the problem is that an SQL-like "group by domain, ip" is not really
doable on a large index. As far as I know, anything that involves grouping
is done internally with facets, which don't respect any kind of time
filter.



(Stefan Hasenstab) #5

Ah nice, this looks like exactly what I need. But what about memory
considerations? The problem with histogram facets was that all related
data has to be loaded into memory, which is horrible if you want to group
big data (see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-histogram-facet.html#_memory_considerations_3
). Do you know how the new aggregation feature works internally?

On Sunday, 6 July 2014 15:15:50 UTC+2, Antonio Augusto Santos wrote:

The DateHistogram aggregation can generate buckets by time frame:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

You probably want to aggregate by page and later by time, or the
opposite, whichever best suits your needs.



(system) #6