I have aggregated accesslog data from different webservers in a large
logstash index. My goal is to get the page visits out of the accesslog
hits.
A visit is defined as follows: a visit results from one or more hits
from a single IP address within a specific time frame. Because the
webservers host different products, each domain should be considered separately.
My questions are:
Can this problem already be solved with built-in Elasticsearch
features? If yes, how?
If no:
What kind of plugin would you suggest?
My own considerations range from building a custom filter that retrieves
just the data I need, to building a plugin that analyses the accesslog
index and puts the visit data into a new index.
Maybe someone can help me? I appreciate every answer. Thank you for your
time!
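To make the visit definition above concrete, here is a minimal Python sketch. It assumes that "a specific time frame" means fixed 30-minute windows aligned to the Unix epoch, which the post does not specify. The function name `count_visits`, the `(domain, ip, timestamp)` tuple layout and the window length are illustrative choices, not part of any Elasticsearch or logstash API.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumption: a "time frame" is a fixed 30-minute window.
WINDOW_SECONDS = 30 * 60


def count_visits(hits, window_seconds=WINDOW_SECONDS):
    """Count visits per domain.

    `hits` is an iterable of (domain, ip, timestamp) tuples, where
    timestamp is a datetime (timezone-aware is recommended). All hits
    from one IP on one domain inside the same window count as one visit.
    """
    seen = set()               # (domain, ip, window index) already counted
    visits = defaultdict(int)  # domain -> number of visits
    for domain, ip, ts in hits:
        window = int(ts.timestamp()) // window_seconds
        key = (domain, ip, window)
        if key not in seen:
            seen.add(key)
            visits[domain] += 1
    return dict(visits)


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        ("a.example", "10.0.0.1", datetime(2014, 7, 6, 9, 0, tzinfo=utc)),
        ("a.example", "10.0.0.1", datetime(2014, 7, 6, 9, 10, tzinfo=utc)),
        ("b.example", "10.0.0.1", datetime(2014, 7, 6, 9, 5, tzinfo=utc)),
    ]
    print(count_visits(sample))
```

Keeping the domain in the grouping key is what treats each domain separately: the same IP hitting two domains in the same window yields one visit per domain.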
Yes, I'm using Kibana as well. I can extract this data manually from
Kibana, but the problem is that an SQL-like "group by domain, ip" is not
really doable on a large index. As far as I know, anything involving
grouping is done internally with facets, which don't respect any kind of
time filter.
On Sunday, 6 July 2014 11:28:22 UTC+2, Mark Walkom wrote:
Are you using Kibana? If you are, you should be able to extract this
pretty simply; if not, check it out.
You probably want to aggregate by page first and then by time, or the
other way around, whichever best suits your needs.
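As a rough illustration of that nesting, the sketch below builds a search request body in Python: a terms aggregation per domain, a date histogram inside it, and a cardinality aggregation that counts distinct IPs per time bucket. The field names `domain` and `clientip`, the function name `visits_query` and the one-hour bucket are assumptions about the index, not something stated in this thread. The `interval` key is the spelling used by Elasticsearch releases of that era; newer releases use `fixed_interval` or `calendar_interval` instead. The cardinality aggregation returns an approximate count.

```python
import json


def visits_query(start, end, interval="1h",
                 domain_field="domain", ip_field="clientip",
                 max_domains=50):
    """Build a request body: distinct IPs per domain per time bucket.

    All field names are hypothetical; adjust them to the actual mapping.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            # The aggregations only see documents matching this time range.
            "range": {"@timestamp": {"gte": start, "lt": end}}
        },
        "aggs": {
            "per_domain": {
                "terms": {"field": domain_field, "size": max_domains},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "interval": interval,
                        },
                        "aggs": {
                            # Approximate number of distinct IPs in the bucket.
                            "visits": {"cardinality": {"field": ip_field}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    body = visits_query("2014-07-06T00:00:00Z", "2014-07-07T00:00:00Z")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the `_search` endpoint of the logstash indices. Swapping the two outer levels (date histogram first, terms inside) gives the "other way around" variant.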
Ah nice, this looks exactly like what I need. But what about memory
considerations? The problem with histogram facets was that all related
data had to be loaded into memory, which is horrible if you want to group
big data. Do you know how the new aggregation feature works internally?