Finding the users with the most IPs using a faceted query

Hello,

I'm pushing my Apache logs to Elasticsearch via Logstash. They contain a lot
of useful information, such as the HTTP auth user, the IP and the request.
I would like to use a faceted query to count the number of IPs used per user,
and sort the result using facets to obtain a top 10.
I created a gist: https://gist.github.com/4269916 where you can see my
data and the query I currently run to count the IPs used by a user.
But for now I have to run one query per user, which is not very efficient.

If you have any ideas, do not hesitate to share them :)

Thanks

--

I am also using Logstash and recently tackled this same issue, though I am
not 100% sure what you want for the output... I think all users, with the
top 10 IPs for each.

I found that when you run a facet query you should only request the fields
you need, and if you only want the counts, request no fields at all.

For example, here is a count of all the unique IPs for one "day" (index).
Since only the counts are wanted, the fields list is empty. Also, the facet
size caps the number of IPs returned, so it needs to be big enough that the
facet's "other" count is zero, which means you have all the IPs.
curl -X GET "http://localhost:9200/logstash-2012.12.12/apache_access/_search?pretty=true" \
  -d '{"facets":{"myfacet":{"terms":{"field":"@fields.client_ip","size":999999,"all_terms":"false"}}},"fields":[""]}'
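For anyone scripting this instead of using curl, here is a minimal Python sketch of the same request body. The function names are hypothetical, and the sketch only builds the body and checks the facet's "other" count in a response; sending it to the `_search` URL above is left to whatever HTTP client you prefer. It assumes the terms facet response carries `other` and `terms` keys, and it passes `all_terms` as a boolean rather than the string used in the curl command.

```python
import json


def build_terms_facet_body(field, size):
    """Build a search body that asks only for a terms facet on `field`."""
    return {
        "facets": {
            "myfacet": {
                "terms": {"field": field, "size": size, "all_terms": False}
            }
        },
        # Mirrors the curl example: no document fields are requested.
        "fields": [""],
    }


def facet_is_complete(response, name="myfacet"):
    """Return True when "other" is zero, i.e. `size` did not cut off any term."""
    return response["facets"][name]["other"] == 0


if __name__ == "__main__":
    body = build_terms_facet_body("@fields.client_ip", 999999)
    # This JSON string is what would go after `-d` in the curl command.
    print(json.dumps(body))
```

If `facet_is_complete` returns False, the facet size was too small and some IPs are missing from the terms list.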

If you want all the users, you can do the same thing; just replace the IP
field with the user field.

If you want the IPs for each user, you could loop over the facet terms from
the above search (in your case the users) and run a new query for each
user to obtain the IPs (terms) and the count of each (if the facet size is
10, you'll get the top 10). Make sure your IP field is "not_analyzed" in
the mapping. I use a similar report of HTTP status codes with the top 20
URLs for each.
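The loop described above could be sketched roughly like this in Python. Everything named here is an assumption for illustration: the `@fields.user` field name, the `users` and `ips` facet names, and the helper functions are hypothetical, and the sketch only builds the per-user request bodies without sending them. It also assumes the user field is not analyzed, so that a term query matches the exact user name.

```python
def users_from_facet(response, name="users"):
    """Extract the user names from a terms facet response."""
    return [entry["term"] for entry in response["facets"][name]["terms"]]


def build_ip_query_for_user(user, user_field="@fields.user",
                            ip_field="@fields.client_ip", top=10):
    """Build a search body that facets on the IPs seen for one user."""
    return {
        # Restrict the search (and therefore the facet) to this user.
        "query": {"term": {user_field: user}},
        # A facet size of `top` yields the `top` most frequent IPs.
        "facets": {"ips": {"terms": {"field": ip_field, "size": top}}},
        # Only the facet counts are needed, so return no hits.
        "size": 0,
    }


if __name__ == "__main__":
    # Made-up response from a first search that facets on the user field.
    first_response = {
        "facets": {"users": {"other": 0, "terms": [
            {"term": "alice", "count": 42},
            {"term": "bob", "count": 7},
        ]}}
    }
    for user in users_from_facet(first_response):
        print(user, build_ip_query_for_user(user))
```

This still issues one request per user, so it costs one round trip for the user list plus one per user.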

BTW, facets are sorted by "count" by default.
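If a different ordering is needed, the terms facet can, as far as I recall, take an explicit `order` option with the values `count`, `term`, `reverse_count` or `reverse_term`. The small Python sketch below builds such a facet definition; the `terms_facet` helper is hypothetical, and the list of allowed values should be checked against the documentation for your Elasticsearch version.

```python
# Ordering values assumed to be accepted by the terms facet.
ALLOWED_ORDERS = {"count", "term", "reverse_count", "reverse_term"}


def terms_facet(field, size=10, order="count"):
    """Build a terms facet definition with an explicit ordering."""
    if order not in ALLOWED_ORDERS:
        raise ValueError("unsupported order: %r" % (order,))
    return {"terms": {"field": field, "size": size, "order": order}}


if __name__ == "__main__":
    # Top 10 IPs by count, spelled out instead of relying on the default.
    print({"facets": {"top_ips": terms_facet("@fields.client_ip")}})
```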

(In reply to Loïc Bertron's message of Wednesday, December 12, 2012 12:49:17 PM UTC-5.)

--

Hey,
Thanks for your answer.
But if you take a look at the gist I attached to my message, that's exactly
what I'm doing now.
It's not really efficient if I have to loop through all my clients.

Loïc

(In reply to Kubes's message of Wednesday, December 12, 2012 22:58:54 UTC-5.)

--