Hi there,
I am loading Apache access logs into Elasticsearch so I can search them and compute some statistics.
A sample of my log documents looks like this:
curl "localhost:9200/logstash-2013.07.16/_search?pretty" -d '{"size":4}'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 45,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash-2013.07.16",
"_type" : "fluentd",
"_id" : "dCi2ALWDSj6tA4I0W6kZnQ",
"_score" : 1.0, "_source" :
{"host":"10.0.0.1","user":null,"method":"GET","path":"/somepath01/somefile01.html","code":200,"size":12,"referer":null,"agent":"curl/7.19.7 (x86_64-unknown-linux-gnu) libcurl/7.19.7 NSS/3.12.7.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2","@timestamp":"2013-07-16T11:25:01+07:00"}
}, {
"_index" : "logstash-2013.07.16",
"_type" : "fluentd",
"_id" : "udCTJUP8Sce5N-A5QSdCDw",
"_score" : 1.0, "_source" :
{"host":"10.0.0.2","user":null,"method":"GET","path":"/somepath02/somefile02.html","code":200,"size":12,"referer":null,"agent":"curl/7.19.7 (x86_64-unknown-linux-gnu) libcurl/7.19.7 NSS/3.12.7.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2","@timestamp":"2013-07-16T11:25:02+07:00"}
}, {
"_index" : "logstash-2013.07.16",
"_type" : "fluentd",
"_id" : "Yghj0YQtTYWaS_kGy8T7YQ",
"_score" : 1.0, "_source" :
{"host":"10.0.0.3","user":null,"method":"GET","path":"/somepath03/somefile03.html","code":200,"size":12,"referer":null,"agent":"curl/7.19.7 (x86_64-unknown-linux-gnu) libcurl/7.19.7 NSS/3.12.7.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2","@timestamp":"2013-07-16T11:25:03+07:00"}
}, {
"_index" : "logstash-2013.07.16",
"_type" : "fluentd",
"_id" : "WkrnKPd5Sc2qvE1fmRuEDA",
"_score" : 1.0, "_source" :
{"host":"10.0.0.4","user":null,"method":"GET","path":"/somepath04/somefile04.html","code":200,"size":12,"referer":null,"agent":"curl/7.19.7 (x86_64-unknown-linux-gnu) libcurl/7.19.7 NSS/3.12.7.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2","@timestamp":"2013-07-16T11:25:04+07:00"}
} ]
}
}
Now I want to list the IPs (field: host) that occur most often in the log,
so I use a terms facet query:
curl "localhost:9200/logstash-2013.07.16/_search?pretty" -d '{"facets":{"Top IP":{"terms":{"field":"host"}}}}'
"facets" : {
"Top IP" : {
"_type" : "terms",
"missing" : 0,
"total" : 45,
"other" : 0,
"terms" : [ {
"term" : "10.0.0.9",
"count" : 9
}, {
"term" : "10.0.0.8",
"count" : 8
}, {
"term" : "10.0.0.7",
"count" : 7
}, {
"term" : "10.0.0.6",
"count" : 6
}, {
"term" : "10.0.0.5",
"count" : 5
}, {
"term" : "10.0.0.4",
"count" : 4
}, {
"term" : "10.0.0.3",
"count" : 3
}, {
"term" : "10.0.0.2",
"count" : 2
}, {
"term" : "10.0.0.1",
"count" : 1
} ]
}
}
The result is exactly what I expected, so Elasticsearch works well here.
Now I want to do the same thing with the path field to list the most
frequently accessed URLs:
curl "localhost:9200/logstash-2013.07.16/_search?pretty" -d '{"facets":{"Top URL":{"terms":{"field":"path"}}}}'
"facets" : {
"Top URL" : {
"_type" : "terms",
"missing" : 0,
"total" : 135,
"other" : 28,
"terms" : [ {
"term" : "html",
"count" : 45
}, {
"term" : "somepath09",
"count" : 9
}, {
"term" : "somefile09",
"count" : 9
}, {
"term" : "somepath08",
"count" : 8
}, {
"term" : "somepath07",
"count" : 7
}, {
"term" : "somefile08",
"count" : 7
}, {
"term" : "somepath06",
"count" : 6
}, {
"term" : "somefile07",
"count" : 6
}, {
"term" : "somepath05",
"count" : 5
}, {
"term" : "somefile06",
"count" : 5
} ]
}
}
This time the result is strange. I expected whole paths, like this:
"term" : "/somepath05/somefile05.html", "count" : 5
I guess Elasticsearch has some problem with the forward slash "/" in the path field.
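To show what the output looks like to me, here is a minimal Python sketch. It rests on an assumption I have not confirmed: that the path value is being broken into lowercase tokens at every non-alphanumeric character (such as "/" and "."), and that the facet then counts those tokens instead of the whole string. The names tokenize and top_terms and the sample paths list are made up for illustration; none of this is Elasticsearch code.

```python
import re
from collections import Counter


def tokenize(path):
    """Split on runs of non-alphanumeric characters and lowercase the pieces.

    This is only a rough stand-in for whatever splitting is really applied
    to the field (an assumption, not confirmed behavior).
    """
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", path) if t]


def top_terms(paths, split=True, size=10):
    """Count terms the way a terms facet would, per token or per whole value."""
    counts = Counter()
    for p in paths:
        counts.update(tokenize(p) if split else [p])
    return counts.most_common(size)


# Hypothetical data shaped like my log: path N is requested N times (45 hits).
paths = [
    f"/somepath{n:02d}/somefile{n:02d}.html"
    for n in range(1, 10)
    for _ in range(n)
]

print(tokenize("/somepath05/somefile05.html"))
# -> ['somepath05', 'somefile05', 'html']

print(top_terms(paths, split=True))   # "html" dominates, paths appear in pieces
print(top_terms(paths, split=False))  # whole paths, which is what I want
```

With splitting, each of the 45 paths yields three tokens, which would explain why the facet reports a total of 135 and why "html" has a count of 45. Without splitting, the top entry is the whole path with its real hit count.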
I don't know how to fix this.
Could someone explain what is going on and help me fix it?
Many thanks.
Atrus@