Why does terms_stats split my key_field "url"?


(Chenryn) #1

Hello everyone.
I use Elasticsearch to store my access.log, and I want to get count/avg etc. grouped by request URL.
The JSON I index looks like this (created by Logstash):
{
  "@timestamp" : "2012-09-04T13:38:59.496888Z",
  "@tags" : [],
  "@fields" : {
    "reqtime" : [
      0.016
    ],
    "req" : [
      "/fmn056/20120812/1645/tiny_r3N9_236f000036ad118d.jpg"
    ],
    "version" : [
      "1.1"
    ],
    "useragent" : [
      "\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\""
    ],
    "port" : [
      "80"
    ],
    "size" : [
      2360
    ],
    "client" : [
      "210.56.223.176"
    ],
    "upstream" : [
      "10.9.18.50"
    ],
    "method" : [
      "GET"
    ],
    "referer" : [
      "photo.renren.com",
      "/photo/420723228/photo-6408408309?psource=3&fromVIP=false"
    ],
    "ZONE" : [
      "+0800"
    ],
    "code" : [
      200
    ],
    "upstime" : [
      0.016
    ]
  },
  "@source_path" : "//data/nginx/logs/access.log",
  "@source" : "file://DBLYD5-32.opi.com//data/nginx/logs/access.log",
  "@message" : "[04/Sep/2012:21:38:59 +0800] 200 210.56.223.176 fmn.rrimg.com GET /fmn056/20120812/1645/tiny_r3N9_236f000036ad118d.jpg HTTP/1.1 10.9.18.50:80 0.016 0.016 2360 \"http://photo.renren.com/photo/420723228/photo-6408408309?psource=3&fromVIP=false\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\" \"-\"",
  "@source_host" : "DBLYD5-32.opi.com",
  "@type" : "nginx"
}

And I wrote the request as follows (using the Perl module ElasticSearch.pm):

$elsearch->search(
    index  => 'logstash',
    type   => 'nginx',
    query  => {
        text => { code => '200' }
    },
    facets => {
        "request" => {
            "terms_stats" => {
                "value_field" => "reqtime",
                "key_field"   => "req",
            }
        }
    }
);
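For readers not using Perl: assuming ElasticSearch.pm passes the query and facets arguments through unchanged as the request body, the call above should correspond roughly to the following JSON sent to /logstash/nginx/_search. The text query and the terms_stats facet are the older (pre-aggregations) syntax used in this thread.

```json
{
  "query": {
    "text": { "code": "200" }
  },
  "facets": {
    "request": {
      "terms_stats": {
        "key_field": "req",
        "value_field": "reqtime"
      }
    }
  }
}
```

The facet groups by the terms indexed for "req" and computes count/min/max/mean/total of "reqtime" per term, so the buckets it returns depend entirely on how "req" was analyzed at index time.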

But I got a split response like this:
'terms' => [
    {
        'count'       => 740364,
        'min'         => '0.016',
        'max'         => '0.016',
        'mean'        => '0.0159999999999977',
        'total'       => '11845.8239999983',
        'total_count' => 740364,
        'term'        => 'tiny_r3n9_236f000036ad118d.jpg'
    },
    {
        'count'       => 740364,
        'min'         => '0.016',
        'max'         => '0.016',
        'mean'        => '0.0159999999999977',
        'total'       => '11845.8239999983',
        'total_count' => 740364,
        'term'        => 'fmn056'
    },
    {
        'count'       => 740364,
        'min'         => '0.016',
        'max'         => '0.016',
        'mean'        => '0.0159999999999977',
        'total'       => '11845.8239999983',
        'total_count' => 740364,
        'term'        => '20120812'
    },
    {
        'count'       => 740364,
        'min'         => '0.016',
        'max'         => '0.016',
        'mean'        => '0.0159999999999977',
        'total'       => '11845.8239999983',
        'total_count' => 740364,
        'term'        => '1645'
    }
],
The "req" URL is split on '/'.
Can anyone help me?
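The split comes from analysis: by default a string field is analyzed with the standard analyzer, which breaks the value on characters such as '/' and lowercases the tokens (note 'tiny_r3n9_...' in the response versus 'tiny_r3N9_...' in the indexed document), and terms_stats facets on those indexed terms rather than on the original string. Below is a minimal sketch of a mapping that keeps the field as one exact term. It assumes the 0.19/0.20-era mapping syntax ("type": "string", "index": "not_analyzed") and that it is applied with PUT /logstash/nginx/_mapping before any nginx documents are indexed, because the analysis of an existing field cannot be changed in place.

```json
{
  "nginx": {
    "properties": {
      "@fields": {
        "type": "object",
        "properties": {
          "req": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
```

With this mapping, terms_stats on "req" should return one bucket per full request path. The trade-off is that the field can no longer be searched by individual path segments.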


(Chenryn) #2

Well, I have been learning about analyzers. After reindexing with analyzer: "whitespace", I can now run terms_stats on the whole "url" field.
But can I change the analyzer in the mapping of an existing index? And can I disable all analyzers through configuration? I ask because I can't change the index-creation request that Logstash sends.
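Since an existing field's analyzer cannot be changed in place, one common route is to fix the mapping for indices that do not exist yet, without touching Logstash. A minimal sketch of an index template follows, assuming the 0.19/0.20-era template syntax, a hypothetical template name logstash_nginx registered with PUT /_template/logstash_nginx, and that the indices Logstash writes match the pattern "logstash*". Every new matching index would then receive this mapping automatically when Logstash first writes to it.

```json
{
  "template": "logstash*",
  "mappings": {
    "nginx": {
      "properties": {
        "@fields": {
          "type": "object",
          "properties": {
            "req": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
```

Indices that already exist keep their old mapping and would still need to be reindexed. Making every string field unanalyzed by default (for example through the index.analysis.analyzer.default.type setting set to keyword) should also be possible, but it would stop @message from being full-text searchable, so a per-field mapping is the narrower change.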

