Thank you, Aaron.
For my first question: before this, all the log content was in a single `message` field, and I'm not sure yet how to "parse" it.
I think I could use the grok filter to do something like this, with one of the 120 grok patterns:
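```
filter {
  grok {
    # USERNAME is the stock pattern [a-zA-Z0-9._-]+; the first matching token is captured as user_id
    match => { "message" => "%{USERNAME:user_id}" }
  }
  # add_field on a field that already exists appends to it (user_id becomes an array),
  # so mutate/update is used here to rewrite the captured value in place
  mutate {
    update => { "user_id" => "user_%{user_id}" }
  }
}
```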
But in fact, several fields share a similar pattern, e.g. `{user_id1:123456, user_id2:456123, video_id:123456, act:play}`, which means user 1 plays user 2's video.
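To illustrate the ambiguity: the stock grok `USERNAME` pattern is the regex `[a-zA-Z0-9._-]+`, and grok matches are unanchored, so on a line like the example above a single `%{USERNAME:user_id}` just grabs the first matching token. A minimal Python sketch using the same regex (not Logstash itself) shows that every key and every value fits the pattern, so the pattern alone cannot tell `user_id1` from `user_id2` or `video_id`:

```python
import re

# Same character class as the stock grok USERNAME pattern
USERNAME = re.compile(r"[a-zA-Z0-9._-]+")

line = "{user_id1:123456, user_id2:456123, video_id:123456, act:play}"

# An unanchored single capture behaves like re.search: first match only
first = USERNAME.search(line).group(0)
print(first)  # user_id1 (the key name, not an id)

# Keys and values all match the same pattern, so it carries no field identity
tokens = USERNAME.findall(line)
print(tokens)
```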
Now my input is JSON from Redis, like this:
```
2017-04-23T16:20:31+08:00 cv.product.access.mobile {"race":"album","video_id":43633036,"ip":"117.177.78.48","cdn":"cdn-web-qn.colorv.cn","act":"update","ad_type":"AdExchange","agent":"Mozilla/5.0 (Linux; Android 5.1.1; vivo X6SPlus D Build/LMY47V; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043128 Safari/537.36 MicroMessenger/6.5.7.1041 NetType/WIFI Language/zh_CN","author_zone":0,"author_registered_at":"2015-08-02 13:06:05","post_id":0,"published_at":"","referer":"","reference_id":0,"sessid":"7e620e610447414bafe5091470f8b0b2","duration":144,"author_is_priest":0,"download_type":"myapp","author_udid":"d850a4a042d6382","status_404":"","author_version":"and-3.6.13-gdt","mold_id":10006,"url":"http://video.colorv.cn/play/43633036?from=timeline&isappinstalled=0&from=share","author_os":"and","page_kind":"mini","request_id":"ff6925b4c86b4d34be534a6609edfa2d","referrer_id":"","author_id":3934438,"play_time":60,"method":"GET","published":0}
```
How can I distinguish them and extract each key into a field?
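Since the payload is already JSON, the keys do not need to be told apart by pattern at all; decoding the JSON yields one field per key. Logstash has a `json` filter whose `source` option names the field to decode, which may fit better than grok here. The Python below only sketches the logic, assuming each line is `<timestamp> <tag> <json>` with no spaces inside the timestamp or tag; the output field names `@timestamp` and `tag` are illustrative choices:

```python
import json

def parse_line(line: str) -> dict:
    """Split '<timestamp> <tag> <json>' and return one flat dict of fields.

    Assumption: the timestamp and tag contain no spaces, and everything
    after the second space is a single JSON object (which may itself
    contain spaces, e.g. in the "agent" value).
    """
    timestamp, tag, payload = line.split(" ", 2)
    fields = json.loads(payload)  # each JSON key becomes its own field
    fields["@timestamp"] = timestamp  # illustrative field name
    fields["tag"] = tag               # illustrative field name
    return fields

# Shortened version of the sample line above
sample = ('2017-04-23T16:20:31+08:00 cv.product.access.mobile '
          '{"race":"album","video_id":43633036,"act":"update",'
          '"author_id":3934438,"play_time":60}')
event = parse_line(sample)
print(event["act"], event["video_id"], event["author_id"])
```

Splitting with `maxsplit=2` keeps the whole JSON object intact, so values such as the user-agent string survive unchanged.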
And for my second question:
- We will have two applications: one for BI in Kibana, the other for building a recommender system.
- Our log data for each user will be large. Regarding
[quote="theuntergeek, post:2, topic:83719"]
millions of unique indices with few records in them
[/quote]
I mean that every click and request in our app (an app for making and sharing short videos) is logged.
So our plan is to create an index for each user and each video, and use the timestamp field as a filter when querying for BI or when collecting training samples for the recommender system.
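The difference between the two layouts shows up in how many indices a stream of events creates. A small sketch under stated assumptions: the `user-<id>` and `video-<id>` names are hypothetical, and the daily name mimics Logstash's default `logstash-%{+YYYY.MM.dd}` pattern, dated in UTC because Logstash's `@timestamp` is UTC:

```python
from datetime import datetime, timezone

def per_entity_indices(event: dict) -> list:
    # Plan described above: one index per user and one per video (hypothetical names)
    return [f"user-{event['author_id']}", f"video-{event['video_id']}"]

def daily_index(event: dict) -> str:
    # Time-based alternative: one index per UTC day
    ts = datetime.fromisoformat(event["@timestamp"]).astimezone(timezone.utc)
    return ts.strftime("logstash-%Y.%m.%d")

# Three hypothetical events on the same day from different users and videos
events = [
    {"@timestamp": "2017-04-23T16:20:31+08:00", "author_id": 3934438, "video_id": 43633036},
    {"@timestamp": "2017-04-23T17:05:00+08:00", "author_id": 1, "video_id": 2},
    {"@timestamp": "2017-04-23T18:00:00+08:00", "author_id": 3, "video_id": 4},
]

per_entity = {name for e in events for name in per_entity_indices(e)}
daily = {daily_index(e) for e in events}
print(len(per_entity), len(daily))  # 6 1
```

Under the per-entity plan the index count grows with every new user and video; under the daily plan it grows only with the number of days retained.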
You mean this plan will perform badly. But if we index by date, a single date-based index will contain many users, videos, and acts. Is it hard for ES to find the records we need?
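With date-based indices, one user's records are selected by filtering on field values rather than by picking an index. A sketch of such a request body in the Elasticsearch Query DSL (a `bool` query with `term` and `range` clauses in filter context), built as a Python dict; `author_id` comes from the sample log, `@timestamp` is assumed to be the event-time field, and the target index (for example a daily index or a wildcard such as `logstash-*`) would go in the request path:

```python
import json

def user_query(user_id: int, start: str, end: str) -> dict:
    """Build a bool query that filters by user and time range inside date-based indices."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"author_id": user_id}},
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                ]
            }
        }
    }

body = user_query(3934438, "2017-04-23T00:00:00Z", "2017-04-24T00:00:00Z")
print(json.dumps(body, indent=2))
```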
So what is the trade-off here, and how should we decide on the index structure for our problem?