How can I limit the fields of the response documents when I search for a certain keyword?


(纪路) #1

Dear all,
I have what seems like a reasonable need, but I couldn't find a solution in the
official documentation or the book. Can you help me?

I have a large set of documents, each containing many fields, for example:
{
  "id": 1404999597,
  "idstr": "1404999597",
  "class": 1,
  "screen_name": "主播梦桐",
  "name": "主播梦桐",
  "province": "11",
  "city": "1000",
  "location": "北京",
  "description": "在主流与非主流之间徘徊",
  "url": "",
  "profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0",
  "profile_url": "u/1404999597",
  "domain": "",
  "weihao": "",
  "gender": "f",
  "followers_count": 1030710,
  "friends_count": 272,
  "statuses_count": 1519,
  "favourites_count": 90,
  "created_at": "Wed Mar 23 23:59:40 +0800 2011",
  "following": false,
  "allow_all_act_msg": false,
  "geo_enabled": false,
  "verified": true,
  "verified_type": 0,
  "remark": "",
  "status": {
    "created_at": "Tue Jul 01 13:17:55 +0800 2014",
    "id": 3727513249206064,
    "mid": "3727513249206064",
    "idstr": "3727513249206064",
    "text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
    "source": "<a href=\"http://app.weibo.com/t/feed/9ksdit\" rel=\"nofollow\">iPhone客户端",
    "favorited": false,
    "truncated": false,
    "in_reply_to_status_id": "",
    "in_reply_to_user_id": "",
    "in_reply_to_screen_name": "",
    "pic_urls": [],
    "geo": null,
    "reposts_count": 0,
    "comments_count": 0,
    "attitudes_count": 0,
    "mlevel": 0,
    "visible": {
      "type": 0,
      "list_id": 0
    },
    "darwin_tags": []
  },
  "ptype": 1,
  "allow_all_comment": true,
  "avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0",
  "avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0",
  "verified_reason": "电视台主持人梦桐",
  "verified_trade": "",
  "verified_reason_url": "",
  "verified_source": "",
  "verified_source_url": "",
  "follow_me": false,
  "online_status": 0,
  "bi_followers_count": 167,
  "lang": "zh-cn",
  "star": 0,
  "mbtype": 0,
  "mbrank": 0,
  "block_word": 0,
  "block_app": 0,
  "ability_tags": "主持人",
  "worldcup_guess": 0
}

My problem is this: when I search (or scan and scroll) on a certain field, for example
"city"=1000 (1000 is a city code that refers to a city name), there may be
10,000 results returned. But my goal is to find out how the users from this city
on my website are distributed by gender, so I don't need any information
except the "gender" field. How can I exclude the unneeded data
from the response JSON before it is returned? I have many similar tasks,
so transmitting entire documents costs a lot of time and bandwidth, and
trimming the extra data in my own program also wastes CPU time on my
local computer. If you know how to handle this, please teach me. Thank you!
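The question as asked (return only some fields) maps to source filtering. Below is a minimal sketch in Python, assuming Elasticsearch 1.0 or later, where a search body's `_source` entry can list the fields to keep in each hit. The helper names and the sample response are hypothetical, and sending the body to `/<index>/_search` with an HTTP client is left out.

```python
import json


def gender_only_search_body(city_code: str) -> dict:
    """Build a search body that matches one city and asks Elasticsearch
    to copy only the "gender" field of each document into the hits."""
    return {
        "query": {"term": {"city": city_code}},
        # Source filtering: every other field is dropped from each hit.
        "_source": ["gender"],
    }


def genders_from_hits(response: dict) -> list:
    """Collect the gender value of every hit in a search response
    (None for a document that has no gender field)."""
    return [hit["_source"].get("gender") for hit in response["hits"]["hits"]]


body = gender_only_search_body("1000")
print(json.dumps(body))

# Hand-written sample shaped like a response to the body above (hypothetical data).
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1404999597", "_source": {"gender": "f"}},
            {"_id": "1", "_source": {"gender": "m"}},
        ],
    }
}
print(genders_from_hits(sample_response))  # ['f', 'm']
```

This shrinks each hit to `{"gender": ...}`, but every matching document is still sent back and counted on the client, which is the overhead the reply below avoids.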

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2a01d5f4-67a5-493a-8e35-6f9a40a9998b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

If I understand you correctly, you want to see the gender distribution of the
documents matched by a query? In that case, you want to look into
aggregations, which compute summaries over the set of documents the query matches.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations.html
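With an aggregation, the response carries per-value counts instead of documents. A sketch in Python of turning the buckets of a terms aggregation named `gender` into shares; the function name and the counts in the sample response are made up for illustration.

```python
def gender_distribution(response: dict, agg_name: str = "gender") -> dict:
    """Turn the buckets of a terms aggregation into {value: share of total}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    total = sum(b["doc_count"] for b in buckets)
    if total == 0:
        # No bucketed documents: avoid dividing by zero.
        return {}
    return {b["key"]: b["doc_count"] / total for b in buckets}


# Hypothetical response: no hits were requested, only the counts.
sample = {
    "hits": {"total": 10000, "hits": []},
    "aggregations": {
        "gender": {
            "buckets": [
                {"key": "f", "doc_count": 6000},
                {"key": "m", "doc_count": 4000},
            ]
        }
    },
}
print(gender_distribution(sample))  # {'f': 0.6, 'm': 0.4}
```

Documents without a `gender` value fall into no bucket, so the shares are relative to documents that have one. A terms aggregation also returns only its top 10 buckets by default, which does not matter for a field with a handful of values.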

Here is a query that should work for your basic use case. It uses facets; if you
have a newer version of Elasticsearch (1.0 or later), use aggregations instead of facets.

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "city": "1000"
        }
      }
    }
  },
  "facets": {
    "gender": {
      "terms": {
        "field": "gender"
      }
    }
  }
}
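On a newer version, the same request can be written with an `aggs` section in place of `facets`. A sketch in Python that builds the body as a dict; the function and its `legacy_filtered` flag are hypothetical. The `filtered` form mirrors the query above (1.x), and the `bool`/`filter` form is for 2.0 and later, since `filtered` was deprecated in 2.0 and removed in 5.0.

```python
import json


def gender_agg_body(city_code: str, legacy_filtered: bool = True) -> dict:
    """Count documents per gender for one city without returning any hits."""
    if legacy_filtered:
        # 1.x style, mirroring the query above: a "filtered" query.
        query = {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"city": city_code}},
            }
        }
    else:
        # 2.0+ style: a bool query with a non-scoring filter clause.
        query = {"bool": {"filter": {"term": {"city": city_code}}}}
    return {
        "size": 0,  # return no hits, only the aggregation results
        "query": query,
        "aggs": {"gender": {"terms": {"field": "gender"}}},
    }


print(json.dumps(gender_agg_body("1000"), indent=2))
```

Because `size` is 0, the response contains only the per-gender counts, so neither the 10,000 documents nor their `gender` fields cross the network.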

--
Ivan

(In reply to 纪路 magievanerv@gmail.com, Sat, Jul 5, 2014 at 2:12 AM)
