BitSet.or consumes almost 60% of CPU

It was late last night, so I forgot to tell you this. We also suspected a cache problem at the beginning, so I did two things to verify the idea:

  1. Ran POST /myindex/_cache/clear. After clearing the index's caches, the slow node was still using too much CPU.

  2. Ran a benchmark against the two nodes: 5 clients, each sending the query below 2000 times. The slow node still showed higher CPU usage, and its flame graph was very similar to the one above.

POST myindex/_search?request_cache=false
{
  "_source": false,
  "profile":true,
  "docvalue_fields": [
    "v_spu_id",
    "brand_store_sn",
    "goods_cate_id_1",
    "goods_cate_id_2",
    "goods_cate_id_3",
    "product_tags_pdc",
    "product_tags_ptp",
    "product_tags_vde",
    "product_tags_usual_ptp",
    "spu_vip_self",
    "goods_inner_cate_id_3"
  ],
  "stored_fields": "_none_",
  "from": 0,
  "size": 270,
  "track_total_hits": true,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must": [
                        {
                          "terms": {
                            "product_props": [
                              "学前儿童_V",
                              "幼儿_V",
                              "男女童_V",
                              "儿童_V",
                              "学龄前儿童_V",
                              "中童_V",
                              "低龄儿童_V",
                              "学童_V",
                              "大童_V",
                              "小男童_V",
                              "女童_V",
                              "婴童_V",
                              "中大童_V",
                              "早期儿童_V",
                              "小童_V"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "product_props": [
                              "贵妇_V",
                              "中年妇女_V",
                              "轻熟女_V",
                              "女宝宝_V",
                              "少女_V",
                              "母女_V",
                              "闺女_V",
                              "淑女_V",
                              "年青人女性_V",
                              "仙女_V",
                              "女友_V",
                              "女王_V",
                              "女_V",
                              "年青人女士_V",
                              "仕女_V",
                              "女童_V",
                              "年青人女子_V",
                              "熟女_V",
                              "年青人女人_V",
                              "ol_V",
                              "少妇_V",
                              "家庭主妇_V"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "product_all": [
                              "A_帆布鞋_V",
                              "A_雪地鞋_V",
                              "A_工装靴_V",
                              "A_旱冰鞋_V",
                              "A_冰刀鞋_V",
                              "A_皮靴_V",
                              "A_篮球鞋_V",
                              "A_鞋_V",
                              "A_羽毛球鞋_V",
                              "A_皮鞋_V",
                              "A_小白鞋_V",
                              "A_网球鞋_V",
                              "A_运动鞋_V",
                              "A_旗鞋_V",
                              "A_乐福鞋_V",
                              "A_饼干鞋_V",
                              "A_舞蹈鞋_V",
                              "A_马靴_V",
                              "A_棉拖_V",
                              "A_玛丽珍鞋_V",
                              "A_棒球鞋_V",
                              "A_解放鞋_V",
                              "A_越野鞋_V",
                              "A_一脚蹬_V",
                              "A_便鞋_V",
                              "A_乒乓球鞋_V",
                              "A_凉拖_V",
                              "A_短靴_V",
                              "A_人字拖_V",
                              "A_徒步鞋_V",
                              "A_正装鞋_V",
                              "A_硫化鞋_V",
                              "A_秋鞋_V",
                              "A_钉鞋_V",
                              "A_阿甘鞋_V",
                              "A_雨鞋_V",
                              "A_猫爪鞋_V",
                              "A_马丁靴_V",
                              "A_军靴_V",
                              "A_单靴_V",
                              "A_切尔西靴_V",
                              "A_溯溪鞋_V",
                              "A_登山鞋_V",
                              "A_舞鞋_V",
                              "A_松糕鞋_V",
                              "A_蛙鞋_V",
                              "A_小黑鞋_V",
                              "A_胶鞋_V",
                              "A_罗马鞋_V",
                              "A_长靴_V",
                              "A_袜靴_V",
                              "A_网鞋_V",
                              "A_水靴_V",
                              "A_健步鞋_V",
                              "A_战靴_V",
                              "A_滑板鞋_V",
                              "A_冰鞋_V",
                              "A_僧侣鞋_V",
                              "A_冰球鞋_V",
                              "A_军鞋_V",
                              "A_德比鞋_V",
                              "A_马丁鞋_V",
                              "A_单鞋_V",
                              "A_洛克鞋_V",
                              "A_板鞋_V",
                              "A_布鞋_V",
                              "A_波鞋_V",
                              "A_球鞋_V",
                              "A_瓢鞋_V",
                              "A_凉鞋_V",
                              "A_保暖鞋_V",
                              "A_豆豆鞋_V",
                              "A_洞沿鞋_V",
                              "A_棉鞋_V",
                              "A_雪地靴_V",
                              "A_灯鞋_V",
                              "A_熊猫鞋_V",
                              "A_草鞋_V",
                              "A_气垫鞋_V",
                              "A_靴_V",
                              "A_跑步鞋_V",
                              "A_勃肯鞋_V",
                              "A_拖鞋_V",
                              "A_足球鞋_V",
                              "A_婚鞋_V",
                              "A_渔夫鞋_V",
                              "A_椰子鞋_V",
                              "A_跳跃鞋_V",
                              "A_排球鞋_V",
                              "A_袜鞋_V"
                            ]
                          }
                        },
                        {
                          "match_all": {}
                        }
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "static_score_9",
            "factor": 1,
            "missing": 1
          }
        }
      ],
      "score_mode": "first",
      "boost_mode": "replace"
    }
  }
}

Hopefully that was without the "profile": true setting? It makes sense for diagnosing a single query's behaviour, but it's expensive to run, so it isn't suited to load testing.

Even so, the two nodes should behave equally badly and not differ.
If you're certain the data, hardware, config, queries and caches are the same, it's not clear what else might be going on, so I asked the performance testing team here for ideas.
They suggested looking at the hardware, i.e. the dmesg output. One engineer had experience of a support case where the physical host was being thermally throttled.
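A quick way to act on that suggestion is to scan the kernel log for throttling messages. This is a minimal sketch, assuming shell access to the host and that thermal events are logged with wording containing "throttl" (as in "cpu clock throttled"); the exact kernel message differs between kernel versions, so the match is deliberately loose and is an assumption, not a guaranteed format.

```python
import re
import subprocess

# Assumption: thermal-throttling events contain "throttl" (e.g. "throttled",
# "throttling"); exact kernel wording varies by version, so the match is loose.
THROTTLE_PATTERN = re.compile(r"throttl", re.IGNORECASE)


def find_throttle_lines(log_text: str) -> list:
    """Return every log line that mentions throttling."""
    return [line for line in log_text.splitlines() if THROTTLE_PATTERN.search(line)]


def read_dmesg() -> str:
    """Read the kernel ring buffer; may need root if kernel.dmesg_restrict is set."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for line in find_throttle_lines(read_dmesg()):
        print(line)
```

The same `find_throttle_lines` helper can be pointed at the contents of a syslog file instead of `dmesg` output, since it only works on plain text.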

Our SA checked /var/log/messages right at the beginning, when things first went wrong, but found nothing.
The hardware all looks fine.

My mistake, profiling was enabled during the stress test. :sweat_smile:
I should run the test again.
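A re-run of the benchmark described earlier (5 clients, 2000 requests each) with profiling removed could look like the sketch below. It is an illustration, not the tool actually used in this thread: the node URLs, the `query_body` variable and the helper names are hypothetical, and it uses only the Python standard library. Note that sending a request to a node's HTTP address makes it the coordinating node; the shard-level work may still run on other nodes unless the request is routed there.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def strip_profile(body: dict) -> dict:
    """Return a copy of a search body with the "profile" flag removed."""
    cleaned = dict(body)
    cleaned.pop("profile", None)
    return cleaned


def make_http_sender(base_url: str, index: str):
    """Build a function that POSTs a search body to one node (hypothetical URL)."""
    url = f"{base_url}/{index}/_search?request_cache=false"

    def send(body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response so timing covers the full request

    return send


def run_benchmark(send, body: dict, clients: int = 5, repeats: int = 2000) -> list:
    """Run `clients` workers that each call send(body) `repeats` times.

    Returns the per-request latencies in seconds.
    """
    body = strip_profile(body)

    def worker(_):
        latencies = []
        for _ in range(repeats):
            start = time.perf_counter()
            send(body)
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(worker, range(clients)))
    return [lat for worker_lats in results for lat in worker_lats]
```

To compare nodes, the same body would be replayed against each one, e.g. `run_benchmark(make_http_sender("http://node-a:9200", "myindex"), query_body)` and the same call for the other (hypothetical) node address, while watching CPU on both hosts.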

However, the slow node has been back to normal since yesterday, after we isolated it for two days.
During the isolation we did nothing special apart from my benchmark test. After that we left it alone for two days and then put it back into production. Now it's completely fine, as if nothing had happened.
We can't reproduce the problem any more.

Glad it's OK, but it's always unsatisfying not to know why.
Thanks for sharing all the info.
