How to combine a bool query and a regexp query

My query DSL is:

{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        { "match_phrase_prefix": { "status": "3" } },
        { "match_phrase_prefix": { "status": "4" } }
      ]
    }
  },
  "aggs": {
    "host": {
      "terms": {
        "field": "status.keyword",
        "size": 10
      }
    }
  }
}

It returns the HTTP 3xx and 4xx statuses:

"aggregations" : {
    "host" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "304",
          "doc_count" : 173161
        },
        {
          "key" : "403",
          "doc_count" : 28004
        },
        {
          "key" : "404",
          "doc_count" : 6682
        },
        {
          "key" : "302",
          "doc_count" : 6356
        },
        {
          "key" : "499",
          "doc_count" : 4760
        },
        {
          "key" : "400",
          "doc_count" : 16
        },
        {
          "key" : "301",
          "doc_count" : 3
        },
        {
          "key" : "408",
          "doc_count" : 1
        }
      ]
    }
  }
}

And another DSL query, using regexp, to find all request values matching *.php:

{
  "size": 10,
  "query": {
    "regexp": {
      "request": {
        "value": ".*php",
        "flags": "ALL"
      }
    }
  },
  "aggs": {
    "host": {
      "terms": {
        "field": "request.keyword",
        "size": 10
      }
    }
  }
}

The result is:

  "aggregations" : {
    "host" : {
      "doc_count_error_upper_bound" : 724,
      "sum_other_doc_count" : 291350,
      "buckets" : [
        {
          "key" : "GET /web_system_check.php HTTP/1.0",
          "doc_count" : 382717
        },
        {
          "key" : "HEAD /activity_link.php HTTP/1.1",
          "doc_count" : 133765
        },
        {
          "key" : "GET /six/forward_game.php?game_id=19001&lang=zh-cn&html5=1 HTTP/1.1",
          "doc_count" : 22236
        },
        {
          "key" : "GET /six/forward_game.php?game_id=19001&lang=zh-cn HTTP/1.1",
          "doc_count" : 15320
        },
        {
          "key" : "POST /six/ebet_verify.php HTTP/1.1",
          "doc_count" : 11878
        },
        {
          "key" : "GET /six/forward_game.php?game_id=54001&lang=zh-cn HTTP/1.1",
          "doc_count" : 10412
        },
        {
          "key" : "GET /six/forward_game_onewallet.php?game_id=66005&lang=zh-cn HTTP/1.1",
          "doc_count" : 10402
        },
        {
          "key" : "GET /six/forward_game.php HTTP/1.1",
          "doc_count" : 3652
        },
        {
          "key" : "GET /six/forward_game.php?game_id=36001&lang=zh-cn&html5=1 HTTP/1.1",
          "doc_count" : 3267
        },
        {
          "key" : "POST /six/wallet.php HTTP/1.1",
          "doc_count" : 3211
        }
      ]
    }
  }
}

I want to combine these two DSL queries.

How should I do that? :slightly_smiling_face:

A bool query can be nested inside another bool so you can have (for example) a regexp and a bool inside the must array.

bool
    must
        bool
            should
                 match1
                 match2
        regexp
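
As a rough sketch, that outline could be written out like this for the two queries above. The field names, values, and the aggregation are copied from the original posts; I haven't run it against your index, so treat it as a starting point rather than a tested query.

```json
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match_phrase_prefix": { "status": "3" } },
              { "match_phrase_prefix": { "status": "4" } }
            ]
          }
        },
        {
          "regexp": {
            "request": {
              "value": ".*php",
              "flags": "ALL"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "host": {
      "terms": {
        "field": "status.keyword",
        "size": 10
      }
    }
  }
}
```

The inner bool contains only a should array, so at least one of the two status clauses has to match, and the outer must requires both that inner bool and the regexp.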

Thanks for your reply.

My query DSL now uses this structure:

bool
    must
        bool
            must_not
                match1
            filter
                regexp

and it works well.

The DSL is:

GET access-2020.06.03/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must_not": [
            {
              "match": {
                "status": "200"
              }
            }
          ],
          "filter": {
            "regexp": {
              "request": {
                "value": ".*php.*",
                "flags": "ALL"
              }
            }
          }
        }
      }
    }
  },
  "aggs": {
    "host": {
      "terms": {
        "field": "request.keyword",
        "size": 10
      }
    }
  }
}

I have another DSL question:

GET myindex-06.04/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase_prefix": {
            "status": "5"
          }
        },
        {
          "match_phrase_prefix": {
            "status": "4"
          }
        }
      ],
      "filter": {
        "regexp": {
          "request": {
            "value": ".*php.*",
            "flags": "ALL"
          }
        }
      }
    }
  },
  "aggs": {
    "host": {
      "terms": {
        "field": "status.keyword",
        "size": 10
      }
    }
  }
}

The result is:

"aggregations" : {
    "host" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "200",
          "doc_count" : 10241706
        },
        {
          "key" : "499",
          "doc_count" : 127500
        },
        {
          "key" : "302",
          "doc_count" : 77599
        },
        {
          "key" : "502",
          "doc_count" : 52547
        },
        {
          "key" : "504",
          "doc_count" : 115
        },
        {
          "key" : "404",
          "doc_count" : 30
        },
        {
          "key" : "408",
          "doc_count" : 1
        }
      ]
    }
  }

Why do I still get HTTP status 200 and 302 in my doc_count buckets?
:thinking:

Lucene is not a binary matching tool: it doesn't think of things as right or wrong, matched or not. Things are allowed to match to a degree, with optional pieces of criteria. This is where should comes in. Its clauses are optional nice-to-haves that go along with the stricter must-have and must-not-have type bool clauses. This is what you have in your example, with the should clause appearing alongside the strict filter clause: the filter decides which documents match and the should clauses only influence scoring, so documents with status 200 or 302 still come back.
When you have no strict clauses in a bool expression, the should clause is promoted from being optional: at least one of the clauses it contains must match. So to implement strict OR boolean logic you need a bool expression with a should array of the choices and nothing else.
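
For the query above, one possible way to keep the should array next to the filter and still require a match is the bool query's minimum_should_match parameter. That parameter is not mentioned earlier in this thread, so take this as a sketch under that assumption; everything else is copied from the question, and it has not been run against your data.

```json
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        { "match_phrase_prefix": { "status": "5" } },
        { "match_phrase_prefix": { "status": "4" } }
      ],
      "minimum_should_match": 1,
      "filter": {
        "regexp": {
          "request": {
            "value": ".*php.*",
            "flags": "ALL"
          }
        }
      }
    }
  },
  "aggs": {
    "host": {
      "terms": {
        "field": "status.keyword",
        "size": 10
      }
    }
  }
}
```

The alternative that follows the explanation more literally is to move the should array into its own bool (with nothing else in it) and place that bool inside a must or filter array next to the regexp.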

OK I got you.

Thanks Mark!! :smile:

I tried another, stricter DSL expression.

Now the query meets the requirement :smiling_face_with_three_hearts:


GET access-2020.06.04/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must_not": [
            {
              "match_phrase_prefix": {
                "status": "3"
              }
            },
            {
              "match_phrase_prefix": {
                "status": "2"
              }
            }
          ],
          "must": [
            {
              "regexp": {
                "request": {
                  "value": ".*php",
                  "flags": "ALL"
                }
              }
            }
          ],
          "filter": [
            {
              "range": {
                "@timestamp": {
                  "gt": "now-1m"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "host": {
      "terms": {
        "field": "status.keyword",
        "size": 10
      }
    }
  }
}
