Searching IP Address with regex

Hi All,
I need to search for the presence of IP addresses in application logs in the following format:

The field may also contain information other than the IP address.

I wrote the regex (https?://([0-9]{1,3}\.){3}[0-9]{1,3}) and tested it via regex101.
I created a test index to verify the search, but when I use the regex in a DSL query (via Kibana) it returns no results:

{
  "regexp": {
    "message": {
      "case_insensitive": true,
      "value": "https?://([0-9]{1,3}\\.){3}[0-9]{1,3}"
    }
  }
}

If I use the following values instead:

  • "https?": returns documents
  • "([0-9]{1,3}\\.){3}[0-9]{1,3}": returns documents
  • "https?:": does not return documents
  • "https?://([0-9]{1,3}\\.){3}[0-9]{1,3}": does not return documents
  • "https?:\\/\\/([0-9]{1,3}\\.){3}[0-9]{1,3}": does not return documents
  • "https?\\:\\/\\/([0-9]{1,3}\\.){3}[0-9]{1,3}": does not return documents

Can anyone help me? The Elastic Stack currently in use is version 8.11.1.

Thanks

I think it depends on the mapping used for the message field.
If it's not a keyword or wildcard type, the regexp query might not work as expected.

See the "Keyword type family" page of the Elasticsearch Guide [8.12] for more details.
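
To check which type the message field actually has, the get field mapping API can be used. This is only a minimal sketch: my-logs is a placeholder index name, not one taken from this thread.

```console
# my-logs is a hypothetical index name; replace it with the real one
GET /my-logs/_mapping/field/message
```

If the response reports "type": "text", the field is analyzed, and a regexp query is applied to the individual terms rather than to the whole message.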

Hi,
thanks for your reply!
I performed another test by creating an index with a wildcard field, and the regex still does not work.

If the mapping is wrong, I can't explain why the first regex works but stops working as soon as I insert the character ":".

Thanks

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste in Kibana dev console, click on the run button to reproduce your use case. It will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Have a look at the "Elastic Stack and Solutions Help · Forums and Slack" page. It also contains a lot of useful information on how to ask for help.

Hi,
please find below my test script.

Thank you very much!

# Create index
PUT /regex-test
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "text"
      }
    }
  }
}
 
# Populate index with some test cases
POST /regex-test/_doc
{
  "@timestamp": "2024-03-18T13:00:00",
  "message": "Correct case https://192.168.1.1/"
}

POST /regex-test/_doc
{
  "@timestamp": "2024-03-18T13:00:00",
  "message": "Correct https://192.168.1.1/second/third"
}

POST /regex-test/_doc
{
  "@timestamp": "2024-03-18T13:00:00",
  "message": "Correct case: http://192.168.1.1"
}

POST /regex-test/_doc
{
  "@timestamp": "2024-03-18T13:00:00",
  "message": "No correct case: 192.168.1.1/"
}

POST /regex-test/_doc
{
  "@timestamp": "2024-03-18T13:00:00",
  "message": "No correct https://www.google.it"
}

# Result found
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": "([0-9]{1,3}\\.){3}[0-9]{1,3}"
      }
    }
  }
}

# Result found
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": "https?"
      }
    }
  }
}

# Result NOT found
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": "https?:"
      }
    }
  }
}

# Result NOT found
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": "https?://([0-9]{1,3}\\.){3}[0-9]{1,3}"
      }
    }
  }
}
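
One possible way to see why the last two queries return nothing is to ask Elasticsearch how the text field is tokenized. This sketch assumes the regex-test index created above, with message mapped as text and no custom analyzer configured.

```console
# Show the terms produced for one of the sample messages from this script
GET /regex-test/_analyze
{
  "field": "message",
  "text": "Correct case https://192.168.1.1/"
}
```

With the default standard analyzer, the output should contain separate tokens such as correct, case, https and 192.168.1.1, with the ":" and "/" characters dropped. A regexp query on a text field is matched against each of these terms, so a pattern containing ":" has no term it can match.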

What about this?

# Create index
DELETE /regex-test

PUT /regex-test
{
  "mappings": {
    "properties": {
      "message": {
        "type": "wildcard"
      }
    }
  }
}
 
# Populate index with some test cases
POST /regex-test/_doc
{
  "message": "Correct case https://192.168.1.1/"
}

POST /regex-test/_doc
{
  "message": "Correct https://192.168.1.1/second/third"
}

POST /regex-test/_doc
{
  "message": "Correct case: http://192.168.1.1"
}

POST /regex-test/_doc
{
  "message": "No correct case: 192.168.1.1/"
}

POST /regex-test/_doc
{
  "message": "No correct https://www.google.it"
}

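# Expected: no result on a wildcard field (the pattern must match the entire value, which is never just an IP address)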
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": "([0-9]{1,3}\\.){3}[0-9]{1,3}"
      }
    }
  }
}

# Result found
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": ".*https?.*"
      }
    }
  }
}

# Expected: results found (every message containing "http:" or "https:")
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": ".*https?:.*"
      }
    }
  }
}

# Expected: only messages that end with the IP address are found (there is no trailing .*)
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": ".*https?://([0-9]{1,3}\\.){3}[0-9]{1,3}"
      }
    }
  }
}

Thank you so much @dadoonet! It works now!

I only changed the regex to ".*https?://([0-9]{1,3}\\.){3}[0-9]{1,3}.*".
Now the mapping of the message field has to be updated (from text to wildcard/keyword).

Thank you again.
Have a nice day

Final Solution:

# Create index
DELETE /regex-test

PUT /regex-test
{
  "mappings": {
    "properties": {
      "message": {
        "type": "wildcard"
      }
    }
  }
}
 
# Populate index with some test cases
POST /regex-test/_doc
{
  "message": "Correct case https://192.168.1.1/"
}

POST /regex-test/_doc
{
  "message": "Correct https://192.168.1.1/second/third"
}

POST /regex-test/_doc
{
  "message": "Correct case: http://192.168.1.1"
}

POST /regex-test/_doc
{
  "message": "No correct case: 192.168.1.1/"
}

POST /regex-test/_doc
{
  "message": "No correct https://www.google.it"
}

# Query
GET /regex-test/_search
{
  "query": {
    "regexp": {
      "message": {
        "case_insensitive": true,
        "value": ".*https?://([0-9]{1,3}\\.){3}[0-9]{1,3}.*"
      }
    }
  }
}
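
If full-text search on message is still needed, one option (not part of the solution above) is to keep the field as text and add a wildcard sub-field for regexp queries. This is a sketch only: the index name regex-test-multi and the sub-field name raw are made up for illustration.

```console
# Hypothetical index: message stays text, message.raw is a wildcard sub-field
PUT /regex-test-multi
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "raw": { "type": "wildcard" }
        }
      }
    }
  }
}

# Copy the existing documents; the type of an existing field cannot be changed in place
POST /_reindex?refresh=true
{
  "source": { "index": "regex-test" },
  "dest": { "index": "regex-test-multi" }
}

# Same regexp as the final solution, pointed at the sub-field
GET /regex-test-multi/_search
{
  "query": {
    "regexp": {
      "message.raw": {
        "case_insensitive": true,
        "value": ".*https?://([0-9]{1,3}\\.){3}[0-9]{1,3}.*"
      }
    }
  }
}
```

The trade-off is extra index size for the sub-field, in exchange for keeping match queries on message and anchored regexp queries on message.raw.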