Extract a datetime field from a string in an index pattern and make it filterable as a date

The original task is to make an index pattern (here alb-logs*) in Elasticsearch searchable by a datetime field (and get the date histogram in Discover).

When I looked into the problem, I found that this index does not have any datetime field. However, there is a log field, and the second whitespace-separated token in its string value is a TIMESTAMP_ISO8601 value sandwiched between other text.

Example value of log:
"log": [
"http 2022-06-23T08:05:09.703732Z app/jupyter-notebook-alb/5369d658dabf1dc5 104.217.249.182:34438 - -1 -1 -1 301 - 331 329 "GET http://13.126.211.168:80/ HTTP/1.1" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0" - - - "Root=1-62b41eb5-4ee4ff335df664a832aedc0f" "-" "-" 1 2022-06-23T08:05:09.481000Z "redirect" "https://13.126.211.168:443/" "-" "-" "-" "-" "-""
]

So I followed up by learning about ingest pipelines and used the grok processor with the pattern "%{WORD:word}%{SPACE}%{TIMESTAMP_ISO8601:TIMEDATE}". This did extract the datetime field for me.

For extra safety I even followed this field with a date processor. The final request looks like this:

PUT _ingest/pipeline/abs_datetime_pipeline
{
  "description": "This extracts datetime field from the log.",
  "processors": [
    {
      "grok": {
        "field": "log",
        "patterns": [
          "%{WORD:word}%{SPACE}%{TIMESTAMP_ISO8601:TIMEDATE}"
        ]
      }
    },
    {
      "date": {
        "field": "TIMEDATE",
        "formats": [
          "ISO8601"
        ]
      }
    }
  ]
}

This extracted the datetime field correctly from a sample document I tested it on.
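For reference, a pipeline like this can be checked against a sample document with the Simulate API before wiring it to any index (the log value below is shortened; any string starting with a word, a space, and an ISO8601 timestamp should behave the same way):

```json
POST _ingest/pipeline/abs_datetime_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "log": "http 2022-06-23T08:05:09.703732Z app/jupyter-notebook-alb/5369d658dabf1dc5 ..."
      }
    }
  ]
}
```

The response shows the document as it would look after the pipeline runs, so you can confirm the word, TIMEDATE, and @timestamp fields are populated as expected.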

Then I went to Index Management for alb-logs and set its index.default_pipeline and index.final_pipeline to the new abs_datetime_pipeline.
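The same change can also be made without the UI via the index settings API. As an aside, pointing both default_pipeline and final_pipeline at the same pipeline makes it run twice per document; default_pipeline alone should be sufficient here:

```json
PUT /alb-logs/_settings
{
  "index.default_pipeline": "abs_datetime_pipeline"
}
```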

Now I expected the datetime histogram to appear in the Discover section for alb-logs, but sadly it didn't.
Can you point out what I am missing and where I am going wrong, or what I can do to solve the task?

What is the mapping you are using for your index?

If I run GET /alb-logs/_mapping, I get:

{
  "alb-logs" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "TIMEDATE" : {
          "type" : "date"
        },
        "log" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "word" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

I'd use the default @timestamp field:

PUT _ingest/pipeline/abs_datetime_pipeline
{
  "description": "This extracts datetime field from the log.",
  "processors": [
    {
      "grok": {
        "field": "log",
        "patterns": [
          "%{WORD:word}%{SPACE}%{TIMESTAMP_ISO8601:@timestamp}"
        ]
      }
    },
    {
      "remove": {
        "field": "log"
      }
    },
    {
      "date": {
        "field": "@timestamp",
        "formats": [
          "ISO8601"
        ]
      }
    }
  ]
}

And set the mapping like this:

{
  "alb-logs" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "word" : {
          "type" : "text"
        }
      }
    }
  }
}
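Note that the type of an existing field cannot be changed in place, so a mapping like this is normally supplied when the index is created, or on a fresh index that the data is reindexed into (alb-logs-v2 below is just an illustrative name):

```json
PUT /alb-logs-v2
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "word":       { "type": "text" }
    }
  }
}
```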

I can now see the new fields for that index pattern. However, most of the field values are empty (they didn't get the extracted data), though some documents do have them: to be precise, 13,127,000 hits out of 569,432,116 (approximately 2%). Is this the result I should expect? There is still no histogram, and the number of hits is not increasing.

You need to provide more details, such as a way to reproduce the problem and the exact log line that fails to parse.
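One thing worth checking (an assumption on my part, not confirmed in this thread): index.default_pipeline only applies to documents indexed after the setting is made, so documents that were already in the index never pass through the pipeline, which would explain why only a small fraction of hits have the new fields. Existing documents can be reprocessed in place by running them through the pipeline with the Update By Query API:

```json
POST /alb-logs/_update_by_query?pipeline=abs_datetime_pipeline
```

On a large index this is a heavy operation, so it is worth testing on a small slice (e.g. with a query in the request body) first.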

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.