Extracting substring from a field

A field called message contains a lot of information, as shown below.

message: xxxxxxxx /app/rest/abc 200 500ms xxxxxxxx

Is there a way to extract only the API names and the response times and put them in a table, without changing the original ingestion?
It appears a runtime field is an option. Do we need to create a separate index for that?

Hi @SUMANTA_ROY

A couple of questions...

What version are you on?

Curious if these logs are from a common app like Apache or Nginx?

Do you just want these fields to show up in Discover, or do you also want to filter, aggregate, etc.? How else do you want to use them?

There are a couple of options, but please answer these questions first.

Also, could you provide a better, more complete log sample?

Thanks for your response.

Version is 7.17.x.
The logs are access logs, not from a common app.

I am planning to capture these details and then create some table visualization like:

api - response code - response time
/abc - 200 - 500 ms
/xyz - 500 - 450 ms

Also, I want to create alerts based on them in the future.
I tried to create a runtime field using grok,
but I am getting this error:

"...Text fields are not optimized for operation that require per-document field data like aggregation and sorting, so these operations are disabled by default. ....."

The log looks like this:
some fields and their values, followed by:
message: [timestamp] ***** "POST /app/rest/abc?xxxx HTTP/1.1" 200 676866 678ms xxxxxx tailed_path: xxxxx

So I am looking to extract a substring from the "message" field.

Hi @SUMANTA_ROY

In general, you would create an ingest pipeline and parse it when the logs come in. This has several advantages: scale, speed, and ease of use.
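For reference, here is a minimal sketch of what the ingest pipeline approach could look like, run through the simulate API so nothing is stored or changed. The sample line is made up to match the log shape posted above, and the target field names (log_timestamp, client, api, query, response_time_ms, details) are placeholders, not anything from your data. It assumes the request is always quoted and the response time always ends in ms. The triple-quoted strings are Kibana Dev Tools syntax.

```console
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            """\[%{DATA:log_timestamp}\] %{DATA:client} "%{WORD:verb} %{URIPATH:api}(?:%{URIPARAM:query})? HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:bytes:long} %{NUMBER:response_time_ms:long}ms %{GREEDYDATA:details}"""
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": """[2024-07-07 12:01:41] client1 "POST /app/rest/abc?x=1 HTTP/1.1" 200 676866 678ms extra tailed_path: /tmp/x"""
      }
    }
  ]
}
```

If the simulated output looks right, the same processor could go into a real pipeline. Parsing at ingest time means the fields are indexed once and are cheap to aggregate and alert on later.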

Yes, you can do it as runtime fields, but it will not be very efficient.

There are newer features, such as ES|QL in recent versions, that would help greatly...

But given your current situation, you will probably want to use grok, and perhaps more, especially if you want to parse those paths and only get the last element, etc.

I think there is an example here of a composite runtime field.

Or you would have to create 3 separate runtime fields through the data view... each extracting one of the fields you want...
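As a rough sketch of the "separate runtime fields" route, one such field for the response time might look like the request below. It follows the shape of the grok runtime field example in the Elasticsearch docs. The index name your-index and the field name response_time_ms are placeholders, and the pattern assumes the quoted request is followed by the status code, the byte count, and an integer response time ending in ms.

```console
PUT your-index/_mappings
{
  "runtime": {
    "response_time_ms": {
      "type": "long",
      "script": """
        // Read the raw text from _source, since text fields have no doc_values by default.
        def msg = params._source.message;
        if (msg != null) {
          // extract() returns null when the pattern does not match, hence the ?. operator.
          String rt = grok('"%{WORD:verb} %{NOTSPACE:url} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:bytes} %{INT:response_time}ms').extract(msg)?.response_time;
          if (rt != null) emit(Long.parseLong(rt));
        }
      """
    }
  }
}
```

The other two fields would repeat the same script with a keyword type, reading .url or .response_code instead. The same script body should also work when pasted into the runtime field editor in Kibana.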

There is some work to do for sure... if I get a chance later I will see if I can take a quick pass at it.

Appreciate your quick response.

A few follow-up questions:

  1. Does ES|QL allow running a regex directly in the Discover search bar and creating runtime fields? Which version supports that?

  2. For runtime field extraction, it appears performance is a concern when doing such an operation on a text field. I received the error below, as mentioned in my previous post.
    Is there any workaround?
    Does the runtime field need to be created under a different index?

"...Text fields are not optimized for operation that require per-document field data like aggregation and sorting, so these operations are disabled by default. ....."

Yes, it does... it is quite powerful...
It came out as Tech Preview in 8.11
8.14 is GA

As for the error, that depends on your mapping. Run this and share the result so we can see:

GET your-index/_mapping/field/message

Depending on that, you may end up using ...

.extract(params._source.message)) << if you do not have a keyword subfield
or
.extract(doc[\"message.keyword\"].value)) << if you have a .keyword subfield

See Here

In most cases, retrieve field values through doc_values whenever possible. Accessing doc_values with a runtime field is faster than retrieving values from _source because of how data is loaded from Lucene.

However, there are cases where retrieving fields from _source is necessary. For example, text fields do not have doc_values available by default, so you have to retrieve values from _source. In other instances, you might choose to disable doc_values on a specific field.

You can alternatively prefix the field you want to retrieve values for with params._source (such as params._source.day_of_week). For simplicity, defining a runtime field in the mapping definition without a script is the recommended option, whenever possible.

No, they are created at the data view or mapping level, so you do not need a separate index.

The example here is pretty close... I just have not got the quotes right yet.

PUT discuss-test/_mappings
{
  "runtime": {
    "http": {
      "type": "composite",
      "script": "emit(grok(\"\\\\[%{DATA:timestamp}\\\\] %{DATA:host} %{DATA:verb} %{DATA:url} %{DATA:http_version} %{DATA:response_code} %{DATA:bytes} %{DATA:response_time} %{GREEDYDATA:message_details}\").extract(params._source.message))",
      "fields": {
        "url": {
          "type": "keyword"
        },
        "verb": {
          "type": "keyword"
        },
        "response_code": {
          "type": "keyword"
        }
      }
    }
  }
}


POST discuss-test/_doc
{
     "@timestamp": "2024-07-07T19:01:42.736Z",
     "message" : """[Jul 7, 2024 @ 12:01:41] abcdef "POST /app/rest/xyz?somethingselase HTTP/1.1" 500 676866 200ms idontknow tailed_path: /path"""
}


GET discuss-test/_search
{
  "_source": ["*"], 
  "fields": [
    "*"
  ]
}

# Results: close, the quotes just still need to be stripped (see http.verb)

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "az8Wj5ABfwsjeNV6XlhS",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2024-07-07T19:01:42.736Z",
          "message" : """[Jul 7, 2024 @ 12:01:41] abcdef "POST /app/rest/xyz?somethingselase HTTP/1.1" 500 676866 200ms idontknow tailed_path: /path"""
        },
        "fields" : {
          "http.response_code" : [
            "500"
          ],
          "@timestamp" : [
            "2024-07-07T19:01:42.736Z"
          ],
          "http.verb" : [
            "\"POST"
          ],
          "message.keyword" : [
            """[Jul 7, 2024 @ 12:01:41] abcdef "POST /app/rest/xyz?somethingselase HTTP/1.1" 500 676866 200ms idontknow tailed_path: /path"""
          ],
          "http.url" : [
            "/app/rest/xyz?somethingselase"
          ],
          "message" : [
            """[Jul 7, 2024 @ 12:01:41] abcdef "POST /app/rest/xyz?somethingselase HTTP/1.1" 500 676866 200ms idontknow tailed_path: /path"""
          ]
        }
      }
    ]
  }
}
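
One possible way to deal with the leftover quotes is to put the literal quotes into the pattern and use tighter sub-patterns (WORD, NOTSPACE, NUMBER, INT) in place of DATA. This is an untested sketch against the same discuss-test index. The extra response_time sub-field, typed as long, is an addition for the table use case, and the pattern assumes the response time is always an integer followed by ms.

```console
PUT discuss-test/_mappings
{
  "runtime": {
    "http": {
      "type": "composite",
      "script": """
        emit(grok('\\[%{DATA:timestamp}\\] %{DATA:host} "%{WORD:verb} %{NOTSPACE:url} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:bytes} %{INT:response_time}ms %{GREEDYDATA:message_details}').extract(params._source.message));
      """,
      "fields": {
        "url": {
          "type": "keyword"
        },
        "verb": {
          "type": "keyword"
        },
        "response_code": {
          "type": "keyword"
        },
        "response_time": {
          "type": "long"
        }
      }
    }
  }
}
```

With the sample document above, http.verb should then come back as POST without the leading quote. Re-running the same GET discuss-test/_search request is a quick way to check.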