How to read custom log files using filebeat

I have the following TCS.log in a separate folder:

2021/06/13 17:58:42 :     INFO | Stock = TCS.NS,  Date = 2002-08-12
2021/06/13 17:58:42 :     INFO | Volume=212976
2021/06/13 17:58:42 :     INFO | Low=38.724998474121094
2021/06/13 17:58:42 :     INFO | High=40.0
2021/06/13 17:58:42 :     INFO | Open=38.724998474121094
2021/06/13 17:58:42 :     INFO | Close=39.70000076293945
2021/06/13 17:58:42 :     INFO | Stock = TCS.NS,  Date = 2002-08-13
2021/06/13 17:58:42 :     INFO | Volume=153576
2021/06/13 17:58:42 :     INFO | Low=38.875
2021/06/13 17:58:42 :     INFO | High=40.38750076293945
2021/06/13 17:58:42 :     INFO | Open=39.75
2021/06/13 17:58:42 :     INFO | Close=39.162498474121094
2021/06/13 17:58:42 :     INFO | Stock = TCS.NS,  Date = 2002-08-14
2021/06/13 17:58:42 :     INFO | Volume=822776
2021/06/13 17:58:42 :     INFO | Low=35.724998474121094
2021/06/13 17:58:42 :     INFO | High=39.25
2021/06/13 17:58:42 :     INFO | Open=39.25
2021/06/13 17:58:42 :     INFO | Close=36.462501525878906

The file is updated on a daily basis.
I want to read it and draw a graph (Kibana visualisation) for each stock (one per log file) using each of the parameters: Open, Close, High, Low and Volume.

Here is the filebeat configuration I created:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/home1/filebeatlogs/*.log
  multiline:
    pattern: 'Stock ='
    negate: true
    match: after
    flush_pattern: 'Close='



setup.template.settings:
  index.number_of_shards: 1
setup.kibana:
output.elasticsearch:
  hosts: ["localhost:9200"]

setup.template.name: "filebeat-*"
#setup.template.fields: "fields_simple.yml"
#setup.template.overwrite: true

How do I make the config read OHLC per stock, and get visualisations for both OHLC and Volume per date?

Is it possible to use filebeat -> Elasticsearch -> Kibana for this use case?

I am able to read the message per stock per date into the message field in Elasticsearch, but I am not able to parse the data and visualise it in Kibana.

How do I go about it using filebeat?
I am sure I would be able to use Logstash, but that would be an issue due to its heavy processing and memory footprint, as there is already plenty of heavy-duty work running on the server. Will filebeat, or anything else, provide a solution?

Beats are lightweight data shippers. Filebeat's multiline support will send one event for multiple lines to Logstash or Elasticsearch (all data in the message field); you then need some grok to extract the fields from the received message.
Grok can be done by Logstash or by an Elasticsearch ingest node, depending on where you want to bear the cost.
Thanks
Julien

Thanks Julien,

I made the improvement and am now trying to have the code read the data.
It seems to be working in the console:

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Grok ingestion pipeline nginx logs",
    "version": 0,
    "processors": [
      {
        "grok": {
          "field": "message",
          "trace_match": true,
          "patterns": [
            "%{GREEDYDATA:irrelevant_data}Stock = %{GREEDYDATA:Stock}\n%{GREEDYDATA:irrelevant_data}Date = %{GREEDYDATA:date}\n%{GREEDYDATA:irrelevant_data}Volume=%{NUMBER:Volume}\n%{GREEDYDATA:irrelevant_data}Low=%{BASE10NUM:Low}\n%{GREEDYDATA:irrelevant_data}High=%{BASE10NUM:High}\n%{GREEDYDATA:irrelevant_data}Open=%{BASE10NUM:Open}\n%{GREEDYDATA:irrelevant_data}Close=%{BASE10NUM:Close}"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "filebeat-7.13.1-2021.06.13-000001",
      "_type": "_doc",
      "_id": "kMpUTHoBr7SFhhL5-98P",
      "_source": {
        "message": "2021/06/27 15:17:49 :     INFO | Stock = 0700.HK\n2021/06/27 15:17:49 :     INFO | Date = 2021-06-18\n2021/06/27 15:17:49 :     INFO | Volume=12301606\n2021/06/27 15:17:49 :     INFO | Low=599.5\n2021/06/27 15:17:49 :     INFO | High=608.0\n2021/06/27 15:17:49 :     INFO | Open=602.0\n2021/06/27 15:17:49 :     INFO | Close=603.0"
      }
    }
  ]
}

I ingested the pipeline too, but I have one final problem: I am not able to change the Date field (second line in the patterns) to a date type. It is still a string, so I am not able to properly visualise it in Kibana.

The output I got using the given grok pattern is all strings. If I need to change the output types to date, long and double, how do I go about it?

Have you created a mapping? That should take care of the primitive types like integers and longs.

Then you should take a look at the date processor to convert your date.

Hi @stephenb, to understand what you mean: should I have another processor for converting to date?

I did that too, but I did not see the fields.
To explain it better: I have ohlc_all_fields, which parses the message into string fields as follows:
"patterns": [
"%{GREEDYDATA:irrelevant_data}Stock = %{GREEDYDATA:Stock}\n%{GREEDYDATA:irrelevant_data}Date = %{GREEDYDATA:date}\n%{GREEDYDATA:irrelevant_data}Volume=%{NUMBER:Volume}\n%{GREEDYDATA:irrelevant_data}Low=%{BASE10NUM:Low}\n%{GREEDYDATA:irrelevant_data}High=%{BASE10NUM:High}\n%{GREEDYDATA:irrelevant_data}Open=%{BASE10NUM:Open}\n%{GREEDYDATA:irrelevant_data}Close=%{BASE10NUM:Close}"

Then I had another processor to convert only the date into busDate, but busDate does not show up in the fields.

Moreover, in 7.13, I am not sure how I can use pipeline injection with a targeted field within grok.

Try this... You may want to put the timezone in the date processor.

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Grok ingestion pipeline nginx logs",
    "version": 0,
    "processors": [
      {
        "grok": {
          "field": "message",
          "trace_match": true,
          "patterns": [
            """%{GREEDYDATA:irrelevant_data}Stock = %{GREEDYDATA:Stock}
%{GREEDYDATA:irrelevant_data}Date = %{GREEDYDATA:date}
%{GREEDYDATA:irrelevant_data}Volume=%{NUMBER:Volume}
%{GREEDYDATA:irrelevant_data}Low=%{BASE10NUM:Low}
%{GREEDYDATA:irrelevant_data}High=%{BASE10NUM:High}
%{GREEDYDATA:irrelevant_data}Open=%{BASE10NUM:Open}
%{GREEDYDATA:irrelevant_data}Close=%{BASE10NUM:Close}"""
          ]
        }
      },
      {
        "date": {
          "field": "date",
          "target_field": "busDate", 
          "formats": ["yyyy-MM-dd"]
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "filebeat-7.13.1-2021.06.13-000001",
      "_id": "kMpUTHoBr7SFhhL5-98P",
      "_source": {
        "message": """2021/06/27 15:17:49 :     INFO | Stock = 0700.HK
2021/06/27 15:17:49 :     INFO | Date = 2021-06-18
2021/06/27 15:17:49 :     INFO | Volume=12301606
2021/06/27 15:17:49 :     INFO | Low=599.5
2021/06/27 15:17:49 :     INFO | High=608.0
2021/06/27 15:17:49 :     INFO | Open=602.0
2021/06/27 15:17:49 :     INFO | Close=603.0"""
      }
    }
  ]
}

Results (note the busDate field; you will need to add that to your mapping as type date):

{
  "docs" : [
    {
      "doc" : {
        "_index" : "filebeat-7.13.1-2021.06.13-000001",
        "_type" : "_doc",
        "_id" : "kMpUTHoBr7SFhhL5-98P",
        "_source" : {
          "date" : "2021-06-18",
          "High" : "608.0",
          "message" : """2021/06/27 15:17:49 :     INFO | Stock = 0700.HK
2021/06/27 15:17:49 :     INFO | Date = 2021-06-18
2021/06/27 15:17:49 :     INFO | Volume=12301606
2021/06/27 15:17:49 :     INFO | Low=599.5
2021/06/27 15:17:49 :     INFO | High=608.0
2021/06/27 15:17:49 :     INFO | Open=602.0
2021/06/27 15:17:49 :     INFO | Close=603.0""",
          "irrelevant_data" : "2021/06/27 15:17:49 :     INFO | ",
          "Open" : "602.0",
          "busDate" : "2021-06-18T00:00:00.000Z",
          "Volume" : "12301606",
          "Low" : "599.5",
          "Close" : "603.0",
          "Stock" : "0700.HK"
        },
        "_ingest" : {
          "_grok_match_index" : "0",
          "timestamp" : "2021-07-19T14:39:07.052238Z"
        }
      }
    }
  ]
}

Thanks Stephen, but I created the pipeline using "Ingest Node Pipelines" and am not sure how I can add the date processor to the same one.

Just as I showed you: add the date processor as another processor. An ingest pipeline can have many processors, and they execute in order.

PUT /_ingest/pipeline/mypipeline
{
  "description": "Grok ingestion pipeline nginx logs",
  "version": 0,
  "processors": [
    {
      "grok": {
        "field": "message",
        "trace_match": true,
        "patterns": [
          """%{GREEDYDATA:irrelevant_data}Stock = %{GREEDYDATA:Stock}
%{GREEDYDATA:irrelevant_data}Date = %{GREEDYDATA:date}
%{GREEDYDATA:irrelevant_data}Volume=%{NUMBER:Volume}
%{GREEDYDATA:irrelevant_data}Low=%{BASE10NUM:Low}
%{GREEDYDATA:irrelevant_data}High=%{BASE10NUM:High}
%{GREEDYDATA:irrelevant_data}Open=%{BASE10NUM:Open}
%{GREEDYDATA:irrelevant_data}Close=%{BASE10NUM:Close}"""
        ]
      }
    },
    {
      "date": {
        "field": "date",
        "target_field": "busDate",
        "formats": ["yyyy-MM-dd"]
      }
    }
  ]
}

You need to add a mapping if you want busDate to be type date. Do you know about mappings?

date mapping here
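
For example, something along these lines should work to add the new field (a sketch only; the index name is the filebeat index used earlier in this thread, and your index name may differ):

```
PUT /filebeat-7.13.1-2021.06.13-000001/_mapping
{
  "properties": {
    "busDate": { "type": "date" }
  }
}
```

Note that you can add a new field such as busDate to an existing mapping, but you cannot change the type of a field that is already mapped; for that you would need a new index and a reindex.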


I am not sure how to add the mapping here.

I noticed the mappings are attached to the index, but I do not have the option to edit them.

@stephenb, am I going in the wrong direction for my initial requirement,
which is basically to strip the Open/High/Low/Close and Volume data out of the log message and come up with a visualisation? What I have achieved is to get the data into individual fields for date, OHLC and Volume separately, but they are all strings, not double (OHLC), long (Volume) and date (trading date).

I would use your own index, template, mapping, pipeline etc.
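
As a sketch of what such a template could look like (the field types mirror the mapping shown further down; the stock-ticker-* index pattern and template name are assumptions, and this uses the legacy _template API):

```
PUT _template/stock-ticker-template
{
  "index_patterns": ["stock-ticker-*"],
  "settings": { "number_of_shards": 1 },
  "mappings": {
    "properties": {
      "Stock":   { "type": "keyword" },
      "busDate": { "type": "date" },
      "Open":    { "type": "double" },
      "High":    { "type": "double" },
      "Low":     { "type": "double" },
      "Close":   { "type": "double" },
      "Volume":  { "type": "long" }
    }
  }
}
```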


Here is a pipeline

DELETE _ingest/pipeline/stock-ticker-pipeline

PUT /_ingest/pipeline/stock-ticker-pipeline
{
  "description": "Stock Ticker Pipeline",
  "version": 0,
  "processors": [
    {
      "grok": {
        "field": "message",
        "trace_match": true,
        "patterns": [
          """%{GREEDYDATA:irrelevant_data}Stock = %{GREEDYDATA:Stock}
%{GREEDYDATA:irrelevant_data}Date = %{GREEDYDATA:date}
%{GREEDYDATA:irrelevant_data}Volume=%{NUMBER:Volume}
%{GREEDYDATA:irrelevant_data}Low=%{BASE10NUM:Low}
%{GREEDYDATA:irrelevant_data}High=%{BASE10NUM:High}
%{GREEDYDATA:irrelevant_data}Open=%{BASE10NUM:Open}
%{GREEDYDATA:irrelevant_data}Close=%{BASE10NUM:Close}"""
        ]
      }
    },
    {
      "convert" : {
        "field" : "Volume",
        "type": "long"
      }
    },
    {
      "convert" : {
        "field" : "Low",
        "type": "double"
      }
    },
    {
      "convert" : {
        "field" : "High",
        "type": "double"
      }
    },
    {
      "convert" : {
        "field" : "Open",
        "type": "double"
      }
    },
    {
      "convert" : {
        "field" : "Close",
        "type": "double"
      }
    },
    {
      "date": {
        "field": "date",
        "target_field": "busDate",
        "formats": [
          "yyyy-MM-dd"
        ]
      }
    }
  ]
}

Here is a sample

POST /stock-ticker-000001/_doc/?pipeline=stock-ticker-pipeline
{
  "message": """2021/06/27 15:17:49 :     INFO | Stock = 0700.HK
2021/06/27 15:17:49 :     INFO | Date = 2021-06-18
2021/06/27 15:17:49 :     INFO | Volume=12301606
2021/06/27 15:17:49 :     INFO | Low=599.5
2021/06/27 15:17:49 :     INFO | High=608.0
2021/06/27 15:17:49 :     INFO | Open=602.0
2021/06/27 15:17:49 :     INFO | Close=603.0"""
}

Here is the result. It shows the source (what you put in) and the fields (what is actually used in visualizations, aggregations, etc.):

GET stock-ticker-000001/_search
{
  "fields": [
    "*"
  ]
}


{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "stock-ticker-000001",
        "_type" : "_doc",
        "_id" : "RZRO43oBx8ERHzwghUHq",
        "_score" : 1.0,
        "_source" : {
          "date" : "2021-06-18",
          "High" : 608.0,
          "message" : """2021/06/27 15:17:49 :     INFO | Stock = 0700.HK
2021/06/27 15:17:49 :     INFO | Date = 2021-06-18
2021/06/27 15:17:49 :     INFO | Volume=12301606
2021/06/27 15:17:49 :     INFO | Low=599.5
2021/06/27 15:17:49 :     INFO | High=608.0
2021/06/27 15:17:49 :     INFO | Open=602.0
2021/06/27 15:17:49 :     INFO | Close=603.0""",
          "irrelevant_data" : "2021/06/27 15:17:49 :     INFO | ",
          "Open" : 602.0,
          "busDate" : "2021-06-18T00:00:00.000Z",
          "Volume" : 12301606,
          "Low" : 599.5,
          "Close" : 603.0,
          "Stock" : "0700.HK"
        },
        "fields" : {
          "High" : [
            608.0
          ],
          "date" : [
            "2021-06-18T00:00:00.000Z"
          ],
          "busDate" : [
            "2021-06-18T00:00:00.000Z"
          ],
          "irrelevant_data.keyword" : [
            "2021/06/27 15:17:49 :     INFO | "
          ],
          "Low" : [
            599.5
          ],
          "Volume" : [
            12301606
          ],
          "Close" : [
            603.0
          ],
          "message" : [
            """2021/06/27 15:17:49 :     INFO | Stock = 0700.HK
2021/06/27 15:17:49 :     INFO | Date = 2021-06-18
2021/06/27 15:17:49 :     INFO | Volume=12301606
2021/06/27 15:17:49 :     INFO | Low=599.5
2021/06/27 15:17:49 :     INFO | High=608.0
2021/06/27 15:17:49 :     INFO | Open=602.0
2021/06/27 15:17:49 :     INFO | Close=603.0"""
          ],
          "Stock" : [
            "0700.HK"
          ],
          "irrelevant_data" : [
            "2021/06/27 15:17:49 :     INFO | "
          ],
          "Open" : [
            602.0
          ]
        }
      }
    ]
  }
}

If you look at the mapping, everything looks good:

GET stock-ticker-000001/

{
  "stock-ticker-000001" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "Close" : {
          "type" : "double"
        },
        "High" : {
          "type" : "double"
        },
        "Low" : {
          "type" : "double"
        },
        "Open" : {
          "type" : "double"
        },
        "Stock" : {
          "type" : "keyword",
          "ignore_above" : 256
        },
        "Volume" : {
          "type" : "long"
        },
        "busDate" : {
          "type" : "date"
        },
        "date" : {
          "type" : "date"
        },
        "irrelevant_data" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "message" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "stock-ticker-000001",
        "creation_date" : "1627311211856",
        "number_of_replicas" : "1",
        "uuid" : "oceKzXnrSMWF2KdY0bMXug",
        "version" : {
          "created" : "7130499"
        }
      }
    }
  }
}

Now, in filebeat.yml, you can configure the output.
There are many ways to manage multiple indices, rollover, etc.
This is just an example using daily indices; you may need something else in the long run, but this may get you started:

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: stock-ticker-pipeline
  index: "stock-ticker-%{+yyyy.MM.dd}"
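
One caveat (an assumption based on general Filebeat 7.x behaviour, not something from this thread): when you override the index name in the elasticsearch output, Filebeat expects you to also set a template name and pattern, and ILM will otherwise override your custom index name, so you will typically want something like:

```yaml
# Assumed additions for Filebeat 7.x when using a custom index name
setup.ilm.enabled: false
setup.template.name: "stock-ticker"
setup.template.pattern: "stock-ticker-*"
```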