Create a new Filebeat module

Hi Elastic community

I'm trying to build a custom Filebeat module, but the documentation is a little unspecific in places:
https://www.elastic.co/guide/en/beats/devguide/current/filebeat-modules-devguide.html

I've come as far as creating the module, the fileset and writing out a log/ingest/pipeline.json

However, when I try to run make create-fields I keep hitting stumbling blocks, e.g.:

mage generate:fields
Error: cannot read pipeline: invalid character ']' looking for beginning of value
make: *** [create-fields] Error 1

Any further guidance would be greatly appreciated.

Hey @saffatechy,

This error seems to indicate a syntax error in one of your files. Can you check your JSON and YAML files with a validator?

Hi Jaime

I've tried to ensure the syntax is correct, but it still gives me this error:

mage generate:fields
Error: cannot read pipeline: invalid character ']' looking for beginning of value
make: *** [create-fields] Error 1

See below:


module/testmodule/log/ingest/pipeline.json content: 

{
  "description": "Pipeline for parsing testmodule log logs",
  "processors": [ {
        "grok": {
          "field": "message",
          "patterns":
["%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} (?<thread>(%{IP}:%{INT})?(-)?(Thread-%{INT})?(\\(%{INT}\\)-)?%{WORD}(:%{WORD}:%{INT}:%{INT})?(-%{INT})?) \\[%{WORD:category}\\] (?<bizlog>(.|\r|\n)*)"],
          "ignore_missing":  true
          }
        },
        {
          "remove": {
            "field": "message"
          }
        },
    ],
  "on_failure" : [{
    "set" : {
      "field" : "error.message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}

There is an extra trailing comma after the remove processor; to be valid JSON, it should be:

        },
        {
          "remove": {
            "field": "message"
          }
        }
    ],

Note that pipelines can also be defined in YAML format, which can be easier to maintain.
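
For reference, the same pipeline expressed as ingest/pipeline.yml could look like the sketch below. The grok pattern is abbreviated here for readability; also note that YAML single-quoted strings need only one backslash where JSON needs two (e.g. \[ instead of \\[):

```yaml
description: Pipeline for parsing testmodule log logs
processors:
  - grok:
      field: message
      patterns:
        # abbreviated; substitute your full pattern here
        - '%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} \[%{WORD:category}\] %{GREEDYDATA:bizlog}'
      ignore_missing: true
  - remove:
      field: message
on_failure:
  - set:
      field: error.message
      value: '{{ _ingest.on_failure_message }}'
```

If you switch formats, remember to point the fileset's manifest.yml (the ingest_pipeline setting) at ingest/pipeline.yml instead of ingest/pipeline.json.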

Hi Jaime

You were right.

Now I'm hitting a new stumbling block when running:

GENERATE=1 INTEGRATION_TESTS=1 BEAT_STRICT_PERMS=false TESTING_FILEBEAT_MODULES=testmodule ./pytest tests/system/test_modules.py


[saffatechy@bizuat filebeat]$ GENERATE=1 INTEGRATION_TESTS=1 BEAT_STRICT_PERMS=false TESTING_FILEBEAT_MODULES=testmodule ./pytest tests/system/test_modules.py
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/saffatechy/projects/go/src/github.com/elastic/beats, configfile: pytest.ini
plugins: timeout-1.3.4
collected 1 item                                                                                                                                                                                           

tests/system/test_modules.py F                                                                                                                                                                       [100%]

================================================================================================= FAILURES =================================================================================================
____________________________________________________________________________________ Test.test_fileset_file_0_testmodule ____________________________________________________________________________________

a = (<test_modules.Test testMethod=test_fileset_file_0_testmodule>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

build/python-env/lib/python3.7/site-packages/parameterized/parameterized.py:518: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/system/test_modules.py:99: in test_fileset_file
    cfgfile=cfgfile)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test_modules.Test testMethod=test_fileset_file_0_testmodule>, module = 'testmodule', fileset = 'log'
test_file = '/home/saffatechy/projects/go/src/github.com/elastic/beats/filebeat/tests/system/../../module/testmodule/log/test/testmodule.log'
cfgfile = '/home/saffatechy/projects/go/src/github.com/elastic/beats/filebeat/build/system-tests/run/test_modules.Test.test_fileset_file_0_testmodule/filebeat.yml'

    def run_on_file(self, module, fileset, test_file, cfgfile):
        print("Testing {}/{} on {}".format(module, fileset, test_file))
    
        self.assert_explicit_ecs_version_set(module, fileset)
    
        try:
            self.es.indices.delete(index=self.index_name)
        except:
            pass
        self.wait_until(lambda: not self.es.indices.exists(self.index_name))
    
        cmd = [
            self.filebeat, "-systemTest",
            "-e", "-d", "*", "-once",
            "-c", cfgfile,
            "-E", "setup.ilm.enabled=false",
            "-modules={}".format(module),
            "-M", "{module}.*.enabled=false".format(module=module),
            "-M", "{module}.{fileset}.enabled=true".format(
                module=module, fileset=fileset),
            "-M", "{module}.{fileset}.var.input=file".format(
                module=module, fileset=fileset),
            "-M", "{module}.{fileset}.var.paths=[{test_file}]".format(
                module=module, fileset=fileset, test_file=test_file),
            "-M", "*.*.input.close_eof=true",
        ]
    
        # Based on the convention that if a name contains -json the json format is needed. Currently used for LS.
        if "-json" in test_file:
            cmd.append("-M")
            cmd.append("{module}.{fileset}.var.format=json".format(module=module, fileset=fileset))
    
        output_path = os.path.join(self.working_dir)
        output = open(os.path.join(output_path, "output.log"), "ab")
        output.write(bytes(" ".join(cmd) + "\n", "utf-8"))
    
        # Use a fixed timezone so results don't vary depending on the environment
        # Don't use UTC to avoid hiding that non-UTC timezones are not being converted as needed,
        # this can happen because UTC uses to be the default timezone in date parsers when no other
        # timezone is specified.
        local_env = os.environ.copy()
        local_env["TZ"] = 'Etc/GMT+2'
    
        subprocess.Popen(cmd,
                         env=local_env,
                         stdin=None,
                         stdout=output,
                         stderr=subprocess.STDOUT,
                         bufsize=0).wait()
    
        # Make sure index exists
        self.wait_until(lambda: self.es.indices.exists(self.index_name))
    
        self.es.indices.refresh(index=self.index_name)
        # Loads the first 100 events to be checked
        res = self.es.search(index=self.index_name,
                             body={"query": {"match_all": {}}, "size": 100, "sort": {"log.offset": {"order": "asc"}}})
        objects = [o["_source"] for o in res["hits"]["hits"]]
        assert len(objects) > 0
        for obj in objects:
            assert obj["event"]["module"] == module, "expected event.module={} but got {}".format(
                module, obj["event"]["module"])
    
            # All modules must include a set processor that adds the time that
            # the event was ingested to Elasticsearch
>           assert "ingested" in obj["event"], "missing event.ingested timestamp"
E           AssertionError: missing event.ingested timestamp
E           assert 'ingested' in {'dataset': 'testmodule.log', 'module': 'testmodule', 'timezone': '-02:00'}



tests/system/test_modules.py:166: AssertionError
------------------------------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------------------------------
Using elasticsearch: http://localhost:9200
Testing testmodule/log on /home/saffatechy/projects/go/src/github.com/elastic/beats/filebeat/tests/system/../../module/testmodule/log/test/testmodule.log
========================================================================================= short test summary info ==========================================================================================
FAILED tests/system/test_modules.py::Test::test_fileset_file_0_testmodule - AssertionError: missing event.ingested timestamp
====================================================================================== 1 failed in 179.40s (0:02:59) =======================================================================================

Hey @saffatechy,

Look at the comment near the failing assertion:

This requirement is recent: you need to add an event.ingested field to all ingested documents.

You can see how this is done for example in the apache ingest pipeline: https://github.com/elastic/beats/blob/0ea1def3c688da2cfbba8571617ed3460f6cba9c/filebeat/module/apache/access/ingest/pipeline.yml#L4

More info here: https://github.com/elastic/beats/pull/20386 (Add event.ingested to all Filebeat modules, by andrewkroh)
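
Concretely, for the JSON pipeline in this thread, the apache snippet linked above translates to a set processor added at the start of the processors array:

```json
{
  "set": {
    "field": "event.ingested",
    "value": "{{_ingest.timestamp}}"
  }
}
```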

Hi Jaime

That worked.

[saffatechy@bizuat filebeat]$ GENERATE=1 INTEGRATION_TESTS=1 BEAT_STRICT_PERMS=false TESTING_FILEBEAT_MODULES=testmodule ./pytest tests/system/test_modules.py
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/saffatechy/projects/go/src/github.com/elastic/beats, configfile: pytest.ini
plugins: timeout-1.3.4
collected 1 item                                                                                                                                                                                           

tests/system/test_modules.py F                                                                                                                                                                       [100%]

================================================================================================= FAILURES =================================================================================================
____________________________________________________________________________________ Test.test_fileset_file_0_testmodule ____________________________________________________________________________________

a = (<test_modules.Test testMethod=test_fileset_file_0_testmodule>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

build/python-env/lib/python3.7/site-packages/parameterized/parameterized.py:518: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/system/test_modules.py:99: in test_fileset_file
    cfgfile=cfgfile)
tests/system/test_modules.py:176: in run_on_file
    self.assert_fields_are_documented(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test_modules.Test testMethod=test_fileset_file_0_testmodule>
evt = {'@timestamp': '2020-09-14T13:27:14.443Z', 'agent': {'ephemeral_id': 'ed37b947-6c57-4735-bc30-c091d6a0413a', 'id': 'f5...rs.bizxml_Indeejsontranslator.bizxml_IndeeJsonTranslator$BalanceCheck", 'category': 'PropertySettingJobFactory', ...}

    def assert_fields_are_documented(self, evt):
        """
        Assert that all keys present in evt are documented in fields.yml.
        This reads from the global fields.yml, means `make collect` has to be run before the check.
        """
        expected_fields, dict_fields, aliases = self.load_fields()
        flat = self.flatten_object(evt, dict_fields)
    
        def field_pattern_match(pattern, key):
            pattern_fields = pattern.split(".")
            key_fields = key.split(".")
            if len(pattern_fields) != len(key_fields):
                return False
            for i in range(len(pattern_fields)):
                if pattern_fields[i] == "*":
                    continue
                if pattern_fields[i] != key_fields[i]:
                    return False
            return True
    
        def is_documented(key, docs):
            if key in docs:
                return True
            for pattern in (f for f in docs if "*" in f):
                if field_pattern_match(pattern, key):
                    return True
            return False
    
        for key in flat.keys():
            metaKey = key.startswith('@metadata.')
            # Range keys as used in 'date_range' etc will not have docs of course
            isRangeKey = key.split('.')[-1] in ['gte', 'gt', 'lte', 'lt']
            if not(is_documented(key, expected_fields) or metaKey or isRangeKey):
>               raise Exception("Key '{}' found in event is not documented!".format(key))
E               Exception: Key 'bizlog' found in event is not documented!

../libbeat/tests/system/beat/beat.py:729: Exception
------------------------------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------------------------------
Using elasticsearch: http://localhost:9200
Testing testmodule/log on /home/saffatechy/projects/go/src/github.com/elastic/beats/filebeat/tests/system/../../module/testmodule/log/test/testmodule.log
========================================================================================= short test summary info ==========================================================================================
FAILED tests/system/test_modules.py::Test::test_fileset_file_0_testmodule - Exception: Key 'bizlog' found in event is not documented!
====================================================================================== 1 failed in 180.30s (0:03:00) ============================================================

Then, in addition:

Can you recommend a pipeline processor (or other configuration) for multi-line log entries like this one:

2020-08-20 07:21:32,156 DEBUG 8.8.8.8:19539-Thread-215(215)-BIZXML:CLIENT:00001:2840-023307225383 [ipay_log] SQL Statement took 3ms to execute. SQL:
INSERT INTO elec_trans (vend_ref_received, vend_req_received, client, terminal, amt_requested, cur, num_tokens, meter, pay_type, pos_ref, use_advice, vend_ref_sent, vend_req_sent, service, is_test) 
VALUES('023307225383', '2020-08-20 07:22:22.0', 'CLIENT', '00001', 1500.0, 'MWK', 1, '47000259276', 'cash', 'null', 0, '202330721321505866', '2020-08-20 07:21:32.153', 'Lwblog', 0)
2020-08-20 07:21:32,156 DEBUG 8.8.8.8:19539-Thread-215(215)-BIZXML:CLIENT:00001:2840-023307225383 [ipay_log] Con

Hey @saffatechy,

Good to see that you could make some progress 🙂

For multiline, you can add some configuration in the fileset itself. Take a look, for example, at the kafka log fileset: https://github.com/elastic/beats/blob/c9f7a99e1736853955c07230c78649867d4e8c40/filebeat/module/kafka/log/config/log.yml#L7
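
Applied to the sample above, where every new event starts with a timestamp, the fileset's config/log.yml could declare multiline settings along these lines (the pattern here is my assumption based on your sample, not something taken from your module):

```yaml
type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
multiline:
  # lines that do not start with a timestamp are appended to the previous event
  pattern: '^\d{4}-\d{2}-\d{2} '
  negate: true
  match: after
```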

Hi Jaime

It's still failing though, not quite sure why it's complaining that bizlog is not documented.


I'll try to figure out the multiline stuff.

Thanks for the pointers.

Oh sorry, I didn't see this error...

This usually means that the module is creating events with fields that are not documented. When you add a new field, you need to add it to a fields.yml file. Then, before running the tests again, remember to run mage fields.

Hi Jaime

The odd thing is, it is documented (with the placeholder defaults, I know), see below:

- name: timestamp
  description: Please add description
  example: Please add example
- name: loglevel
  description: Please add description
  example: Please add example
  type: keyword
- name: thread
  description: Please add description
  example: Please add example
  type: keyword
- name: category
  description: Please add description
  example: Please add example
  type: keyword
- name: bizlog
  description: Please add description
  example: Please add example
  type: text

Each time I make changes to my pipeline I run:

make create-fields MODULE=testmodule FILESET=log

Followed by:

make update

resulting in:

drwxr-x---  2 saffatechy saffatechy      4096 Sep 16 09:24 data
-rw-rw-r--  1 saffatechy saffatechy       319 Sep 16 09:17 filebeat.docker.yml
-rw-rw-r--  1 saffatechy saffatechy     79559 Sep 16 09:17 filebeat.reference.yml
-rw-rw-r--  1 saffatechy saffatechy      8912 Sep 16 09:17 filebeat.yml
drwxr-xr-x  2 saffatechy saffatechy      4096 Sep 16 09:17 modules.d
drwxr-xr-x  7 saffatechy saffatechy      4096 Sep 16 09:17 build
-rw-r--r--  1 saffatechy saffatechy    262839 Sep 16 09:17 fields.yml

Any other idea?

Do you have a public branch with this code where I can take a look?

I don't currently. How can I create one and make it available to you?

Do you have a fork of beats on https://github.com?

Hi Jaime

See below:

Module called bizswitch (sorry for the previous generic alias)

Hey @saffatechy,

I have taken a look at your module, and it needs some changes:

  • thread and category need to be prefixed with bizswitch.
  • timestamp should be parsed and stored in @timestamp.
  • Log message should be stored in message, or at least in a known field.
  • loglevel should be stored in log.level (or in some known field).

Here is a patch with some changes that work for me:

diff --git a/filebeat/module/bizswitch/log/ingest/pipeline.json b/filebeat/module/bizswitch/log/ingest/pipeline.json
index 43bf104344..d9b674433b 100644
--- a/filebeat/module/bizswitch/log/ingest/pipeline.json
+++ b/filebeat/module/bizswitch/log/ingest/pipeline.json
@@ -8,13 +8,21 @@
         "grok": {
           "field": "message",
           "patterns":
-["%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} (%{SPACE})?(?<thread>(%{IP}[:]%{INT})?(-)?(Thread-%{INT})?(\\(%{INT}\\)-)?%{WORD}(:%{WORD})?([\\-])?(%{SPACE})?(([a-zA-Z]+)?(%{SPACE})?(%{WORD})?([:])?)(%{INT})?([:])?(%{INT})?([\\-])?(%{INT})?) \\[%{WORD:category}\\] %{GREEDYDATA:bizlog}"],
+["%{TIMESTAMP_ISO8601:_tmp.timestamp} %{LOGLEVEL:log.level} (%{SPACE})?(?<bizswitch.thread>(%{IP}[:]%{INT})?(-)?(Thread-%{INT})?(\\(%{INT}\\)-)?%{WORD}(:%{WORD})?([\\-])?(%{SPACE})?(([a-zA-Z]+)?(%{SPACE})?(%{WORD})?([:])?)(%{INT})?([:])?(%{INT})?([\\-])?(%{INT})?) \\[%{WORD:bizswitch.category}\\] %{GREEDYDATA:message}"],
           "ignore_missing":  true
           }
         },
+       {
+         "date": {
+            "field": "_tmp.timestamp",
+            "target_field": "@timestamp",
+            "formats": ["yyyy-MM-dd HH:mm:ss,SSS"],
+            "timezone": "{{ event.timezone }}"
+          }
+       },
         {
           "remove": {
-            "field": "message"
+            "field": "_tmp"
           }
         }
     ],

Hi @jsoriano

Why the need for the bizswitch prefixes?
If we are storing the full log message, does that not mean that we are duplicating data?
I see that you have changed the field name bizlog to message?
When you say "known field", do you mean that the module creator cannot create arbitrary fields?

In general it is highly recommended to define index mappings for your fields. These mappings explicitly set a type for each field, which helps with indexing and querying the data. If no mapping is created, Elasticsearch chooses one automatically, which sometimes works well, but sometimes not so well. For example, if you store a numeric zero 0 in a field without a mapping, Elasticsearch may decide to store it as an integer, but if you later want to store floats or strings in that field you will run into problems, as they cannot be mapped to integers.
So in general it is better to define a mapping beforehand. This is what you do with the fields.yml files.
Filebeat module tests require all fields to be documented in order to avoid mapping problems in the fields of provided modules.

These fields are specific to bizswitch, so they should use names that don't collide with fields that other modules could use.
If you are very sure that your fields, stored at the root level without any prefix, are not going to collide with anything, you can continue using them without a prefix, but you should still define a mapping.

Yes, message is the recommended field in which to store log messages. The message field already has a mapping included in Filebeat, it is documented in ECS, and it is supported by default by the Kibana Logs app.
You can still use any other field, but again, it is recommended to add a mapping for it.

By "known field" I mean a field that is documented in a fields.yml file.
Arbitrary fields can be used, but it is recommended to add them to a fields.yml file so they are better supported (they are documented, they have a mapping, and they are checked in tests).
You may need to manually edit the fields.yml files to fit your needs.
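
To make this concrete, a fileset-level _meta/fields.yml for the prefixed fields from the patch above could be sketched like this (the descriptions are placeholders; log.level and message need no entry here because they are already covered by ECS and Filebeat's built-in fields):

```yaml
- name: bizswitch
  type: group
  description: Fields from bizswitch logs.
  fields:
    - name: thread
      type: keyword
      description: Thread identifier parsed from the log header.
    - name: category
      type: keyword
      description: Log category, for example ipay_log.
```

After editing it, run mage fields (or make update) so the generated top-level fields.yml picks up the changes before running the tests again.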