What's the real difference between a filter and a codec, and which one should I use?


I don't fully understand the difference between a filter and a codec, and I'm not sure which one I need to solve my problem.

My Logstash instance is listening on a JMS queue that receives XML messages. The XML looks like the excerpt below:

<?xml version="1.0" encoding="UTF-8"?>
<mon:monitorEvent mon:id="B291b836aa99a51792162189"
        <mon:kind mon:version="2010-11-11">wle:PROCESS_AT_RISK_DATE_ASSIGNED</mon:kind>
        <mon:time mon:of="occurrence">2017-03-04T11:12:45.064-03:00</mon:time>
        <mon:model mon:type="bpmn:process" mon:id="eb87b79d-798f-4049-9a1a-1e42780ee748" mon:version="2064.e9107612-144b-4faf-807a-8d991ed166e8">
            <mon:name>Processo de Teste do DEF</mon:name>
            <mon:instance mon:id="503">
        <mon:model mon:type="wle:processApplication" mon:id="546450e0-22d0-458c-84dd-d6a6129b7654" mon:version="2064.e9107612-144b-4faf-807a-8d991ed166e8">
            <mon:name>Teste DEF</mon:name>
            <mon:ancestor mon:id="eb87b79d-798f-4049-9a1a-1e42780ee748.2064.e9107612-144b-4faf-807a-8d991ed166e8.503"></mon:ancestor>
            <ibm:system ibm:systemID="ccf3ad10-3d2d-4333-831b-f9c47e32209e"/>

I would like to parse this XML, extracting the content of some tags to build a JSON document and index it into Elasticsearch.
Should I use a codec or a filter for this?
Is there a codec/filter/something_else that does what I need, or should I write my own XML parser for my XML format?

I really appreciate any help.

Should I use a codec or filter for this?

While it would make some sense to have a codec for processing XML, there is no such thing. There is, however, an xml filter.

Is there a codec/filter/something_else that does what I need, or should I write my own XML parser for my XML format?

Use the xml filter. Its xpath option should be useful to you.
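
To make the suggestion concrete, here is a minimal sketch of an xml filter that pulls two values out of the message with xpath. It assumes the JMS message body ends up in the `message` field. The destination field names `event_kind` and `event_time` are hypothetical. Because the namespace URI behind the `mon:` prefix is not visible in the excerpt, the expressions match on `local-name()` instead of relying on a namespace mapping.

```
filter {
  xml {
    # Assumption: the raw XML text is in the "message" field.
    source    => "message"
    # Keep only the values selected below instead of storing the whole parsed document.
    store_xml => false
    xpath => {
      # Match elements by local name so the mon: namespace URI is not needed.
      # "event_kind" and "event_time" are hypothetical destination fields.
      "//*[local-name()='kind']/text()" => "event_kind"
      "//*[local-name()='time']/text()" => "event_time"
    }
  }
}
```

If the namespace URIs are known, the filter's `namespaces` option could be used instead so that the expressions can refer to `mon:kind` directly.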

Thanks for the reply. With your suggestions I think I can solve the problem, but I would still like to understand when I should use a codec and when I should use a filter. Can you explain this?


In most cases, what you want to do is possible only with a codec or only with a filter, not with both, so you're rarely confronted with that choice. That's the most general answer there is. The only exception I can think of is JSON, which can be deserialized via either a codec or a filter.

As I answered in another thread yesterday, probably everything that can be done with a json codec can be done with a json filter, but the opposite isn't true.
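
As an illustration of the JSON case, the sketch below shows the two alternatives side by side. The stdin input, the regular expression, and the `payload` target field are assumptions chosen only for the example. The two options are meant to be used one at a time, not together in the same pipeline.

```
# Option 1: decode JSON at the input with a codec.
# Every event from this input is decoded the same way.
input {
  stdin { codec => json }
}

# Option 2: read the raw line and decode it later with a filter.
# A filter can be applied conditionally and can read from any field.
input {
  stdin { }
}
filter {
  # Only try to decode lines that look like a JSON object.
  if [message] =~ /^\s*\{/ {
    json {
      source => "message"
      # "payload" is a hypothetical field name for the decoded document.
      target => "payload"
    }
  }
}
```

The conditional and the choice of source field in option 2 are examples of what the filter allows and the codec does not.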

Just to finish...
From the docs:

A codec plugin changes the data representation of an event. Codecs are essentially stream filters that can operate as part of an input or output.

A filter plugin performs intermediary processing on an event. Filters are often applied conditionally depending on the characteristics of the event.

So, to solve my problem, a better solution would be:

  1. Logstash listens on my JMS queue and a message arrives.
  2. As the message is in XML format, a codec could convert it to JSON format. (I know that this codec doesn't exist yet.)
  3. A filter, based on some tags of the JSON message, could do computations, enrich the data, and delete or add parts of the JSON.
  4. As output, I could save this JSON in my Elasticsearch, so I can use Kibana to create my dashboards and reports.
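
The flow above could be sketched as a single pipeline, with the xml filter standing in for the XML codec that doesn't exist. This is only an outline under assumptions: the queue name, the YAML file path and section, the `event` target field, the Elasticsearch host, and the index name are all hypothetical and need to be replaced with real values.

```
input {
  jms {
    # Hypothetical connection details; adjust to your broker setup.
    yaml_file    => "/etc/logstash/jms.yml"
    yaml_section => "mybroker"
    destination  => "monitor.events"
  }
}

filter {
  # Step 2: parse the XML body into a structured field.
  xml {
    source => "message"
    target => "event"
  }
  # Step 3: enrichment and cleanup; here the raw XML is dropped once it has been parsed.
  mutate {
    remove_field => ["message"]
  }
}

output {
  # Step 4: index the resulting document so Kibana can query it.
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "monitor-events-%{+YYYY.MM.dd}"
  }
}
```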

Is this flow right?

Thanks again for spending your time with a newbie :slight_smile:

Yes, that makes sense.
