What's the real difference between a filter and a codec, and which one should I use?

Hello,

I don't really understand the difference between a filter and a codec, and I'm unsure which one I should use to solve my problem.

My Logstash instance listens to a JMS queue that receives XML messages. A typical message looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<mon:monitorEvent mon:id="B291b836aa99a51792162189"
    xmlns:bpmn="http://schema.omg.org/spec/BPMN/2.0"
    xmlns:bpmnx="http://www.ibm.com/xmlns/bpmnx/20100524/BusinessMonitoring"
    xmlns:mon="http://www.ibm.com/xmlns/prod/websphere/monitoring/7.5"
    xmlns:ibm="http://www.ibm.com/xmlns/prod/websphere/monitoring/7.5/extensions"
    xmlns:wle="http://www.ibm.com/xmlns/prod/websphere/lombardi/7.5"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <mon:eventPointData>
        <mon:kind mon:version="2010-11-11">wle:PROCESS_AT_RISK_DATE_ASSIGNED</mon:kind>
        <mon:time mon:of="occurrence">2017-03-04T11:12:45.064-03:00</mon:time>
        <ibm:sequenceId>0000000005</ibm:sequenceId>
        <mon:model mon:type="bpmn:process" mon:id="eb87b79d-798f-4049-9a1a-1e42780ee748" mon:version="2064.e9107612-144b-4faf-807a-8d991ed166e8">
            <mon:name>Processo de Teste do DEF</mon:name>
            <mon:documentation></mon:documentation>
            <mon:instance mon:id="503">
                <mon:state>Active</mon:state>
            </mon:instance>
        </mon:model>
        <mon:model mon:type="wle:processApplication" mon:id="546450e0-22d0-458c-84dd-d6a6129b7654" mon:version="2064.e9107612-144b-4faf-807a-8d991ed166e8">
            <mon:name>Teste DEF</mon:name>
            <mon:documentation></mon:documentation>
        </mon:model>
        <mon:correlation>
            <mon:ancestor mon:id="eb87b79d-798f-4049-9a1a-1e42780ee748.2064.e9107612-144b-4faf-807a-8d991ed166e8.503"></mon:ancestor>
            <wle:starting-process-instance>eb87b79d-798f-4049-9a1a-1e42780ee748.2064.e9107612-144b-4faf-807a-8d991ed166e8.503</wle:starting-process-instance>
        </mon:correlation>
        <mon:source>
            <ibm:system ibm:systemID="ccf3ad10-3d2d-4333-831b-f9c47e32209e"/>
        </mon:source>
    </mon:eventPointData>
</mon:monitorEvent>

I would like to parse this XML, extract the content of some tags to build a JSON document, and insert it into Elasticsearch.
Should I use a codec or filter for this?
Is there a codec/filter/something_else that does what I need, or should I write my own XML parser for my XML format?

I'd really appreciate any help.
Thanks

Should I use a codec or filter for this?

While it would make some sense to have a codec for processing XML, there is no such thing. There is, however, an xml filter.

Is there a codec/filter/something_else that does what I need, or should I write my own XML parser for my XML format?

Use the xml filter. Its xpath option should be useful to you.
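To get a feel for what XPath-style extraction returns for the message above, here is a small Python sketch using the standard library's ElementTree rather than Logstash itself. The chosen paths and output field names (`kind`, `sequence_id`, `process_name`, and so on) are hypothetical; ElementTree supports only a subset of XPath, and in Logstash you would express similar paths in the xml filter's xpath option instead. The point it illustrates is that every path must be namespace-qualified, because all the tags live in the `mon`/`ibm` namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes exactly as declared in the sample message.
NS = {
    "mon": "http://www.ibm.com/xmlns/prod/websphere/monitoring/7.5",
    "ibm": "http://www.ibm.com/xmlns/prod/websphere/monitoring/7.5/extensions",
}

# Hypothetical mapping: path (relative to the root element) -> output field name.
PATHS = {
    "mon:eventPointData/mon:kind": "kind",
    "mon:eventPointData/mon:time": "time",
    "mon:eventPointData/ibm:sequenceId": "sequence_id",
    "mon:eventPointData/mon:model/mon:name": "process_name",
    "mon:eventPointData/mon:model/mon:instance/mon:state": "state",
}


def extract_fields(xml_text: str) -> dict:
    """Pull the text of selected tags into a flat dict (first match per path)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    fields = {}
    for path, name in PATHS.items():
        node = root.find(path, NS)
        if node is not None and node.text is not None:
            fields[name] = node.text.strip()
    # Attributes are keyed as "{namespace-uri}localname" in ElementTree.
    instance = root.find("mon:eventPointData/mon:model/mon:instance", NS)
    if instance is not None:
        fields["instance_id"] = instance.get(f"{{{NS['mon']}}}id")
    return fields
```

Because `find` returns the first match, `process_name` comes from the first `mon:model` element (the `bpmn:process` one); if you need the second model's name, the path has to be more specific.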

Thanks for the reply. With your suggestions I think I can solve the problem, but I would still like to understand when I should use a codec and when I should use a filter. Can you explain?

Thanks

In most cases, what you want to do is possible only with a codec or only with a filter, so you're rarely confronted with that choice. That's the most general answer there is. The only exception I can think of is JSON, which can be deserialized with either a codec or a filter.

As I answered in another thread yesterday, probably everything that can be done with a json codec can be done with a json filter, but the opposite isn't true.
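A rough Python analogy shows why the relationship is one-way. This is not Logstash code; the helpers `json_codec_decode` and `json_filter` are hypothetical, with `source`/`target` parameters loosely modelled on the json filter's options. A codec only ever sees the raw payload as it enters the pipeline, while a filter can parse a JSON string held in any field of an event that already exists.

```python
import json
from typing import Any, Dict, Optional

Event = Dict[str, Any]


def json_codec_decode(raw: bytes) -> Event:
    """Codec-style: the whole raw payload read by the input becomes the event."""
    return json.loads(raw.decode("utf-8"))


def json_filter(event: Event, source: str, target: Optional[str] = None) -> Event:
    """Filter-style: parse the JSON string stored in any field of an existing event."""
    parsed = json.loads(event[source])
    result = dict(event)
    if target is None:
        result.update(parsed)     # merge parsed keys into the top level of the event
    else:
        result[target] = parsed   # keep parsed data under its own field
    return result
```

Pointing the filter at the field holding the raw payload reproduces what the codec does, whereas a JSON string nested in some other field (say, `payload`) is reachable only by the filter.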

Just to finish...
From the docs:

Codec
A codec plugin changes the data representation of an event. Codecs are essentially stream filters that can operate as part of an input or output.

Filter
A filter plugin performs intermediary processing on an event. Filters are often applied conditionally depending on the characteristics of the event.
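The two definitions can be pictured as a toy pipeline: codecs sit at the edges (inside the input and output, turning bytes into events and back), and filters run in between, optionally guarded by a condition. This is a simplified Python model under that assumption, not how Logstash is implemented; `run_pipeline` and `tag_errors` are hypothetical names.

```python
import json
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
# A filter is modelled as (condition, transform); transform runs only if condition(event) is true.
Filter = Tuple[Callable[[Event], bool], Callable[[Event], Event]]


def run_pipeline(raw_messages: List[bytes],
                 decode: Callable[[bytes], Event],
                 filters: List[Filter],
                 encode: Callable[[Event], bytes]) -> List[bytes]:
    results = []
    for raw in raw_messages:
        event = decode(raw)                  # input codec: raw bytes -> event
        for condition, transform in filters:
            if condition(event):             # filters are often applied conditionally
                event = transform(event)     # intermediary processing on the event
        results.append(encode(event))        # output codec: event -> raw bytes
    return results


def tag_errors(event: Event) -> Event:
    """Hypothetical filter: add a tag to the event."""
    return {**event, "tags": ["alert"]}


if __name__ == "__main__":
    out = run_pipeline(
        [b'{"level": "error"}', b'{"level": "info"}'],
        decode=json.loads,
        filters=[(lambda e: e.get("level") == "error", tag_errors)],
        encode=lambda e: json.dumps(e).encode("utf-8"),
    )
    for line in out:
        print(line.decode("utf-8"))
    # {"level": "error", "tags": ["alert"]}
    # {"level": "info"}
```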

So, to solve my problem, a better solution would be:

  1. Logstash listens to my JMS queue and a message arrives.
  2. As the message is in XML format, a codec could convert it to JSON (I know that this codec doesn't exist yet).
  3. A filter, based on some tags of the JSON message, could do computations and data enrichment, and add or delete parts of the JSON.
  4. As output, I could save this JSON to Elasticsearch so I can use Kibana to create my dashboards and reports.
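The four steps can be sketched in plain Python to make the division of labour concrete. This is not Logstash code: `decode_xml` plays the role of the XML codec described in step 2 (which, as noted earlier in the thread, does not exist), `enrich` stands in for a step 3 filter, and `to_document` produces the JSON body that an Elasticsearch output would send. All function and field names are hypothetical, and the generic conversion drops namespace prefixes, which is an assumption about what you would want in the indexed document.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def local(tag: str) -> str:
    """Drop the namespace URI: '{uri}name' -> 'name'."""
    return tag.rsplit("}", 1)[-1]


def xml_to_dict(elem: ET.Element) -> Any:
    """Generic XML -> dict conversion (what a hypothetical XML codec might do)."""
    node: Dict[str, Any] = {local(k): v for k, v in elem.attrib.items()}
    for child in elem:
        key, value = local(child.tag), xml_to_dict(child)
        if key in node:                      # repeated element -> list
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    text = (elem.text or "").strip()
    if text and not node:
        return text                          # leaf element: just its text
    if text:
        node["text"] = text                  # element with attributes and text
    return node


def decode_xml(raw: bytes) -> Dict[str, Any]:
    """Step 2: raw JMS payload -> event."""
    root = ET.fromstring(raw)
    return {local(root.tag): xml_to_dict(root)}


def enrich(event: Dict[str, Any]) -> Dict[str, Any]:
    """Step 3: keep only the fields of interest (hypothetical field names)."""
    data = event["monitorEvent"]["eventPointData"]
    models = data["model"] if isinstance(data["model"], list) else [data["model"]]
    process = next(m for m in models if m.get("type") == "bpmn:process")
    return {
        "kind": data["kind"]["text"],
        "time": data["time"]["text"],
        "process_name": process["name"],
        "instance_id": process["instance"]["id"],
        "state": process["instance"]["state"],
    }


def to_document(event: Dict[str, Any]) -> str:
    """Step 4: JSON body that could be indexed into Elasticsearch."""
    return json.dumps(event, ensure_ascii=False)
```

In real Logstash the same split usually looks slightly different: since there is no XML codec, the raw XML arrives as a string field and the xml filter does the parsing, after which further filters enrich the event.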

Is this flow right?

Thanks again for spending your time with a newbie :slight_smile:

Yes, that makes sense.
