Can not decode Citrix Netscaler IPFIX messages

tkuronen · August 12, 2020, 4:53am

Has anyone here successfully decoded IPFIX data sent by a Citrix Netscaler?

Version: Logstash (7.7.1) / logstash-codec-netflow (4.2.1)
Operating System: RHEL 7.8

Logstash Input config:

input {
    udp {
        id => "logstash_netscaler_input"
        port => 12208
        codec => netflow {
            versions => [10]
            target => ipfix
            cache_save_path => "/tmp"
            ipfix_definitions => "/etc/logstash/conf.d/ipfix.yaml"
            include_flowset_id => true
        }
        type => ipfix
    }
}

File ipfix.yaml is exactly this one:

https://github.com/logstash-plugins/logstash-codec-netflow/blob/master/lib/logstash/codecs/netflow/ipfix.yaml

---
0:
  0:
  - :skip
  1:
  - :uint64
  - :octetDeltaCount
  2:
  - :uint64
  - :packetDeltaCount
  3:
  - :uint64
  - :deltaFlowCount
  4:
  - :uint8
  - :protocolIdentifier
  5:
  - :uint8
  - :ipClassOfService
  6:

(This file has been truncated.)

Logstash receives traffic from the Netscaler, but it seems that the IPFIX templates cannot be read. We get the following warning in logstash-plain.log for flowset IDs 256 through 332:
Can't (yet) decode flowset id 256 from observation domain id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.

... and this warning about an enterprise field:
Unsupported enterprise field {:type=>638, :enterprise=>5951, :length=>2}

However, a tcpdump capture taken on the Logstash server confirms that the templates are received.

On startup, Logstash warns:
Template Cache does not exist {:file_path=>"/tmp/ipfix_templates.cache"}
However, the cache file is never created, even after the pipeline has been running for hours and although the Netscaler sends templates every 60 seconds.

cknz · August 12, 2020, 11:28am

I did something similar in the past for the AppFlow data that comes out of the Citrix Netscaler platform, before there was a plugin for IPFIX. Those flowset IDs look very familiar.

The conclusion I came away with was that the Netscalers don't actually send all of the templates (that may not be true in later versions; I've found AppFlow logging to be unsatisfactory, so I haven't pursued it much since).

When I approached the problem, I basically captured and reverse-engineered some of the templates myself. I wrote a program that used a forked version of libipfix (see this HISTORICAL archive at https://github.com/cameronkerrnz/libipfix; it does have some parsing issues currently) and emitted JSON to a file, which was then tailed by Logstash.

Hope that helps.

PS. If /tmp/ipfix_templates.cache is not being written to, things to consider:

permissions, including SELinux
systemd's private temporary directory (PrivateTmp) may be enabled
what does strace say about it?

tkuronen · August 13, 2020, 7:57am

Thanks for the tips.

I also tried changing cache_save_path to /var/lib/logstash/cache/ and put an empty JSON object {} in the file ipfix_templates.cache.
If I understand correctly, Logstash reads the JSON on startup and checks that the file is writable:

8532  open("/var/lib/logstash/cache/ipfix_templates.cache", O_RDONLY|O_CLOEXEC) = 70
8532  fcntl(70, F_GETFD)                = 0x1 (flags FD_CLOEXEC)
8532  fcntl(70, F_SETFD, FD_CLOEXEC)    = 0
8532  fstat(70, {st_mode=S_IFREG|0644, st_size=3, ...}) = 0
8532  fcntl(70, F_GETFD)                = 0x1 (flags FD_CLOEXEC)
8532  ioctl(70, TCGETS, 0x7effeabf8640) = -1 ENOTTY (Inappropriate ioctl for device)
8532  fstat(70, {st_mode=S_IFREG|0644, st_size=3, ...}) = 0
8532  lseek(70, 0, SEEK_CUR)            = 0
8532  read(70, "{}\n", 3)               = 3
8532  read(70, "", 1024)                = 0
8532  close(70)                         = 0
8532  access("/var/lib/logstash/cache/ipfix_templates.cache", W_OK) = 0

After that, however, it never touches the cache file. A tcpdump capture inspected in Wireshark confirms that template packets are received.

Badger · August 13, 2020, 2:47pm

As far as I can see persist() is never called, so the cache will never be written to disk.

persist() was defined, presumably to replace the old call to save_templates_cache. But the old call was just removed and not replaced with a call to persist. Perhaps @yaauie will comment.

(That might explain how often folks complain about the 'no template to decode it with has been received' message.)

cknz · August 15, 2020, 8:23am

I think Badger is correct. But there is also the memory-based cache to consider, and if there is a warning that there is no-such template known, that would indicate that the template never got entered into the memory cache, so there would appear to be two problems.

Thinking the persistence code path through a bit:

the templates get written to disk by do_persist
do_persist is called from both persist and do_register
do_register is not called from LogStash::Codecs::Netflow#register (L61) but is called from TemplateRegistry#register (L567)
the register docstring indicates that if invalid_template is thrown, the template won't be cached
register is called from the template processing parts of decode_ipfix (L306) and decode_netflow9 (L203)
persist itself seems never to be called, but that shouldn't matter; persistence should still occur when a new template is received.

In decode_ipfix, it would seem (I'm not a Ruby developer yet) that ipfix_field_for (L299) must succeed in order to reach the call to @ipfix_template.register. It is in ipfix_field_for that we see the warning mentioned:

Unsupported enterprise field {:type=>638, :enterprise=>5951, :length=>2}

Enterprise number 5951 is indeed Citrix. The code would suggest that it will only accept enterprise-type pairs that it knows about in @ipfix_fields (note: now we're talking about fields, not templates). The fact that the warning is about an unsupported field, rather than an unsupported enterprise, helps to narrow the focus. I think 638 is a type that came later in the development of the Citrix Netscaler (Application Delivery Controller) appliance.

So where are these fields defined? L73 tells me in netflow/ipfix.yaml, which has been provided. But you'll see that the section of that file that pertains to Citrix (5951) is L1260-L2157, and the newer types are not included.

So if you're not seeing any persistence, it would suggest that the only templates you have configured to be sent are ones that use field types above 541.

What needs to happen is for a complete set of those types to be dug up. That's the problem with AppFlow, as this mapping is not published (or at least, was not published in an up-to-date form when I last looked for it some years ago).

You might wonder why we need these mappings at all; isn't IPFIX meant to be flexible? Well, yes and no. The template records that get sent out are a bit like structure definitions, but they only tell you that a particular message is made up of a particular set of fields (Information Elements, or IEs for short; those specific to an enterprise are EIEs), each identified by a number and a length. They don't tell you what those fields are (what sort of datatype each contains or what you might call it).

Thus, the receiver needs to know the following:

how to interpret each IE (all of them within each fieldset it receives)
what to do with each IE (eg. what name to give it, or what it means semantically)
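To make that concrete, here is a minimal Ruby sketch (not the plugin's code) of what a receiver has to do with a template record. The wire layout follows the IPFIX template record format (template ID, field count, then field specifiers with an optional enterprise number). The KNOWN_FIELDS table, the placeholder name exampleCitrixField, and the helper names are assumptions made up for illustration; only the 638 / 5951 / length 2 triple comes from the warning quoted above.

```ruby
# Hypothetical field table: enterprise number => { field type => [datatype, name] }.
# Enterprise 0 stands for the standard IANA elements. The Citrix entry and its
# name are placeholders, not taken from the plugin's ipfix.yaml.
KNOWN_FIELDS = {
  0    => { 1 => [:uint64, :octetDeltaCount], 4 => [:uint8, :protocolIdentifier] },
  5951 => { 128 => [:uint32, :exampleCitrixField] }
}.freeze

# Parse one IPFIX template record into [template_id, fields].
# Each field specifier is: type (2 bytes), length (2 bytes), and, when the
# top bit of the type is set, a 4-byte enterprise number.
def parse_template_record(bytes)
  template_id, field_count = bytes.unpack('nn')
  offset = 4
  fields = Array.new(field_count) do
    type, length = bytes[offset, 4].unpack('nn')
    offset += 4
    enterprise = 0
    if (type & 0x8000) != 0
      type &= 0x7FFF
      enterprise = bytes[offset, 4].unpack1('N')
      offset += 4
    end
    { type: type, length: length, enterprise: enterprise }
  end
  [template_id, fields]
end

# Fields the receiver has no definition for; it cannot name or type these.
def unsupported_fields(fields, known = KNOWN_FIELDS)
  fields.reject { |f| known.fetch(f[:enterprise], {}).key?(f[:type]) }
end

# A made-up template 256 with one standard field and two Citrix fields.
record = [256, 3].pack('nn') +
         [1, 8].pack('nn') +
         [0x8000 | 128, 4, 5951].pack('nnN') +
         [0x8000 | 638, 2, 5951].pack('nnN')

template_id, fields = parse_template_record(record)
missing = unsupported_fields(fields)
puts "template #{template_id}: #{fields.size} fields, #{missing.size} unsupported"
missing.each { |f| puts "unsupported: #{f.inspect}" }
```

The template itself parses fine; what is missing is the receiver-side definition for type 638. If the codec rejects the whole template in that case, as the code reading above suggests, every data set that uses the template stays undecodable.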

You can gain some partial appreciation for why it was done this way when you look at the history of NetFlow (in particular the various subversions within NetFlow version 8), but it still leaves a bit of a sour taste in the mouths of those of us who want to point an IPFIX exporter at something that isn't a vendor element manager (one that is coded to know what to do semantically with the IEs it receives).

If you can't find the information you're looking for (I once tried asking our technical account manager, but nothing came of that), then you'll just have to reverse engineer it from the template messages that come out and then make successive guesses. Good luck; you'll appreciate why I don't much like AppFlow for logging. These days, I would only use it for Load Balanced Virtual Servers that aren't HTTP (and thus can't include the client IP in a request header).

If memory serves, Splunk had a table of these: https://github.com/splunk/ipfix, but that's out of date too and hasn't been updated in 7 years or more, and Splunk has since dropped support for IPFIX.

Reading between the lines a bit, I see that Citrix has introduced a 'logstream' transport for AppFlow as an alternative to IPFIX. I can imagine why that would be attractive for various reasons, so it's not unreasonable to consider AppFlow as being less and less useful in future.

PS. I'm really not sure what kind of template you're interested in, but if you're wanting performance-management, you may get better ROI by focussing on RUM instead.

Continue down the AppFlow path only if you are suitably desperate and if no other mechanism will suffice (submit a support ticket to see how to do what you might want using syslog logging or such).

OR, consider how you might instead get the insight you need from Netscaler MAS... though I have no experience trying that.

Hope that helps
Cameron

Badger · August 15, 2020, 1:30pm

do_persist is only called from do_register inside the catch block for :invalid_template. So if you only receive valid templates there will be no persistence.

tkuronen · August 17, 2020, 7:29am

Thanks for the answers.

Actually, templates are also sent for fields below 541. I have also created an issue: "IPFIX templates sent by Citrix Netscaler are not cached" (Issue #192, logstash-plugins/logstash-codec-netflow on GitHub).
It includes a Wireshark screenshot showing, for example, the template for ID 256.

...and the pipeline also outputs nothing when I configure a file output as a test.

system · September 14, 2020, 7:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

yaauie · October 30, 2020, 7:17pm

Catch blocks in Ruby are weird, and I'm not 100% sure why I used one here because the implementing code never actually throws the related symbol. A catch block executes the code in the block it is given, allowing a specific thrown symbol to blow through the stack to the catcher. So persistence happens unless :invalid_template is thrown inside the block (or an exception is raised in the block, which would blow even further through the stack).
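For readers who don't write Ruby, a small standalone sketch of that catch/throw behaviour may help. The method and variable names are invented for illustration and are not the plugin's code; the point is only that the body of catch runs normally and is cut short when the symbol is thrown.

```ruby
# Illustrative stand-in for "register, then persist" wrapped in catch.
# The body runs to completion unless :invalid_template is thrown.
def register_template(valid, persisted)
  catch(:invalid_template) do
    throw :invalid_template unless valid
    persisted << :written # reached only when nothing was thrown
    :registered
  end
end

log = []
ok  = register_template(true, log)
bad = register_template(false, log)
puts "valid: #{ok.inspect}, invalid: #{bad.inspect}, persisted #{log.size} time(s)"
# prints: valid: :registered, invalid: nil, persisted 1 time(s)
```

So a catch block is not an error handler that runs only on failure: the "persist" step sits on the normal path and is skipped only when the symbol is thrown.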

There are tests covering persisting of the cache to disk, so I'm unsure what is failing in this situation and why.
