MISP Module Polling Interval

I'm using the MISP module with Filebeat 7.5.0.

With the default configuration, it will do a single query against the MISP instance, process everything it finds, and then do nothing until the Filebeat process is restarted.

Digging through the source on GitHub, I identified an interval configuration parameter, however, at least for 7.5.0 it doesn't seem to work. It defaults to 0, and anything below 3600 (I'm assuming it's meant to be seconds, given that it's not documented anywhere and the only reference is in a configuration validation error message thrown if it's less than 3600) throws a configuration error. At 3600 it does nothing, at 7200 it does nothing, and at 14400 or above it throws a panic for 'non-positive interval for NewTicker'.

I ingest new indicators into MISP every 4 hours, and I'd ultimately like to be able to have Filebeat poll for new indicators every 4 hours.

I see there's one commit regarding a minor change to the interval code (https://github.com/elastic/beats/pull/14728/commits/c4828d842f33dc8cf7a3c632e8b3f963c333c3d2 - changing ticker := time.NewTicker(time.Duration(in.Interval) * time.Second) to ticker := time.NewTicker(in.Interval), but I don't know if this change fixes it or not, and the issue number isn't listed on the 7.5.1 or 7.5.2 Beats release notes.

I'm using the following configuration, which works and processes events as expected, but only the events that come back from the initial request on running Filebeat - it never attempts another request to the MISP server at an interval setting of 3600 (1h) or 7200 (2h).

misp:
- module: misp
threat:
enabled: true
var.interval: 3600
var.api_key: ::redacted::
var.url: https://::redacted_misp_server_url::/attributes/restSearch
var.http_method: POST
var.http_request_body: '{"published":true,"publish_timestamp":"1h"}'

Due to that bug (which should be fixed in the upcoming 7.6.0 release) the value is being multiplied by 10^9. So if you try setting var.interval: 3600ns this will probably get you a true 1h interval.

The multiplication bug (and reviewing the time documentation to see how it gets that from the constants) is interesting.

Unfortunately, due to the httpjson input's configuration validation (at least in 7.5.0):

func (c *config) Validate() error {
	if c.Interval < 3600*time.Second && c.Interval != 0 {
		return errors.New("httpjson input: interval must not be less than 3600 seconds - ")
	}
	switch strings.ToUpper(c.HTTPMethod) {
	case "GET":
		break
	case "POST":
		break
	default:
		return errors.Errorf("httpjson input: Invalid http_method, %s - ", c.HTTPMethod)
	}
	return nil
}

it doesn't permit a value below 3600s to be used for the interval.

I figured out how to get around the configuration check by finding a value above 3600s that overflowed the Int64 Duration for the ticker enough times to be positive, and then finding the exact number in nanoseconds that was the boundary between a panic for non-positive interval and no panic.

For reference, that's misp.threat.var.interval=3615561838447ns and misp.threat.var.interval=3615561838448ns

Now, incrementing 3615561838448ns by the number of seconds I wanted to have between requests e.g. misp.threat.var.interval=3615561838452ns for 5s let me do additional testing, which uncovered what appears to be another bug.

In the run() function for the httpjson input, it builds the http client, makes a request err = in.processHTTPRequest(ctx, client, req, ri) which works for the initial request, and moves the code path down into the ticker, and fires:

case <-ticker.C:
				err = in.processHTTPRequest(ctx, client, req, ri)
				if err != nil {
					return err
				}
			}

However, what happens, is it's attempting to reuse the client from the earlier call... which already gracefully terminated its TCP connection. According to tcpdump on my MISP server, the request that comes from the case <-ticker.C: block sends an ACK in response to the MISP server's last FIN-ACK (which was in response to the graceful close from the httpjson input sending its own FIN-ACK to initiate the graceful close), which causes Filebeat to throw an error failed to do http request stopping input worker - which makes sense because it encountered an illegal TCP state.

I'm not sure how the code block would need to be resolved, if it was C# I'd instantiate the http.client with a using statement to automatically dispose() the client after the request is done, and instantiate a separate one using an identical using statement within the case <-ticker.C: block, but I don't know what the Go equivalent of doing that would be.

In summary, even fixing the bug of the 10^9 multiplication of the interval value, the second request appears to terminate the input due to attempting to re-use an already closed TCP socket.

I can replicate the behavior of the ACK into the dead TCP socket/connection 100% of the time using a value of 3615561838452ns for interval in 7.5.0.

If a bug report isn't already filed for that, that should be sufficient information to create a reproduction in the built-in tests for verifying at least one successful request after interval timeframe past the initial successful request prior to instantiating the NewTicker object.

Did you try 3600s instead of 3600?

OK, this is 7.5.0, which has the bug that was fixed in 7.6. In that case, what Andrew suggested "3600ns" should work.

Golang http client's Transport typically has internal state (cached TCP connections), so clients should be reused instead of created as needed.

You might need to add this to the config:

var.ssl: |-
{
"verification_mode": "none"
}

I've been digging into this for most of the day today.

I changed from https to http to rule out TLS-related issues, and I found the point in the code where the second connection throws an error low enough down to get something useful out of it, which confirms the behavior that I was seeing.

In the interval loop, when it runs

			case <-ticker.C:
				err = in.processHTTPRequest(ctx, client, req, ri)
				if err != nil {
					return err
				}
			}

the underlying *Transport to the client successfully queues a new connection and returns it from func (t *Transport) getConn(treq *transportRequest, cm connectMethod) (pc *persistConn, err error).

It gets as far as func (pc *persistConn) readLoop() where it errors for use of closed network connection with a simultaneous error on roundTrip for a write channel error of nothing written.

From a packet capture, I see the machine running Filebeat successfully complete the first connection, and then successfully start the second connection (SYN-> SYN-ACK -> ACK) and the next packet (which is sent after the new connection is handed off) is immediately a FIN-ACK from the Filebeat machine, which correlates with the use of closed network connection error.

Unfortunately, I haven't been able to come up with any reason for the second connection to fail.

For testing purposes, I changed DisableKeepAlives: true to false, to see what affect it had on the issue, and it allows for multiple connections (on different sockets, as it was incrementing the ports on each request which I found odd since I would have thought it would have kept the initial socket open), however, only if the connections are made within a few seconds of each other, anything longer than an interval of 10-15s caused the same use of closed network connection error.

The behavior is perplexing, and I haven't identified anything that might be the source of the second connection closing unexpectedly -- the code path followed is the exact same as the first request.

I am looking into this. Will provide a fix ASAP if I identify where the problem is.

Here it is. https://github.com/elastic/beats/pull/16123

I locally patched the changes from https://github.com/elastic/beats/pull/16123/commits/d59d60168275ae73e5a8bafd4ba9b79102f47aff and I can verify the issue appears to be resolved.

I'm building an updated binary off the 7.5.0 tag with the commit for this fix, plus the fix for the interval bug to test on my production machine until the changes get merged into an official release package.

Wanted to give an update that I've been running the updated binary since yesterday, and it's working as expected.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.