How to "tag" from which location data is coming from?

Hello,

Not sure if this should go here or in the elastic discussion, apologies.

Let's say I have remote locations A, B, and C, and each of them is sending its NetFlow, SNMP, and syslog data to a central ELK stack.

The remote locations won't (always) have static IPs and we might end up with hundreds of locations to monitor.

I want to make a dashboard for each location that only shows data for that location. For example, my current netflow.conf just listens for all traffic coming in on one port, so there is no way to filter where it's coming from (all locations have the same network setup, so I can't use source IPs either).

What would be the best way to deal with this?

Thank you.

The easiest way is to have the sending location add a field that says where it is coming from. However, I assume that isn't a possibility here.

What about using multiple inputs with separate ports? For example, Location A sends syslog traffic to port 8081, Location B sends syslog traffic to port 8082, and so on. Each location will have its own Logstash input. Then you use the add_field or type setting to mark the location on your Logstash input.

input {
    syslog {
       port => 8081
       type => "LocationA"
    }
    syslog {
       port => 8082
       type => "LocationB"
    }
}
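
Alternatively (just a sketch, and the field name location_name is only an example), you can tag each input with add_field instead of type:

input {
    syslog {
       port => 8081
       add_field => { "location_name" => "LocationA" }
    }
    syslog {
       port => 8082
       add_field => { "location_name" => "LocationB" }
    }
}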

Then in your dashboard you just use your new field name to know where it all comes from.

I might have over a thousand locations, so separate ports would probably result in management/security hell.

Running Logstash locally at each location and then sending the data to the remote Elasticsearch server might not be such a bad idea. That way I suppose I can also filter out the information I don't need before sending it to Elasticsearch, which is good because the remote locations are on expensive metered satellite connections.
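
Something like this is roughly what I have in mind for the filtering part (just a sketch; the central server address and the fields I'd drop are placeholders):

filter {
  # strip fields I never look at, before paying for satellite bandwidth
  mutate {
    remove_field => [ "[netflow][application_id]", "[netflow][forwarding_status]" ]
  }
}

output {
  elasticsearch {
    hosts => ["central-elk.example.com:9200"]   # placeholder for the central ELK server
    index => "netflow-%{+YYYY.MM.dd}"
  }
}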

By adding a field do you mean creating a new index for it or mutating in a new field?

My current .conf looks like this:

input {
  udp {
    port => 9995
    type => "netflow"
    codec => netflow
  }
}

output {
  if [type] == "netflow" {
    elasticsearch {
      hosts => ["localhost"]
      index => "netflow-%{+YYYY.MM.dd}"
    }
  }
}

How would adding a field work? Simply changing the type, the if condition, and the index to Location A, B, C, etc.?

Doesn't the input plugin(s) that you're using add a field that indicates the source of the event? Have you dumped the events with e.g. a stdout { codec => rubydebug } output to see what you already have?

My test setup is already feeding data into Elasticsearch. I can see there is a host field showing the WAN-side IP, but not all of my clients will have a static IP so I can't use that.

{
  "_index": "netflow-2017.04.20",
  "_type": "netflow",
  "_id": "AVuJ8gcVIttyAWOS",
  "_score": null,
  "_source": {
    "netflow": {
      "output_snmp": 0,
      "forwarding_status": {
        "reason": 3,
        "status": 3
      },
      "ipv4_src_host": "172.16.10.xxx",
      "in_pkts": 1,
      "ipv4_dst_addr": "172.16.10.111",
      "first_switched": "2017-05-18T05:41:58.999Z",
      "flowset_id": 257,
      "l4_src_port": 64886,
      "ipv4_dst_host": "172.16.10.xxx",
      "version": 9,
      "application_id": 0,
      "flow_seq_num": 96,
      "ipv4_src_addr": "172.16.10.xxx",
      "in_bytes": 65,
      "protocol": 17,
      "flow_end_reason": 0,
      "last_switched": "2017-05-18T05:41:58.999Z",
      "input_snmp": 12,
      "out_pkts": 1,
      "out_bytes": 65,
      "l4_dst_port": 53
    },
    "@timestamp": "2017-04-20T05:58:46.000Z",
    "@version": "1",
    "host": "xxx.xxx.xxx.xxx",
    "type": "netflow"
  },
  "fields": {
    "netflow.first_switched": [
      1495086118999
    ],
    "netflow.last_switched": [
      1495086118999
    ],
    "@timestamp": [
      1492667926000
    ]
  },
  "sort": [
    1492667926000
  ]
}

Or am I misunderstanding you?

As for the stdout output, do I put that in the output { } block and then run the .conf from the command line? Sorry if that seems like a dumb question but I'm very new to all this :slight_smile:

edit: Even if there is an ID field or something like that, if all locations are feeding into the same index then there would still need to be some logic in Elasticsearch to filter the different locations out, right?

With my test setup I can make my graphs and everything in Kibana and that appears to be working fine, but it relies on things such as the SUM of out_bytes in a given index. I haven't seen any option to do something like select SUM out_bytes from fields where host == myip.

My test setup is already feeding data into Elasticsearch. I can see there is a host field showing the WAN-side IP, but not all of my clients will have a static IP so I can't use that.

Would it help to use a dns filter to resolve the IP address to a hostname?
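
Something along these lines might work (just a sketch; I'm assuming the sender's IP ends up in the host field, as in your example document):

filter {
  # copy the source IP into a new field, then reverse-resolve it in place
  mutate {
    add_field => { "location_name" => "%{host}" }
  }
  dns {
    reverse => [ "location_name" ]
    action  => "replace"
  }
}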

As for the stdout output, do I put that in the output { } block and then run the .conf from the command line?

Yeah, that would work.
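
For example (a minimal sketch), alongside your existing elasticsearch output:

output {
  stdout { codec => rubydebug }
}

Then run Logstash against your config from the command line, e.g. bin/logstash -f yourconfig.conf (the path is a placeholder), and the full events will be printed to the console.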

edit: Even if there is an ID field or something like that, if all locations are feeding into the same index then there would still need to be some logic in Elasticsearch to filter the different locations out, right?

When you say "location", what do you mean exactly?

For us, we use nxlog or filebeat to transfer log files and event log data. Both nxlog and filebeat allow creating new fields and removing fields prior to the data being sent. So the new field could be a static field (which may not work for you), or dynamic.

For example, we pull in the hostname of the machine that the data was collected from. Let's say the hostname is "Production.Web15.US.California". This can then be split up via grok or regex into

{
  "HostName": "Production.Web15.US.California",
  "ProdState": "Production",
  "serverClass": "Web",
  "Region": "US",
  "Location": "California"
  ...(insert all your original data here)
}

So you can use the same template across many different machines and it will dynamically create new fields that can help you narrow down what the machine does and where it is located. This of course all depends on you being able to derive meaningful data from a machine name. In your case you just have an IP, so it may be a bit more difficult.
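
As a rough sketch (not our exact config), a grok filter along these lines would do that kind of split, assuming the machine name arrives in a HostName field:

filter {
  grok {
    # split Production.Web15.US.California into its four parts
    match => {
      "HostName" => "%{WORD:ProdState}\.%{WORD:serverClass}\.%{WORD:Region}\.%{WORD:Location}"
    }
  }
}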

So the question is, how do you know what the location is? If you can't look at the record in Elasticsearch and say where it came from, how is Kibana going to be able to do it for you? Do multiple locations share the same IP? Or is there a one-to-one correspondence between IPs and locations? If so, just use the IP for your location.

One thing that may work if you are getting a public IP is the GeoIP filter. It may be a bit rough, but it will translate a public WAN IP to a location. It's not 100% accurate though.
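
Something like this (just a sketch; it assumes the public IP is in the host field):

filter {
  geoip {
    source => "host"
  }
}

That adds a geoip field with the looked-up location information to each event.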

And that is my question :smile:. I looked at Kibana search queries this morning and it looks like you can do something like host: "ip address" when creating new visualizations to pull only records related to that IP. I only have data from one IP in Elasticsearch but it does appear to be working, so that would solve the problem of knowing where data is originating from.

As Magnus said, I could probably set up a DDNS service to reverse-lookup the IP. It looks like dyn.com supports DNS records per subdomain. Though I have no experience with reverse DNS lookups, so I'll need to look into that a bit more.

One problem I do have is that I can't seem to add a new field to my data for the sake of testing.

I want to add a location name field with a static value so I can test with my one IP by simply changing the static value in the .conf and making separate data entries that way.

I looked at the documentation and tried:

filter { mutate { add_field => { "location_name" => "Location_A"

But that doesn't work. What am I doing wrong? Also, is there further documentation, or maybe a book, that explains how filters and their syntax work at a noob level?

I'm sure it's very easy to figure out for somebody who knows a bit about programming, but unfortunately I don't.

But that doesn't work. What am I doing wrong?

That should work just fine (assuming you have some closing braces at the end).

I can't get it working with my existing (copy-pasted from the Internet) code though.

[code]filter {
mutate {
add_field => { "location_name" => "Location_A"
add_field => [ "[netflow][ipv4_dst_host]","%{[netflow][ipv4_dst_addr]}" ]
add_field => [ “[netflow][ipv4_src_host]","%{[netflow][ipv4_src_addr]}" ]
}

dns
{
action => 'replace'
reverse => "[netflow][ipv4_dst_host]"
}
dns
{
action => 'replace'
reverse => "[netflow
][ipv4_src_host]"
}
}[/code]

Doesn't work. But it works if I remove the location_name line, or if I remove all the other code and only have

filter { mutate { add_field => { "location_name" => "Location_A" } } }

I tried adding a few } to the end of the existing code but that doesn't (not surprisingly) work.

What am I doing wrong?

While I'd expect multiple add_field lines to work, I'd do it like this (which is also less repetitive):

filter {
  mutate {
    add_field => {
      "location_name" => "Location_A"
      "[netflow][ipv4_dst_host]" => "%{[netflow][ipv4_dst_addr]}"
      "[netflow][ipv4_src_host]" => "%{[netflow][ipv4_src_addr]}"
    }
  }
}

Doesn't work.

It's easier to help if you elaborate more. Logstash doesn't start? Logstash works but doesn't do what you want? Something else?

By the way, on

add_field => [ “[netflow][ipv4_src_host]","%{[netflow][ipv4_src_addr]}" ]

one of the quotes is incorrect (it's a curly quote instead of a straight one). That might upset Logstash.

Sorry, it throws a config error in the logs.

[2017-04-21T18:06:55,615][ERROR][logstash.agent           ] Cannot load an invalid configuration {:reason=>"Expected one of #, => at line 16, column 27 (byte 187) after filter {\nmutate {\nadd_field => {\n\"location_name\" => \"location_a\"\n\"[netflow][ipv4_dst_host]\""}

Full .conf

input {
   udp {
     port => 9995
	type => "netflow"
	codec => netflow {
	versions => [9]
}
}
}


filter {
mutate {
add_field => {
"location_name" => "location_a"
"[netflow][ipv4_dst_host]","%{[netflow][ipv4_dst_addr]}" 
"[netflow][ipv4_src_host]","%{[netflow][ipv4_src_addr]}"
}
}
}

dns
{
action => 'replace'
reverse => "[netflow][ipv4_dst_host]"
}
dns
{
action => 'replace'
reverse => "[netflow][ipv4_src_host]"
}
}


output {
if [type] == "netflow" {
elasticsearch {
hosts => localhost
index => "netflow-%{+YYYY.MM.dd}"
}
}
}

There's one closing brace too many after your add_field stanza. Hint: indent your configuration file like I did. Then it's much easier to see if you have the right number of braces.

Magnus, I'm no programmer; I'm not sure I understand.

I see three opening braces, shouldn't there be three to close them? I tried removing one of the three at the end but no joy :frowning:

The dns filter should be inside the filter block. One of the three opening braces opens the filter block, so if you put three closing braces after add_field you're outside the filter block. With proper indentation this is very visible:

filter {
  mutate {
    add_field => {
      "location_name" => "location_a"
      "[netflow][ipv4_dst_host]","%{[netflow][ipv4_dst_addr]}" 
      "[netflow][ipv4_src_host]","%{[netflow][ipv4_src_addr]}"
    }
  }
}
dns {
  action => 'replace'
  reverse => "[netflow][ipv4_dst_host]"
}

The log shows an error after "[netflow][ipv4_dst_host]". Even if I use only two } at the end (I suppose this closes the add_field and mutate but leaves the filter open) it still throws that error.

Oh, now I see the problem. You didn't copy/paste my example correctly. You have too many commas and too few =>.
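
For reference, the filter part should end up looking something like this, with => between the field names and values and the dns filters inside the filter block:

filter {
  mutate {
    add_field => {
      "location_name" => "location_a"
      "[netflow][ipv4_dst_host]" => "%{[netflow][ipv4_dst_addr]}"
      "[netflow][ipv4_src_host]" => "%{[netflow][ipv4_src_addr]}"
    }
  }
  dns {
    action => "replace"
    reverse => [ "[netflow][ipv4_dst_host]" ]
  }
  dns {
    action => "replace"
    reverse => [ "[netflow][ipv4_src_host]" ]
  }
}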

Thanks a lot, it's working now :slight_smile:
