How to uniquely identify data from different sources

I have different sources of data that provide the same type of logs. I would like to be able to uniquely identify where each log file came from. Here is an example of what I am talking about:

There are three sources named Source1, Source2, and Source3; each has its own directory for the data files, where Logstash reads the data from. I have all of their log data; let's call the files syslog.txt. Currently, if I place these three folders in the directory where Logstash reads the input data, the logs are placed into Elasticsearch with no way of differentiating whether a given syslog.txt log came from Source1, Source2, or Source3.

Also, the sources will not always be named the same thing. This example used Source1, Source2, and Source3, but the next time they could be S1, S2, and S3, or Sun, Cloud, Stars. So it's not as simple as changing the 'type' they are read in as, I don't think.

Also, I would like to be able to add and remove things by source. For example, delete all logs from Source2 from Elasticsearch.

Are there any solutions to this, or can someone point me in the right direction?

I think the folks in the Logstash forum may be more helpful; moving this over there.

Generally, you'll need a consistent identifier that tracks the "source", regardless of where that identifier comes from (the log itself, the filename, the directory path, etc.).

Also, I would like to be able to add and remove things by source. For example, delete all logs from Source2 from Elasticsearch.

Once you have a unique identifier that tracks the source, you can use Delete-By-Query to remove everything from a given source, search for all logs from that source, and so on.
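For example, assuming the identifier ends up in a field called project and the logs live in indices matching logstash-* (both placeholders for whatever your setup uses), a delete-by-query for one source could look like this. Note that delete-by-query is built into Elasticsearch 5.0 and later; on 2.x it was a separate plugin.

	curl -XPOST 'localhost:9200/logstash-*/_delete_by_query?pretty' \
	  -H 'Content-Type: application/json' \
	  -d '{ "query": { "match": { "project": "Source2" } } }'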

There are three sources named Source1, Source2, and Source3; each has its own directory for the data files, where Logstash reads the data from. I have all of their log data; let's call the files syslog.txt. Currently, if I place these three folders in the directory where Logstash reads the input data, the logs are placed into Elasticsearch with no way of differentiating whether a given syslog.txt log came from Source1, Source2, or Source3.

So the name of the directory can be used to determine the name of the source? Then have a look at the path field that Logstash populates with the full path to the file the event came from.
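With a file input, the path field contains the full path of the file being read. A minimal grok sketch, assuming for illustration a layout like /data/Source1/syslog.txt (the /data/ prefix and the source field name are placeholders for your actual layout):

	filter {
		grok {
			# For an event read from /data/Source1/syslog.txt this captures
			# "Source1" into a new "source" field.
			match => { "path" => "/data/%{WORD:source}/" }
		}
	}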

Also, the sources will not always be named the same thing. This example used Source1, Source2, and Source3, but the next time they could be S1, S2, and S3, or Sun, Cloud, Stars. So it's not as simple as changing the 'type' they are read in as, I don't think.

Okay. So how would a human know that e.g. Source1 and S1 are actually the same thing? Once we've established that, we can move on to how to make Logstash do the same thing.

Currently I have three different systems (sources) that each have their own logs. I am running the ELK stack using Vagrant. Every time I want to look at data from a different system (source), I shut down one and start up a different one; this is currently how I keep the logs separate from each other. Every time I switch between systems (sources), I don't want to have to clear and reload all the data I was working on.

I want a way to uniquely identify which project each log came from and to be able to manipulate the data (i.e. delete one project's files without removing the others). Is there a way to do this in Logstash? Any useful tips for me? I am looking into solutions for this in Vagrant and Docker, or maybe in the Logstash configuration file. Maybe there is a way to make a different index for each project's data (i.e. index => log_datasetname)?

Thanks!

Yes, using different indexes for different projects is a good idea.

How would I code that? The project names and the number of projects can change every time I run the system. Is it possible to have Logstash "identify" how many different project file paths there are and create an index for each?

Again, what would a human do? If you can describe the general algorithm, we can help you translate it into a Logstash configuration.

A general use case would be someone working in customer support. There are multiple projects out in the field that produce the same logs. Each customer support technician could have multiple distinct projects they are working on from different companies. The customers send in a dump of all their logs. The technician then places each customer's logs in a folder under the name of the company. The technician would then be able to tell which log data came from where because it is under a company name. The data structure might be /data/"customer name".

Okay. As I said earlier then: Have a look at the path field that Logstash populates with the full path to the file the event came from. You can e.g. use a grok filter to extract particular path components.

Thank you @magnusbaeck, sorry for all of these questions. Would I be able to incorporate part of the path field into the index name?

Yes. Use said grok filter to extract the pieces you want into a separate field and reference that field in your elasticsearch output's index option.

https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#logstash-config-field-references
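A minimal sketch of an output that does this (the project field name and index pattern here are illustrative, not from the thread):

	output {
		elasticsearch {
			hosts => ["localhost:9200"]
			# %{project} is a sprintf field reference, replaced per event with
			# the value of the field extracted by the grok filter.
			index => "%{project}-logs"
		}
	}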


Not sure if this is supposed to go in a new post or not. If it is, let me know. I am having trouble referencing that field. I parsed out the part that I need and named it project.

	if [type] == "prn" {
		elasticsearch{
			index => project_"prn-"
			document_type => "prnformat"
			template => "/opt/ELK/logstash/template/prn_template.json"
			template_name => "prn-*"
			template_overwrite => "true"
			codec => plain{charset => "ANSI_X3.4-1968"}
		}
	}

There are different log types for these projects (logs-, prn-, etc.). I want the index to be "project name"_"prn-", but it keeps giving me errors:

ERROR logstash.agent - Cannot create pipeline {:reason=>"Expected one of #, {, }

Thanks

I solved this. If someone could tell me how to close this topic, it would be greatly appreciated!

For anyone interested in the coding of this: the problem was the index value. A field reference has to use the %{fieldname} sprintf syntax inside a quoted string, not a bare project_ prefix.

	grok {
		# Extract the directory name under /opt/ELK/data/ into a "project" field.
		match => { "path" => ["/opt/ELK/data/%{WORD:project}"] }
	}

In the output:

	if [type] == "prn" {
		elasticsearch {
			# Build the index name per event from the "project" field extracted above.
			index => "%{project}_prn-"
			document_type => "prnformat"
			template => "/opt/ELK/logstash/template/prn_template.json"
			template_name => "prn-*"
			template_overwrite => "true"
			codec => plain { charset => "ANSI_X3.4-1968" }
		}
	}
