New to Elastic/LogStash and have a few questions

Hi all,

Just started using Elastic Stack after hearing rave reviews from friends in the security industry. I used the free trial thing just to have a prod around the interface before setting my own instance up.

Got a few questions I'm hoping that someone here can answer:

  1. Is there a way of integrating Logstash into the Kibana GUI? I've looked through documents and videos and can't find anything about this. It's OK if not, I just want to know so that I can stop looking. :smiley: The only reason I ask is that the interface on the 'hosted' platform has a section for Logstash under Management -> Stack Management. Perhaps I've missed something, but if someone can confirm or deny this and point me at the correct resources, I'd appreciate it.

  2. Integration of scripting/API collection. Part of my use case is to collect OSINT from forums, Twitter, and such. Obviously Twitter has its own specific integration, but if I wanted to scrape data from a forum with a script, how would I achieve this? The script is being worked on by a colleague, so I don't have it to hand, but it is being written in Python with some regex. The output of the script will be JSON; my reading leads me to believe that this is the best format for ingestion into Logstash/Elasticsearch. If this isn't correct, please correct me here also.

  3. Ingesting the output into Elasticsearch is then the next step. I assume that any entry will require a key field of some description, which will usually be a date/timestamp. If we ingest the output of the script mentioned in point 2, say once per day, will it just create a series of new entries and not overwrite the old data? Just thinking of trend analysis.

Thanks for any help.

Best,

  1. Kibana does have centralized pipeline management for Logstash, but it requires a license.

  2. If the forums offer an API, you can use the http_poller input plugin. If the only option is to scrape the site, you have to build that part yourself; you can then ingest the results with the file input, which will read the file in whatever format you saved after scraping.

  3. I don't know your exact use case, but the safe approach is not to overwrite anything until you know exactly what you need. Every post/thread should come with a timestamp, so you can track when posts were made. It really depends on how you intend to analyze the data. The other option, if every thread/post has a unique ID, is to store that in Elasticsearch as the document ID so that you can reference it for updates. But when you update a record, the previous version is gone, so make sure you don't need it if you go that route.
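
To make points 2 and 3 a bit more concrete, here is a minimal Python sketch of what the output step of a scraper could look like. The field names (`source`, `post_id`, `body`, `scraped_at`), the function names and the file name are all made up for illustration; they are not taken from any real script. It appends one JSON object per line, a layout that line-oriented readers such as the Logstash file input generally handle well, and it shows one way a stable ID could be derived if you later decide you want updates rather than new entries.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def stable_id(source, post_id):
    """Derive a repeatable ID from the forum name and the post's own ID.

    Hypothetical helper: only useful if you WANT a later run to replace
    the earlier copy of the same post.
    """
    return hashlib.sha1(f"{source}:{post_id}".encode("utf-8")).hexdigest()


def append_records(path, records):
    """Append each record to `path` as one JSON object per line.

    Every line gets a `scraped_at` timestamp (UTC, ISO 8601), so repeated
    runs add new entries instead of replacing old ones.
    """
    scraped_at = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            line = dict(record, scraped_at=scraped_at)
            fh.write(json.dumps(line) + "\n")
    return len(records)


if __name__ == "__main__":
    # Illustrative run against a throwaway file.
    demo_path = os.path.join(tempfile.mkdtemp(), "forum-a.json")
    sample = [{"source": "forum-a", "post_id": "1001", "body": "example text"}]
    append_records(demo_path, sample)
    print(stable_id("forum-a", "1001"))
```

Appending keeps every capture, which fits trend analysis; the `stable_id` value only matters if you later choose the "update in place" route.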

Hey, great response, thank you for your time.

  1. I can live without it. It was just a 'nice to have' feature.

  2. Some of them do, and that was another question I had, as there were a few HTTP-type plugins and this one seemed the closest to what I needed. 2 answers for 1 question.

  3. The scripts produce a .json file every 6 hours or so (one per data source). What I'd like is for Logstash/Elasticsearch to watch the file for changes (i.e. when it gets updated) and then ingest it without overwriting the old data.

So if an IP address, URL, or file hash changes within the file, the previous value will not be overwritten, but just stored as 'this is what was captured at this time'.

Basically, I'd like each .json file to become new log entries in the same indices, without overwriting previous data, so we can do historical searches, e.g. "How many times did 'x.x.x.x' IP address appear in the blog over the last 2 months", or "what was the change in rate of this file hash being mentioned across all forums in the last 2 weeks".

I hope that makes sense, but maybe I'm not explaining this too well. :smile:

I come from an ArcSight/LogRhythm background so that's probably where my lack of understanding is coming from.

If you don't set a custom ID for your documents, then every time Elasticsearch ingests a document it will generate a unique ID, and your original document will not be overwritten or updated. Just don't use the document_id option in your elasticsearch output and you will get this behavior.
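
As a rough illustration of the same idea outside Logstash, the sketch below builds a request body for the Elasticsearch bulk API in Python. The function name, the `id_field` parameter and the sample documents are placeholders of mine. When no ID field is given, the `index` action lines carry no `_id`, so Elasticsearch would assign one per document; supplying an ID is what would turn a later run into an overwrite.

```python
import json


def bulk_body(index, docs, id_field=None):
    """Build a newline-delimited bulk API body for `docs`.

    With id_field=None no `_id` is sent, so each document would get an
    auto-generated ID. With id_field set, the value of that field is used
    as `_id`, and re-sending the same value would replace the document.
    """
    lines = []
    for doc in docs:
        action = {"_index": index}
        if id_field is not None:
            action["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"post_id": 1001, "ip": "192.0.2.10"}]
    print(bulk_body("your-index", sample), end="")
```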

OK, cool.

So if I just tell Elasticsearch to look for a certain document name in the input configuration and don't set the document_id option, it'll just ingest the document. Will it ingest the document into the same indices, or is that part of the input configuration?

How do I get it to monitor a document for updates and then run an ingest task every time it changes? Is that possible?

I'm also assuming that all of this is possible via Elasticsearch and I don't even need Logstash?

Just checking if anyone can answer the rest of this post for me please?

It all depends on your configuration. If you don't set document_id in the Logstash elasticsearch output, Elasticsearch will generate a random unique ID and index the document into the configured index.

For example, the configuration below will always write to an index named your-index.

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "your-index"
  }
}

Whether to use Logstash also depends on what you want to do and what kinds of transformation or data enrichment you need. You don't need Logstash to ingest data into Elasticsearch: a simple REST request from a Python script, or even curl, could do it. You can also use Filebeat to read the files and send them directly to Elasticsearch. But again, everything depends on your use case.
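
To give an idea of what "a simple REST request from a Python script" could look like, here is a minimal sketch using only the standard library. The host and index name are reused from the output example above; the function names and the sample document are hypothetical. It assumes an unsecured HTTP endpoint, so a real cluster with TLS and authentication enabled would need more than this.

```python
import json
import urllib.request


def build_index_request(doc, host="http://elasticsearch:9200", index="your-index"):
    """Prepare a POST to <host>/<index>/_doc for a single document.

    POSTing to _doc without an ID in the URL lets Elasticsearch
    generate the document ID itself.
    """
    url = f"{host}/{index}/_doc"
    data = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_index_request({"ip": "192.0.2.10", "source": "forum-a"})
    # Only inspect the request here; call send(req) against a reachable cluster.
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to check what would be sent before pointing the script at a live cluster.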

It would be better if you shared, or opened a new question with, a practical example of what you want to do: a sample of your data, how you plan to send it to Elasticsearch, whether you need any transformations, and what you expect to see.