I'm trying to send logs from Filebeat through Logstash to Elasticsearch.
Here is my Filebeat configuration:
- type: log
  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
  json.expand_keys: true

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - P:\dns\*.json
    #- /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*
The timestamp JSON key/value pair holds EPOCH times and needs converting to a human-readable timestamp. I have tried a few configurations with mutate, but the data always ends up under the original JSON key, still as an EPOCH value. Do I need to change my Filebeat configuration, or should this be handled with a filter in Logstash?
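From what I have read, the date filter in Logstash (rather than mutate) is what converts epoch values into @timestamp. A minimal sketch, assuming the epoch value sits in a field literally named "timestamp" and holds seconds (UNIX_MS would be the pattern if it is milliseconds):

filter {
  date {
    # Parse the epoch-seconds field and write the result to @timestamp.
    match  => [ "timestamp", "UNIX" ]
    target => "@timestamp"
  }
}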
How do I rename a JSON key? For example, instead of "name":"*.wideu.iii.com" I would want "dns_request":"*.wideu.iii.com".
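What I have in mind is roughly a mutate rename in the Logstash filter block, assuming the field really arrives as a top-level "name" key:

filter {
  mutate {
    # Rename the JSON key "name" to "dns_request"; the value is untouched.
    rename => { "name" => "dns_request" }
  }
}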
Is it possible to reindex the data, take the value field which holds the IP addresses, and run it against the GeoIP database? Or will I need to re-ingest the data?
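For the enrichment itself I assume it would be the geoip filter; a sketch, assuming the IP address really is in a top-level field called "value":

filter {
  geoip {
    # Look up the IP held in "value" and store the result under "geoip".
    source => "value"
    target => "geoip"
  }
}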
The current setup is:
Filebeat has been running on my laptop for the last 3 days, forwarding the file to Logstash, which runs on a Raspberry Pi.
The ingestion rate is ~1,200 events per second. I now want to introduce a filter in Logstash that makes a network call to fetch data, which I expect will slow ingestion to around 20 to 30 EPS given that it runs over my home internet connection.
Here are my questions
As of now, ~50% of the file has been ingested. How will any change to the pipeline apply to the entire record set? I can shut down Filebeat (which is reading the file), change the Logstash pipeline, and start Filebeat again, but that only covers records indexed from that point on. How do I handle the ~126,796,781 records that have already been indexed? What is the correct API to invoke reindexing, while making sure no time is wasted re-processing the 50% of the file that will already have gone through the new Logstash pipeline?
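From the docs, the Reindex API combined with an ingest pipeline looks like the candidate here; a rough sketch with made-up index and pipeline names, again assuming the IP field is "value":

PUT _ingest/pipeline/dns_geoip
{
  "description": "Add GeoIP data based on the value field",
  "processors": [
    { "geoip": { "field": "value", "target_field": "geoip" } }
  ]
}

POST _reindex?wait_for_completion=false
{
  "source": { "index": "dns-logs" },
  "dest":   { "index": "dns-logs-enriched", "pipeline": "dns_geoip" }
}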
Further to point 1: is there a way to do this without affecting the current ingestion rate, i.e. have a separate process start from the very first indexed event and enhance the current index, rather than making it part of the current pipeline?
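Alternatively, if enriching the existing index in place is acceptable, an Update By Query run through the same ingest pipeline might work as a separate background task; a sketch with the same assumed names, restricted to documents that do not yet have the geoip field so already-enriched records are skipped:

POST dns-logs/_update_by_query?pipeline=dns_geoip&wait_for_completion=false
{
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "geoip" } }
    }
  }
}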
I'm sorry if this sounds confusing; please let me know if I need to reword it. There's a lot going on.
When you said reindex, I got confused.
Here you are asking: if you update your config file and add one more field called geoip, what will happen to the old records which are already ingested?
If that is the question, then the old records will not have the geoip field, and the new records will have it.
You do not actually have document_id set up in your output, which means each document (record) in Elasticsearch gets a randomly generated "_id".
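If you wanted stable ids instead (for example to make a later reindex or re-ingest idempotent), you could set document_id in the elasticsearch output; a sketch, assuming some field in your data is unique per record:

output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "dns-logs"
    # Hypothetical unique field; replace with whatever uniquely identifies a record.
    document_id => "%{[some_unique_field]}"
  }
}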