Filebeat keeps crashing

Hi Team,

We installed Filebeat to ship logs to Logstash and didn't face any issues for the first three months.

Since last week, Filebeat has been crashing roughly every 5 minutes.

There are no errors in the Filebeat or Logstash logs.

There have been no OS updates on the server.

Note: when we run Filebeat in debug mode (selectors set to *), it runs continuously without any crash.
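
For reference, this is roughly how we start it in each mode (the debug invocation is from memory, so treat the exact flags as approximate):

# Normal production run (the one that dies after ~5 minutes)
nohup ./filebeat -e -c filebeat.yml -d publish &

# Debug run with all selectors enabled (this one keeps running)
./filebeat -e -c filebeat.yml -d "*"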

Kindly help me resolve the issue; it is in production.

Regards,
Mugil

Could you share the logs you get at the info level? Also, which version are you using? Can you share your config?

Hi Ruflin,

Please find the logs and config below.
ELK version: 5.5.1
Configuration:
[tibco@ELKUTILITY filebeat-5.5.1-linux-x86_64]$ vi filebeat.yml

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.
- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
   - /app/Tibco_Logs/*/*/*/*/*/*.txt
   - /app/Tibco_Logs/*/*/*/*/*.txt
   - /app/Tibco_Logs/*/*/*/*.txt
   - /app/Tibco_Logs/*/*/*.txt
   - /app/Tibco_Logs/*/*.txt
 
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ["^DBG"]

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ["^ERR", "^WARN"]

   # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: [".gz$"]

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  multiline.pattern: ^(Request\[|Response\[|ErrorResponse\[)

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  multiline.match: after

  #filebeat.spool_size: 2048
  #filebeat.publish_async: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

#================================ Outputs =====================================

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

  #----------------------------- Logstash output --------------------------------
output.logstash:

 # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: critical, error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

Logs:

2018-07-01T17:13:42+05:30 INFO Stopping filebeat
2018-07-01T17:13:42+05:30 INFO Stopping Crawler
2018-07-01T17:13:42+05:30 INFO Stopping 1 prospectors
2018-07-01T17:13:42+05:30 INFO Prospector outlet closed
2018-07-01T17:13:42+05:30 INFO Prospector outlet closed
2018-07-01T17:13:42+05:30 INFO Prospector channel stopped because beat is stopping.
2018-07-01T17:13:42+05:30 INFO Prospector outlet closed
2018-07-01T17:13:42+05:30 ERR Harvester could not be started on new file: /app/Tibco_Logs/06-21-2018.txt, Err: prospector outlet closed
2018-07-01T17:13:42+05:30 INFO Scan aborted because prospector stopped.
2018-07-01T17:13:42+05:30 INFO Prospector ticker stopped
2018-07-01T17:13:42+05:30 INFO Stopping Prospector: 158304222499454466
2018-07-01T17:13:42+05:30 INFO Reader was closed: /app/Tibco_Logs/06-27-2018.txt. Closing.
2018-07-01T17:13:42+05:30 INFO Reader was closed: /app/Tibco_Logs/06-27-2018.txt. Closing.
2018-07-01T17:13:42+05:30 INFO Reader was closed: /app/Tibco_Logs/06-25-2018.txt. Closing.
2018-07-01T17:13:42+05:30 INFO Reader was closed: /app/Tibco_Logs/06-22-2018.txt. Closing.
2018-07-01T17:13:42+05:30 INFO Reader was closed: /app/Tibco_Logs/06-25-2018.txt. Closing.
2018-07-01T17:13:42+05:30 INFO Crawler stopped
2018-07-01T17:13:42+05:30 INFO Stopping spooler
2018-07-01T17:13:42+05:30 INFO Stopping Registrar
2018-07-01T17:13:42+05:30 INFO Ending Registrar
2018-07-01T17:13:42+05:30 INFO Total non-zero values:  filebeat.harvester.closed=3099 filebeat.harvester.started=3099 libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.read_bytes=24 libbeat.logstash.publish.write_bytes=10255 libbeat.publisher.published_events=970 registrar.writes=1
2018-07-01T17:13:42+05:30 INFO Uptime: 3.21955546s
2018-07-01T17:13:42+05:30 INFO filebeat stopped.

Kindly help us to resolve the issue.
Regards,
Mugil

What is in the logs before it stops? Looking at them, it seems like something external is sending Filebeat a signal to stop. In the above case Filebeat was only running for about 3 seconds, so it's best to share the full log.

Hi Ruflin,

Please find the entire Filebeat log below.

nohup.out log:

2018/07/25 16:40:23.700539 log.go:91: INFO Harvester started for file: /app/Tibco_Log/07-23-2018.txt
2018/07/25 16:40:23.700706 log.go:91: INFO Harvester started for file: /app/Tibco_Logs/07-21-2018.txt
2018/07/25 16:40:23.700873 log.go:91: INFO Harvester started for file: /app/Tibco_Logs/07-25-2018.txt
2018/07/25 16:40:23.701034 log.go:91: INFO Harvester started for file: /app/Tibco_Logs/07-21-2018.txt
2018/07/25 16:40:23.701207 log.go:91: INFO Harvester started for file: /app/Tibco_Logs/07-24-2018.txt
Killed

Note: we are not getting any error message in the nohup file.

Filebeat log:

2018-07-01T17:10:50+05:30 INFO Home path: [/app/ELK/filebeat-5.5.1-linux-x86_64] Config path: [/app/ELK/filebeat-5.5.1-linux-x86_64] Data path: [/app/ELK/filebeat-5.5.1-linux-x86_64/data] Logs path: [/app/ELK/filebeat-5.5.1-linux-x86_64/logs]
2018-07-01T17:10:50+05:30 INFO Setup Beat: filebeat; Version: 5.5.1
2018-07-01T17:10:50+05:30 INFO Max Retries set to: 3
2018-07-01T17:10:50+05:30 INFO Activated logstash as output plugin.
2018-07-01T17:10:50+05:30 INFO Publisher name: ELKUTILITY
2018-07-01T17:10:50+05:30 INFO Flush Interval set to: 1s
2018-07-01T17:10:50+05:30 INFO Max Bulk Size set to: 2048
2018-07-01T17:10:50+05:30 INFO filebeat start running.
2018-07-01T17:10:50+05:30 INFO Registry file set to: /app/ELK/filebeat-5.5.1-linux-x86_64/data/registry
2018-07-01T17:10:50+05:30 INFO Loading registrar data from /app/ELK/filebeat-5.5.1-linux-x86_64/data/registry
2018-07-01T17:10:50+05:30 INFO States Loaded from registrar: 0
2018-07-01T17:10:50+05:30 INFO Loading Prospectors: 1
2018-07-01T17:10:50+05:30 INFO Stopping Crawler
2018-07-01T17:10:50+05:30 INFO Stopping 0 prospectors
2018-07-01T17:10:50+05:30 INFO Crawler stopped
2018-07-01T17:10:50+05:30 INFO Stopping spooler
2018-07-01T17:10:50+05:30 INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2018-07-01T17:10:50+05:30 INFO Metrics logging every 30s
2018-07-01T17:10:50+05:30 INFO Starting Registrar
2018-07-01T17:10:50+05:30 INFO Start sending events to output
2018-07-01T17:10:50+05:30 INFO Stopping Registrar
2018-07-01T17:10:50+05:30 INFO Ending Registrar
2018-07-01T17:10:50+05:30 INFO Total non-zero values: registrar.writes=1
2018-07-01T17:10:50+05:30 INFO Uptime: 9.079178ms
2018-07-01T17:10:50+05:30 INFO filebeat stopped.
2018-07-01T17:10:50+05:30 CRIT Exiting: Error in initing prospector: No paths were defined for prospector accessing 'filebeat.prospectors.0' (source:'filebeat.yml')

Kindly help us to resolve the issue.

Also, please let us know how to capture the Filebeat logs.
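
At the moment we mostly rely on the console output redirected via nohup. If file logging is the right approach, something like the snippet below (taken from the reference config, untested on our side) is what we were planning to try; please confirm:

# Sketch only: write Filebeat's own log to files instead of relying on nohup.out
logging.level: info
logging.to_files: true
logging.files:
  path: /app/ELK/filebeat-5.5.1-linux-x86_64/logs
  name: filebeat.log
  keepfiles: 7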

Regards,
Mugil

So it seems Filebeat stops because there are no paths defined. My current guess is that perhaps the two runs are not looking for the config at identical paths. Could you share the full log from your nohup output? I wonder if it shows the exact same config path.
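
For reference, a minimal prospector section that avoids the "No paths were defined" error looks roughly like this (a sketch only; the indentation matters, and the paths list has to sit under the prospector entry):

filebeat.prospectors:
- input_type: log
  paths:
    - /app/Tibco_Logs/*/*.txt
    - /app/Tibco_Logs/*/*/*.txt

It is also worth starting Filebeat with an absolute config path, e.g. -c /app/ELK/filebeat-5.5.1-linux-x86_64/filebeat.yml, so that both runs are guaranteed to read the same file.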

Hi Ruflin.

Please find below the nohup.out log.

We are not getting any error, but it crashes after about 5 minutes.
"Request[2018-07-22T12:57:13.259+05:30]\nProcess ID: 26313399\nData: \u003cRequest xmlns="http://www.Abc.com/schemas/COMVIVA/SharedResources/Schemas/ComViva_Req.xsd"\u003e\n\u003cREQUEST_CHANNEL\u003eICF\u003c/REQUEST_CHANNEL\u003e\n\u003cSERVICEID\u003eGETCRN\u003c/SERVICEID\u003e\n\u003cREQUEST_ID\u003eMC-766729525360\u003c/REQUEST_ID\u003e\n\u003cREQUEST_DATE\u003e12345\u003c/REQUEST_DATE\u003e\n\u003cMOBILE_NO\u003e\u003c/MOBILE_NO\u003e\n\u003cCRN_NO\u003e\u003c/CRN_NO\u003e\n\u003cPAN_NO\u003eSCHM-SALARY-99063250-165010176650-STEDG-NB\u003c/PAN_NO\u003e\n\u003c/Request\u003e\n",
"offset": 3584842,
"source": "/app/Tibco_Logs/Tibco2/Success_log/data/tibco_logs/ComViva/logs/CRN_Details/Req/07-22-2018.txt",
"type": "log"
}
2018/07/25 16:26:25.807149 client.go:214: DBG Publish: {
"@timestamp": "2018-07-25T16:22:15.067Z",
"beat": {
"hostname": "ELKUTILITY",
"name": "ELKUTILITY",
"version": "5.5.1"
},
"input_type": "log",
"message": "Request[2018-07-24T09:31:16.019+05:30]\nProcess ID: 2411684\n\u003c?xml version="1.0" encoding="UTF-8"?\u003e\n\u003ccibil_indv_dedup\u003e\u003csourceappcode\u003eLOS\u003c/sourceappcode\u003e\u003cRqUID\u003e93528122\u003c/RqUID\u003e\u003cmessage_type_cd\u003eKB015\u003c/message_type_cd\u003e\u003cenquiry_purpose\u003e05\u003c/enquiry_purpose\u003e\u003cenquiry_amount\u003e7500000\u003c/enquiry_amount\u003e\u003cfirst_name\u003eRASHMI\u003c/first_name\u003e\u003clast_name\u003eGUPTA\u003c/last_name\u003e\u003cdate_of_birth\u003e1959-11-22T00:00:00+05:30\u003c/date_of_birth\u003e\u003cgender\u003e2\u003c/gender\u003e\u003caddr1\u003eD-243 D-BLOCK ASHOK VIHAR PHASE\u003c/addr1\u003e\u003caddr2\u003eI DELHI 110052\u003c/addr2\u003e\u003cstate\u003e07\u003c/state\u003e\u003cpin_code\u003e110052\u003c/pin_code\u003e\u003cres_phone_no\u003e09811046545\u003c/res_phone_no\u003e\u003cpan\u003eAFJPG7461F\u003c/pan\u003e\u003cvoter_id\u003eJPF0736736\u003c/voter_id\u003e\u003caccount_no\u003e68585878\u003c/account_no\u003e\u003cscores_request\u003e06\u003c/scores_request\u003e\u003crecord_id\u003e68585878\u003c/record_id\u003e\u003c/cibil_indv_dedup\u003e\n"\n"",
"offset": 5238,
"source": "/app/Tibco_Logs/Tibco1/Success_log/data/tibco_logs/CIBIL/logs/GetCibilReq/07-24-2018.txt",
"type": "log"
}
2018/07/25 16:26:25.807193 client.go:214: DBG Publish: {
"@timestamp": "2018-07-25T16:22:15.071Z",
"beat": {
"hostname": "ELKUTILITY",
"name": "ELKUTILITY",
"version": "5.5.1"
},
"input_type": "log",
"message": "Request[2018-07-23T21:30:02.552+05:30]\nProcess ID: 16520\nData: \n\u003c?xml version="1.0" encoding="UTF-8"?\u003e\n\u003cresultSet/\u003e\n",
"offset": 15719,
"source": "/app/Tibco_Logs/Tibco1/Success_log/data/tibco_logs/DishTv/Logs/Generic_WS_Rev/Req/07-23-2018.txt",
"type": "log"
}
2018/07/25 16:26:25.807239 client.go:214: DBG Publish: {
"@timestamp": "2018-07-25T16:22:15.071Z",
"beat": {
"hostname": "ELKUTILITY",
"name": "ELKUTILITY",
"version": "5.5.1"
},
"input_type": "log",
"message": "Response[2018-07-24T09:17:03.907+05:30]\nProcess ID: 164385\nData:\n\u003c?xml version="1.0" encoding="UTF-8"?\u003e\n\u003cresultSet\u003e\u003cRecord\u003e\u003cFORACID\u003e2112235837\u003c/FORACID\u003e\u003cACCT_LOB\u003e0063\u003c/ACCT_LOB\u003e\u003c/Record\u003e\u003c/resultSet\u003e\n",
"offset": 410,
"source": "/app/Tibco_Logs/Tibco2/Success_log/data/tibco_logs/Get_Finacle_LOB/Logs/Finacle_Res/07-24-2018.txt",
"type": "log"
}
2018/07/25 16:26:25.807294 client.go:214: DBG Publish: {
"@timestamp": "2018-07-25T16:22:15.072Z",
"beat": {
"hostname": "ELKUTILITY",
"name": "ELKUTILITY",
"version": "5.5.1"
},
"input_type": "log",
"message": "Response[2018-07-25T11:12:54.365+05:30]"\n\u003c?xml version="1.0" encoding="UTF-8" ?\u003e\u003cTBS04\u003e\u003csourceappcode\u003eLMS\u003c/sourceappcode\u003e\u003cRqUID\u003e15324973734182590403\u003c/RqUID\u003e\u003cmessage_type_cd\u003eTBS04\u003c/message_type_cd\u003e\u003cresponse\u003e\u003cerrorcode\u003e01\u003c/errorcode\u003e\u003cerrordesc\u003eSUCCESS\u003c/errordesc\u003e\u003cleadno\u003eCV1603120\u003c/leadno\u003e\u003cprospectNumber\u003eCV10013990\u003c/prospectNumber\u003e\u003cassetIdDetails\u003e \u003cdetails\u003e\u003cUniqueToken\u003e0\u003c/UniqueToken\u003e\u003cAssetIdExst\u003e1441426\u003c/AssetIdExst\u003e\u003c/details\u003e\u003c/assetIdDetails\u003e\u003c/response\u003e\u003c/TBS04\u003e\n""",
"offset": 3289,
"source": "/app/Tibco_Logs/Tibco1/Success_log/data/tibco_logs/LMS/logs/LeadCreation_SmartApp/RESPONSE/07-25-2018.txt",
"type": "log"
}
2018/07/25 16:26:25.807351 client.go:214: DBG Publish: {
"@timestamp": "2018-07-25T16:22:15.080Z",
"beat": {
"hostname": "ELKUTILITY",
"name": "ELKUTILITY",
"version": "5.5.1"
},
"input_type": "log",
"message": "Request[2018-07-25T09:55:27.715+05:30]"\n\u003cTBS013\u003e\u003csourceappcode\u003eLMS\u003c/sourceappcode\u003e\u003cRqUID\u003e15324927269522588356\u003c/RqUID\u003e\u003cProspectNo\u003eLV10013048\u003c/ProspectNo\u003e\u003cmessage_type_cd\u003eTBS013\u003c/message_type_cd\u003e\u003cloginDetail\u003e\u003cuserid\u003eKMBL88821\u003c/userid\u003e\u003cpassword\u003eopJ3tFoIJeaJYqHRJL1Tuw==\u003c/password\u003e\u003corganization\u003eKB001\u003c/organization\u003e\u003c/loginDetail\u003e\u003cleadDtl\u003e\u003cATU17\u003eNON\u003c/ATU17\u003e\u003cATU54\u003e700\u003c/ATU54\u003e\u003cATU03\u003e44\u003c/ATU03\u003e\u003cATGUR\u003eExternal\u003c/ATGUR\u003e\u003cTSA87\u003e0\u003c/TSA87\u003e\u003cTSA74\u003e0\u003c/TSA74\u003e\u003cTSA73\u003e0\u003c/TSA73\u003e\u003cTSA90\u003e339372672\u003c/TSA90\u003e\u003c/leadDtl\u003e\u003c/TBS013\u003e\n""",
"offset": 2565,
"source": "/app/Tibco_Logs/Tibco2/Success_log/data/tibco_logs/LMS/logs/EligibilityComputation/REQUEST/07-25-2018.txt",
"type": "log"
}
2018/07/25 16:26:25.811675 output.go:109: DBG output worker: publish 1032 events

Kindly help.
Regards,
Mugilvannan

Hi Ruflin,

Is there any way to limit the rate at which Filebeat sends logs to Logstash?
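
For reference, these are the settings I am considering for throttling (just a sketch based on the 5.x reference config; the values are placeholders and not tested), please correct me if these are not the right knobs:

# Throttling-related settings (sketch, values are placeholders)
filebeat.spool_size: 2048      # events buffered before a flush
filebeat.idle_timeout: 5s      # flush even when spool_size is not reached
filebeat.prospectors:
- input_type: log
  harvester_limit: 100         # cap on files harvested in parallel per prospector
output.logstash:
  bulk_max_size: 1024          # events sent to Logstash per batch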

The above log again looks like an excerpt and not the full one. Could you upload the full one, to a gist for example, and share the link here?

For your last question: is it somehow related to the above? If not, it's best to open a new Discuss topic so the discussion doesn't diverge.

Hi Ruflin,

Sorry for the delay.

You can find the full logs at the link below.

Note: Filebeat is stopping without any error message.

Kindly check and let us know.

Regards,
Mugil

Something is really odd here. The log you shared ends with:

2018/07/25 16:25:56.953275 client.go:214: DBG  Publish: {
  "@timestamp": "2018-07-25T16:22:14.881Z",
  "beat": {
    "hostname": "ELKUTILITY",
    "name": "ELKUTILITY",
    "version": "5.5.1"
  },
  "input_type": "log",

  "offset": 2040,
  "source": "/app/Tibco_Logs/Tibco1/Success_log/data/tibco_logs/EPBG/logs/EPBGPull/Request/07-24-2018.txt",
  "type": "log"
}
2018/07/25 16:25:56.953383 client.go:214: DBG  Publish: {

The above means Filebeat didn't even finish writing a full log message, and I'm not sure how that could happen. How many lines does your local log file have? I count 16966 in the shared log.

Hi @mugil1988,

The logs that you shared in post 4/13 of this conversation don't match the full log that you shared in post 6/13, so kindly verify it.

@ruflin

We have the same number of lines in the local log as well.

Filebeat is stopping without any error being written to the local log.

Are there any other possible causes of the failure?

Regards,
Mugil

If the above is the log that Filebeat is creating, then it does not seem like Filebeat is crashing but rather that something is killing Filebeat, such as a power failure or a security tool. If Filebeat were crashing, we would see at least a panic or an error in the log, and it would not stop in the middle of writing an event to the log.

Do you have any security tools which could do something like this?

Hi Ruflin,

Please find below the error in the nohup.out file:

2018/08/13 16:15:41.389563 output.go:109: DBG output worker: publish 1941 events
2018/08/13 16:18:28.086945 log.go:116: INFO File is inactive: /app/Tibco_Logs/8-13-2018.txt. Closing because close_inactive of 5m0s reached.
2018/08/13 16:18:28.825797 log.go:116: INFO File is inactive: /app/Tibco_Logs/Tibco1/08-10-2018.txt. Closing because close_inactive of 5m0s reached.
2018/08/13 16:18:33.503804 metrics.go:39: INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=1549 filebeat.harvester.running=1551 filebeat.harvester.started=1551 libbeat.logstash.call_count.PublishEvents=2 libbeat.logstash.publish.read_bytes=60 libbeat.logstash.publish.write_bytes=638073 libbeat.logstash.published_and_acked_events=423 libbeat.publisher.published_events=2364 publish.events=12288 registrar.states.current=11795 registrar.states.update=12288 registrar.writes=6
2018/08/13 16:18:57.379439 log.go:116: INFO File is inactive: /app/Tibco_Logs/Tibco1/Success_log/08-06-2018.txt. Closing because close_inactive of 5m0s reached.
2018/08/13 16:18:57.366115 log.go:116: INFO File is inactive: /app/Tibco_Logs/Tibco1/Veritas/08-10-2018.txt. Closing because close_inactive of 5m0s reached.
2018/08/13 16:19:00.379081 log.go:116: INFO File is inactive: /app/Tibco_Logs/Tibco1/Res/08-10-2018.txt. Closing because close_inactive of 5m0s reached.

[1] Killed nohup ./filebeat -e -c filebeat.yml -d publish

Kindly help us to resolve the issue.

Regards,
Mugil

Did you check if anything on your machine could potentially kill Filebeat?

Hi Ruflin,

Yes, I have checked; nothing is killing Filebeat.

How did you check that nothing is killing Filebeat?

The problem is that if Filebeat stopped or crashed on its own, it would log what happened. But in the above it seems it was killed in the middle of processing, which Filebeat cannot do on its own, AFAIK.
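
A few things worth checking on the host (a sketch; exact log locations and tooling depend on your distribution). Given the roughly 1500 open harvesters in your metrics, the kernel OOM killer is one possible culprit to rule out, and auditd can record which process sends a SIGKILL:

# Did the kernel OOM killer terminate the process?
dmesg -T | grep -iE 'out of memory|killed process'
grep -iE 'out of memory|oom|killed process' /var/log/messages

# Optionally, have auditd log who sends SIGKILL, then wait for the next crash
auditctl -a always,exit -F arch=b64 -S kill -F a1=9 -k filebeat_kill
ausearch -k filebeat_kill -i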
