Filebeat error: log/prospector.go:437

Hi, can anyone from the team help me fix this issue?

2018-07-12T07:11:34.266-0400 ERROR log/prospector.go:437 Harvester could not be started on new file: d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-03T03-59-00Z_2018-07-03T06-59-00Z_336.json, Err: Error setting up harvester: Harvester setup failed. Unexpected file opening error: Failed opening d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-03T03-59-00Z_2018-07-03T06-59-00Z_336.json: Error creating file '%!s(*uint16=0xc065c681c0)': The system cannot find the file specified.


Thanks in advance!

Looks like the file above does not exist anymore, meaning it was deleted while Filebeat was trying to open it?

What do you think could be the reason Filebeat opened the outdated files?
I mean, in my case the signInDateTime range is July 2 - July 9, yet Filebeat only opened the file on July 12.

Normally this happens because of a race condition, so something must have moved the file. Can you reproduce it, or did you only see it once? If you can reproduce it, please share the full log and your full config.

As I investigated: since we have a large amount of data, we have a daily cleanup of logs older than 7 days. I guess that while Filebeat was still pushing logs to Elasticsearch, the file was removed. So, another question here is: how fast, and how much data, does Filebeat push to Elasticsearch?

filebeat.prospectors:

- type: log
  enabled: true
  paths:
    - d:/logs/azure/data/*
  fields:
    tokenSource: "AZURE"
  fields_under_root: true
  json.keys_under_root: true
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

output.elasticsearch:
  enabled: true
  hosts: [""]
  indices:
    - index: "eso_token_azure-%{+yyyy.MM}"
      when.contains:
        source: "d:\\logs\\azure\\data"
  username: ""
  password: ""


How fast really depends on your system. If you look at the Filebeat logs you will see some stats printed every 30 seconds at the info level. There you can also see how many events were processed.
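
If you want to change how often those stat lines show up, that is controlled by the logging section of filebeat.yml; the values below are the defaults, shown only so you know where to tune them:

logging.level: info
# Periodic internal metrics ("Non-zero metrics in the last ...") in the log output.
logging.metrics.enabled: true
logging.metrics.period: 30s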

Can you share a bit more on your log / Filebeat setup? Is there a Filebeat instance on each edge node? Or is the above directory a shared directory on a network drive? How do logs arrive in this directory?

Here's the log I can give you:

2018-07-12T07:11:34.266-0400	ERROR	log/prospector.go:437	Harvester could not be started on new file: d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-03T03-59-00Z_2018-07-03T06-59-00Z_336.json, Err: Error setting up harvester: Harvester setup failed. Unexpected file opening error: Failed opening d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-03T03-59-00Z_2018-07-03T06-59-00Z_336.json: Error creating file '%!s(*uint16=0xc065c681c0)': The system cannot find the file specified.
2018-07-12T07:11:34.351-0400	ERROR	log/prospector.go:437	Harvester could not be started on new file: d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-04T00-59-00Z_2018-07-04T03-59-00Z_118.json, Err: Error setting up harvester: Harvester setup failed. Unexpected file opening error: Failed opening d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-04T00-59-00Z_2018-07-04T03-59-00Z_118.json: Error creating file '%!s(*uint16=0xc065c680e0)': The system cannot find the file specified.
2018-07-12T07:11:48.852-0400	INFO	log/harvester.go:216	Harvester started for file: d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-05T21-59-00Z_2018-07-06T00-59-00Z_230.json
2018-07-12T07:11:52.710-0400	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":1365671,"time":1365671},"total":{"ticks":585163483,"time":585163483,"value":585163483},"user":{"ticks":583797812,"time":583797812}},"info":{"ephemeral_id":"057e4e38-b40c-4c0f-b635-8c066755b203","uptime":{"ms":228270039}},"memstats":{"gc_next":189964208,"memory_alloc":146033944,"memory_total":63320870562560,"rss":-12288}},"filebeat":{"events":{"active":328,"added":1137,"done":809},"harvester":{"closed":3,"open_files":570,"running":569,"started":2}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":1105,"batches":29,"total":1105},"read":{"bytes":28204},"write":{"bytes":1212264}},"pipeline":{"clients":1,"events":{"active":683,"filtered":5,"published":1132,"total":1137},"queue":{"acked":815}}},"registrar":{"states":{"current":61672,"update":1119},"writes":3}}}}
2018-07-12T07:11:53.046-0400	INFO	log/harvester.go:241	File is inactive: d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-06T03-59-00Z_2018-07-06T06-59-00Z_31.json. Closing because close_inactive of 5m0s reached.

Could you please also elaborate on this question?

For the metrics: Your registry is pretty large. You should use some clean_* options to get rid of inactive or removed files.
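
For reference, the clean_* options referred to here live in the prospector configuration; a minimal sketch, with placeholder durations that would need to be tuned to your actual retention:

filebeat.prospectors:
- type: log
  paths:
    - d:/logs/azure/data/*
  # Ignore files whose modification time is older than this.
  ignore_older: 48h
  # Remove a file's registry entry once the file has been inactive this long.
  # Must be greater than ignore_older + scan_frequency.
  clean_inactive: 72h
  # Remove the registry entry as soon as the file is deleted from disk (enabled by default).
  clean_removed: true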

I actually don't understand; can you rephrase the question?

We're still having this issue: the Filebeat data upload to Elasticsearch is too slow.

I would like to know about your architecture. How many Filebeat instances do you have? On how many servers are they running? Do you use a central shared directory for all files? How do the files end up in this directory? How many events per second do you expect?

We have one Filebeat instance per server, but we are having this issue on one specific server/Filebeat instance.

This is our Filebeat configuration:

filebeat.prospectors:

- type: log
  enabled: true
  paths:
    - d:/logs/azure/data/*
  fields:
    tokenSource: "AZURE"
  fields_under_root: true
  json.keys_under_root: true

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.kibana:
  host: "https://"

output.elasticsearch:
  enabled: true
  hosts: ["https://"]
  indices:
    - index: "eso_token_azure-%{+yyyy.MM}"
      when.contains:
        source: "d:\\logs\\azure\\data"
  username: ""
  password: ""

And this is a sample error message we got:
2018-07-12T02:26:33.205-0400 ERROR log/prospector.go:437 Harvester could not be started on new file: d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-03T06-59-00Z_2018-07-03T09-59-01Z_1072.json, Err: Error setting up harvester: Harvester setup failed. Unexpected file opening error: Failed opening d:\logs\azure\data\SignInEvent_accenture.onmicrosoft.com_2018-07-03T06-59-00Z_2018-07-03T09-59-01Z_1072.json: Error creating file '%!s(*uint16=0xc04c3d2000)': The system cannot find the file specified.

We have a daily cleanup of logs older than 7 days.
The data for this specific Filebeat comes from Microsoft Azure AD, which pushes data with a 2-hour gap from the actual signInDateTime.

Ok, so only the logs from Azure Active Directory (?) have this issue? Now it gets interesting.

Can you share more details on what you mean by "pushes data with 2hrs gap"? Does that mean it's not constantly logging line by line?

Azure Active Directory uploads data every 2 hours, so we pick the files up every 3 hours to make sure we get all the data. But that isn't the issue.

You may take a look at the sample data we have in Kibana:

{
  "_index": "eso_token_azure-2018.07",
  "_type": "doc",
  "_id": "15dtvmQBMHIrkXsoFd90",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-07-21T19:59:20.531Z",
    "tokenSource": "AZURE",
    "userId": "d50f34bd-3269-4b43-bb2b-48a689d91d33",
    "signinErrorCode": 0,
    "signinDateTime": "2018-07-18T08:18:06.4437088Z",
    "id": "d7e1e749-f6c1-486c-b9cb-e97d81c32100",
    "offset": 290753,
    "mfaResult": "MFA requirement satisfied by claim in the token",
    "loginStatus": "Success",
    "ipAddress": "144.36.138.8",
    "dataSource": 1,
    "correlationId": "883829e7-0042-481b-add2-6d0efd5676d1",
    "mfaRequired": true,
    "source": "d:\\logs\\azure\\data\\SignInEvent_accenture.onmicrosoft.com_2018-07-18T06-59-01Z_2018-07-18T09-59-01Z_730.json",
    "userDisplayName": "Susada, Lyka",
    "appId": "38aa3b87-a06d-4817-b275-7a316988d93b",
    "location": {
      "city": "Carrollton",
      "state": "Texas",
      "country": "US"
    },
    "signinDateTimeInMillis": 1531901886443,
    "geoCoordinates": {
      "latitude": 32.88254928588867,
      "longitude": -96.93798065185547
    },
    "userPrincipalName": "lyka.susada@accenture.com",
    "deviceInformation": ";Windows;;",
    "prospector": {
      "type": "log"
    },
    "beat": {
      "name": "VW104078",
      "hostname": "VW104078",
      "version": "6.2.2"
    }
  },
  "fields": {
    "@timestamp": [
      "2018-07-21T19:59:20.531Z"
    ],
    "signinDateTime": [
      "2018-07-18T08:18:06.443Z"
    ]
  },
  "highlight": {
    "userPrincipalName": [
      "@kibana-highlighted-field@lyka.susada@/kibana-highlighted-field@@accenture.com"
    ]
  },
  "sort": [
    -9223372036854776000
  ]
}

And take note that we have a daily cleanup of source files older than 7 days.

Please focus on the @timestamp and signinDateTime fields.
My point here is that we don't get the data in near real time, maybe because of the Filebeat upload.

Do you have any configuration options to recommend for my filebeat.yml?

I've been thinking about this one, but I can't decide what the best value for it would be.

If you require log lines to be sent in near real time, do not use a very low scan_frequency; instead adjust close_inactive so the file handler stays open and constantly polls your files.
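
As an illustration only (the durations below are placeholders, not a recommendation for your volume), both options belong in the prospector section:

- type: log
  paths:
    - d:/logs/azure/data/*
  # How often Filebeat checks the configured paths for new files (default 10s).
  scan_frequency: 10s
  # How long an open file handle is kept after the last line was read (default 5m).
  # A higher value keeps the harvester polling the file for new lines.
  close_inactive: 15m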

How exactly do you "upload" the data? The issue might be that for some reason the file is there and shows up, but Filebeat does not have access to it yet. Or are some temp files being loaded? I have seen other issues in the past, for example with rsync, where files were synced into a directory but rsync kept changing the file while Filebeat was already reading it, which caused issues. The solution there was:

  • Upload all files
  • Then, once the upload is finished, move the files into the directory that Filebeat reads

For the delay: if I look at the timestamps, and assuming they are correct, the delay is 3 days. I would not expect Filebeat to be that far behind in reading unless the output cannot consume the data fast enough.
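
If it turns out that the Elasticsearch output is the bottleneck, the output itself has a few throughput knobs; a sketch with placeholder values that you would have to verify against your own cluster:

output.elasticsearch:
  hosts: ["https://"]
  # Number of concurrent bulk requests sent to Elasticsearch (default 1).
  worker: 2
  # Maximum number of events per bulk request (default 50).
  bulk_max_size: 512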

What does near real time mean for you? It seems the data is already delayed by at least 2 hours anyway?

You also wrote above "pick them up every 3 hours". What do you mean by that?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.