Log_ecs_reformatting not reformatting all logfiles

Hi all,

I have upgraded to the latest Java agent version 1.24.0 today and I wanted to try the log_ecs_reformatting feature in our testsystem. It kinda works but it seems to ignore the logback over Slf4j logs:
I have a tomcat container running with different applications which create a total of 14 logfiles(3 Log4j logs, 11 logback over slf4j). The log4j logfiles are perfectly created in my log_ecs_reformatting_dir directory but the logback logfiles seem to be ignored. The applications are running and transactions are visible in Kibana so it seems the applications are instrumented correctly. Am I missing something or is this a bug in the implementation?

Things that I found interesting in the startup logs(full log and config below):

  • ByteBuddy fails to instrument a Spring enhanced class because it seems a pre Java 4 class file but I cannot believe that as we use Spring 4
    co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener -[redacted]$$EnhancerBySpringCGLIB$$24f9ad03 uses an unsupported class file version (pre Java 4}) and can't be instrumented. Consider updating to a newer version of that library.
  • the circuit breaker always writes out that it "is 12th consecutive measurement that is below the configured relief threshold"
    Latest system CPU load value measured is 0.07498741821841973. This is 12th consecutive measurement that is below the configured relief threshold - 0.8

Kibana version: 7.12.1

Elasticsearch version: 7.12.1

APM Server version: 7.12.1

APM Agent language and version: Java 1.24.0

Browser version: Chrome 91

Original install method (e.g. download page, yum, deb, from source, etc.) and version: download page

Fresh install or upgraded from other version? upgraded

**Is there anything special in your setup?
A special case might be that we have both Log4j and Logback within the same tomcat container.
Here is our agent config:

active=true
instrument=true
environment=quality
transaction_sample_rate=1.0
transaction_max_spans=1000
disable_instrumentations=
disable_metrics=
metrics_interval=30s
capture_body=off
capture_headers=true
log_ecs_reformatting_dir=/home/tomcat/tomcat-1.3/tomcat-axis2/logs/elastic
log_ecs_reformatting=SHADE
log_file_size=50MB
log_level=INFO
log_file=_AGENT_HOME_/elastic-apm.log
enable_log_correlation=true
secret_token=[redacted]
server_urls=https://[redacted]:8200
verify_server_cert=true
flush_interval=1s
max_queue_size=1024
application_packages=[redacted]
stack_trace_limit=50
trace_methods=
span_frames_min_duration=5ms
trace_methods_duration_threshold=5ms
use_path_as_transaction_name=true
global_labels=team=[redacted]
circuit_breaker_enabled=true
capture_jmx_metrics=object_name[Catalina:type=ThreadPool,name=*] attribute[connectionCount] attribute[currentThreadCount] attribute[currentThreadsBusy] attribute[maxConnections] attribute[maxThreads] attribute[poolSize], object_name[Catalina:type=Executor,name=*] attribute[activeCount] attribute[maxThreads] attribute[poolSize], object_name[Catalina:type=GlobalRequestProcessor,name=*] attribute[bytesSent] attribute[bytesReceived] attribute[errorCount] attribute[maxTime] attribute[requestCount] object_name[Catalina:type=Manager,context=*,host=*] attribute[activeSessions] attribute[sessionCounter] attribute[expiredSessions]

Errors in browser console (if relevant): n/a

Provide logs and/or server output (if relevant):

2021-06-21 13:10:36,582 [main] INFO  co.elastic.apm.agent.util.JmxUtils - Found JVM-specific OperatingSystemMXBean interface: com.sun.management.OperatingSystemMXBean
2021-06-21 13:10:36,584 [main] WARN  co.elastic.apm.agent.metrics.builtin.CGroupMetrics - No /proc/self/cgroup file line matched the tested patterns. Cgroup metrics will not be reported.
2021-06-21 13:10:36,630 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.24.0 as tomcat-application on Java 11.0.4 Runtime version: 11.0.4+11-LTS VM version: 11.0.4+11-LTS (Amazon.com Inc.) Linux 2.6.32-754.39.1.el6.x86_64
2021-06-21 13:10:36,631 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - environment: 'quality' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,631 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - transaction_sample_rate: '1' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,631 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - transaction_max_spans: '1000' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,631 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - disable_instrumentations: '' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,632 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - capture_body: 'OFF' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,632 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - global_labels: 'team= [redacted]' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,632 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - trace_methods_duration_threshold: '5ms' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,632 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - secret_token: 'XXXX' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,632 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - server_urls: 'https://[redacted]:8200' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,632 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - max_queue_size: '1024' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,633 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - application_packages: '[redacted]' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,633 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - log_file: '_AGENT_HOME_/elastic-apm.log' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,633 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - enable_log_correlation: 'true' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,633 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - log_ecs_reformatting: 'SHADE' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,633 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - log_ecs_reformatting_dir: '/home/tomcat/tomcat-1.3/tomcat-axis2/logs/elastic' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,633 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - log_file_size: '50mb' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,633 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - use_path_as_transaction_name: 'true' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,634 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - circuit_breaker_enabled: 'true' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:36,634 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - capture_jmx_metrics: 'object_name[Catalina:type=ThreadPool,name=*] attribute[connectionCount] attribute[currentThreadCount] attribute[currentThreadsBusy] attribute[maxConnections] attribute[maxThreads] attribute[poolSize], object_name[Catalina:type=Executor,name=*] attribute[activeCount] attribute[maxThreads] attribute[poolSize], object_name[Catalina:type=GlobalRequestProcessor,name=*] attribute[bytesSent] attribute[bytesReceived] attribute[errorCount] attribute[maxTime] attribute[requestCount] attribute[activeSessions] attribute[sessionCounter] attribute[expiredSessions]' (source: /home/tomcat/apm-agent/elasticapm.properties)
2021-06-21 13:10:37,935 [main] INFO  co.elastic.apm.agent.impl.ElasticApmTracer - Tracer switched to RUNNING state
2021-06-21 13:10:38,587 [elastic-apm-server-healthcheck] INFO  co.elastic.apm.agent.report.ApmServerHealthChecker - Elastic APM server is available: {  "build_date": "2021-04-20T19:55:39Z",  "build_sha": "32f34ed4298d648bf9476790f2a8a54d72805bb6",  "version": "7.12.1"}
2021-06-21 13:10:38,837 [elastic-apm-remote-config-poller] INFO  co.elastic.apm.agent.configuration.ApmServerConfigurationSource - Received new configuration from APM Server: {}
2021-06-21 13:10:52,041 [localhost-startStop-1] INFO  co.elastic.apm.agent.servlet.ServletVersionInstrumentation - Servlet container info = tomcat
2021-06-21 13:11:31,747 [localhost-startStop-4] WARN  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener -[redacted]$$EnhancerBySpringCGLIB$$24f9ad03 uses an unsupported class file version (pre Java 4}) and can't be instrumented. Consider updating to a newer version of that library.
2021-06-21 13:12:42,799 [elastic-apm-circuit-breaker] INFO  co.elastic.apm.agent.impl.circuitbreaker.SystemCpuStressMonitor - Latest system CPU load value measured is 0.07498741821841973. This is 12th consecutive measurement that is below the configured relief threshold - 0.8
2021-06-21 13:14:07,804 [elastic-apm-circuit-breaker] INFO  co.elastic.apm.agent.impl.circuitbreaker.SystemCpuStressMonitor - Latest system CPU load value measured is 0.6666666666666666. This is 12th consecutive measurement that is below the configured relief threshold - 0.8
2021-06-21 13:16:37,814 [elastic-apm-circuit-breaker] INFO  co.elastic.apm.agent.impl.circuitbreaker.SystemCpuStressMonitor - Latest system CPU load value measured is 0.14285714285714285. This is 12th consecutive measurement that is below the configured relief threshold - 0.8
2021-06-21 13:20:37,830 [elastic-apm-circuit-breaker] INFO  co.elastic.apm.agent.impl.circuitbreaker.SystemCpuStressMonitor - Latest system CPU load value measured is 0.08333333333333333. This is 12th consecutive measurement that is below the configured relief threshold - 0.8

Best regards
Wolfram

Hey Wolfram :wave:

Thanks a lot for trying out and reporting.
There are several issues to look into here, it may take a bit until I get to everything, but I will definitely try.

  • log_ecs_reformatting - which Logback version are you using? Maybe it has something to do with how we set flushing in the shade appender. I may add some debug logging to see what's up there. In the mean time, try to set log_level to debug for startup and a bit after and see if we instrument ch.qos.logback.core.OutputStreamAppender as we should.
  • This circuit breaker INFO logging is definitely useless. It should be printed only when there is a state change made by the circuit breaker.
  • The No /proc/self/cgroup file line matched the tested patterns. Cgroup metrics will not be reported warning - are you running your apps within containers? Or otherwise controlled groups? Can you provide an example of the contents of the /proc/self/cgroup file?

Hello Eyal,

Wow, that was quick!

  1. We are using logback 1.2.3. I just restarted with debug logging enabled and found the following log entries:
2021-06-21 14:32:29,938 [localhost-startStop-1] DEBUG co.elastic.apm.agent.bci.ElasticApmAgent - Type match for instrumentation ShadingInstrumentation: (name(equals(ch.qos.logback.core.OutputStreamAppender)) and not(isInterface())) matches class ch.qos.logback.core.OutputStreamAppender
2021-06-21 14:32:29,938 [localhost-startStop-1] DEBUG co.elastic.apm.agent.bci.ElasticApmAgent - Type match for instrumentation StopAppenderInstrumentation: (name(equals(ch.qos.logback.core.OutputStreamAppender)) and not(isInterface())) matches class ch.qos.logback.core.OutputStreamAppender
2021-06-21 14:32:29,945 [localhost-startStop-1] DEBUG co.elastic.apm.agent.bci.ElasticApmAgent - Method match for instrumentation ShadingInstrumentation: (name(equals(append)) and hasParameter(hasTypes(with(0 matches is(E)))) and not(isAbstract())) matches protected void ch.qos.logback.core.OutputStreamAppender.append(java.lang.Object)
2021-06-21 14:32:29,946 [localhost-startStop-1] DEBUG co.elastic.apm.agent.bci.ElasticApmAgent - Method match for instrumentation StopAppenderInstrumentation: (name(equals(stop)) and hasParameter(ofSize(0)) and not(isAbstract())) matches public void ch.qos.logback.core.OutputStreamAppender.stop()

Is this what you were looking for?
2. We are not running in a container. We are running on a physical server with Red Hat Enterprise Linux Server release 6.10 - the /proc/self/cgroup file exists but is empty.

Best regards
Wolfram

Logback 1.2.3 is actually what we test with, so I assume next step would be to add some debug logging to this mechanism, but that won't be as quick :slight_smile:

Is this what you were looking for?

Yes, it looks we do instrument what we should, so I'll need to take a deeper look. Could be the flush setting.

As for the cgroup file parsing - I'll make sure we don't log a warning in such cases.

I'll let you know once there is something to test.
Thanks for the feedback!

1 Like

Hello Eyal,

I think I found the root cause for my missing Logback logs but I am not sure what the reason behind the code is - maybe this makes sense to you.

I tried to debug the apm agent today and came to the following observation: In AbstractEcsReformattingHelper.onAppendExit, the mappedAppender is a NULL_APPENDER.

So I debugged a bit further where the NULL_APPENDER comes from and I think I was able to trace it to AbstractEcsReformattingHelper.shouldApplyEcsReformatting: The WildcardMatcher.anyMatch returns null as the matchers list contains [*PatternLayout*, org.apache.log4j.SimpleLayout, ch.qos.logback.core.encoder.EchoEncoder] but my Logback formatter class is ch.qos.logback.core.encoder.LayoutWrappingEncoder.

Here is an example what one of our appenders looks like - the others look similar:

<configuration debug="true" scan="true">
	<!-- Properties -->
	<!-- Path to logfile -->
	<property scope="context" name="log.folder" value="${catalina.base}/logs" />
	<!-- Pattern for logfile names -->
	<property scope="context" name="log.rollingPolicy.pattern" value="%d{yyyy-MM-dd}.%i" />
	<!-- Max size of a single logfile -->
	<property scope="context" name="log.rollingPolicy.timeBasedFileNamingAndTriggeringPolicy.maxFileSize" value="50MB" />
	<!-- Max age of a logfile in days -->
	<property scope="context" name="log.rollingPolicy.maxHistory" value="7" />
	<!-- Max total size of a group of logfiles -->
	<property scope="context" name="log.rollingPolicy.totalSizeCap" value="1GB" />
	<!-- Header part of a log entry (loggers can use log.encoder.pattern or build the pattern themself to add attributes) -->
	<property scope="context" name="log.encoder.pattern.head" value="%d{dd.MM.yyyy HH:mm:ss.SSS} %p [%t] - %c" />
	<!-- Attribute part of a log entry (loggers can use log.encoder.pattern or build the pattern themself to add attributes) -->
	<property scope="context" name="log.encoder.pattern.attributes" value="%replace(%X{CORRELATION_ID}){'(.+)', 'CORRELATION_ID=$1,'}%replace(%marker){'(.+)', 'MARKER=$1,'}" />
	<!-- Body part of a log entry (loggers can use log.encoder.pattern or build the pattern themself to add attributes) -->
	<property scope="context" name="log.encoder.pattern.message" value="%m | %throwable%n" />
	<!-- default pattern of a log entry based on the above parts -->
	<property scope="context" name="log.encoder.pattern" value="${log.encoder.pattern.head} | ${log.encoder.pattern.attributes} | ${log.encoder.pattern.message}" />
	<!--  -->
	<property scope="context" name="log.encoder.fileHeader" value="#dd.MM.yyyy HH:mm:ss.SSS LEVEL [THREAD] - LOGGER | attr1=value1,attr2=value2  | MESSAGE | THROWABLE" />
	<appender name="appenderDev" class="ch.qos.logback.core.rolling.RollingFileAppender">
		<file>${log.folder}/app.log</file>
		<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
			<fileNamePattern>${log.folder}/app.${log.rollingPolicy.pattern}.log</fileNamePattern>
			<maxFileSize>${log.rollingPolicy.timeBasedFileNamingAndTriggeringPolicy.maxFileSize}</maxFileSize>
			<maxHistory>${log.rollingPolicy.maxHistory}</maxHistory>
			<totalSizeCap>${log.rollingPolicy.totalSizeCap}</totalSizeCap>
		</rollingPolicy>
		<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
			<charset>UTF-8</charset>
			  <layout class="ch.qos.logback.classic.PatternLayout">
				<pattern>${log.encoder.pattern.head} | ${log.encoder.pattern.attributes}%replace(%X{CONTEXT}){'(.+)', 'CONTEXT=$1,'}%replace(%X{WHO}){'(.+)', 'WHO=$1,'}%replace(%X{WHAT}){'(.+)', 'WHAT=$1,'}%replace(%X{TARGET}){'(.+)', 'TARGET=$1,'} | ${log.encoder.pattern.message}</pattern>
				<fileHeader>${log.encoder.fileHeader}</fileHeader>
			  </layout>
		</encoder>
	</appender>
</configuration>

Does this help?

Best regards
Wolfram

1 Like

Ahh, that makes sense!! With you, all we need to do is wait a bit for you to tell us what's wrong :smiley:
I didn't include this encoder because it is only a wrapper and assuming we are only interested in its relevant subclasses that contain the PatternLayout in their class name, which is obviously a bug in this case.
I will shortly create a fix, but in the mean time you can check whether extending the log_ecs_formatter_allow_list config works around this?

If you get a chance, please test the fix using this snapshot build, with the original configuration.
Thanks :pray:

I can confirm that both works: the log_ecs_formatter_allow_list setting and your snapshot build

2 Likes

Is there anything special in your setup?