How assemblies are loaded is determined by the runtime host when it loads the runtime into a process. In scenarios where a process can contain multiple AppDomains, such as multiple ASP.NET applications running in IIS and sharing an Application Pool, assemblies can be loaded as domain-neutral so that they can be shared across AppDomains. This typically happens for common assemblies loaded from the Global Assembly Cache (GAC). In CLR 4, domain-neutral assemblies are loaded into the AppDomain named `EE Shared Assembly Repository`, which then shares these assemblies with the other AppDomains in the process.
When profiler auto-instrumentation targets an assembly that is loaded domain-neutral, the assembly containing the instrumentation, Elastic.Apm.Profiler.Managed, must also be loaded domain-neutral, because the IL rewriting performed by the profiler inserts calls to methods contained within Elastic.Apm.Profiler.Managed. The general rule is: **if an assembly is loaded domain-neutral, all of its dependencies must be loaded domain-neutral**.
The runtime's loading decision can be influenced by implementing `ICorProfilerCallback6::GetAssemblyReferences`, to tell the runtime that an assembly reference will be added to the metadata of the assembly being loaded, at a later point in time. The runtime can then use this information to determine how to load the assembly. The Elastic APM profiler implements `ICorProfilerCallback6::GetAssemblyReferences` to add an assembly reference to Elastic.Apm.Profiler.Managed for every assembly it is called for, except those on skip lists. The desired outcome is for Elastic.Apm.Profiler.Managed to be loaded domain-neutral so that it can instrument assemblies that are loaded domain-neutral.
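The skip-list decision described above can be modelled in a few lines. This is only an illustrative Python sketch, not the profiler's implementation: the skip-list entries, the case-insensitive exact-name matching, and the function names `needs_managed_reference` and `assemblies_to_reference` are all assumptions made for the example.

```python
from typing import Iterable, List

# Hypothetical skip list; the profiler's real skip lists are not reproduced here.
SKIP_LIST = frozenset(
    name.lower()
    for name in (
        "Elastic.Apm.Profiler.Managed",         # assumed entry: no self-reference
        "Elastic.Apm.Profiler.Managed.Loader",  # assumed entry: the loader shim
    )
)


def needs_managed_reference(assembly_name: str) -> bool:
    """Return True if a reference to the managed assembly would be reported."""
    # Assumption: names are compared case-insensitively and matched exactly.
    return assembly_name.lower() not in SKIP_LIST


def assemblies_to_reference(loading: Iterable[str]) -> List[str]:
    """Filter the assemblies being loaded down to those not on the skip list."""
    return [name for name in loading if needs_managed_reference(name)]


print(assemblies_to_reference(["System.Data.SQLite", "Elastic.Apm.Profiler.Managed"]))
```

In this model, System.Data.SQLite would receive the extra reference, while the managed assembly itself would not.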
The problem with this approach is that Elastic.Apm.Profiler.Managed appears unable to be loaded as domain-neutral because of one of its dependencies, which may be a transitive dependency. My suspicion is that it might be related to Elastic.Apm's use of HttpClient in net461, which is defined in netstandard.dll.
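To make the general rule concrete, the following Python sketch walks a reference graph and lists every transitive dependency that would prevent an assembly from being loaded domain-neutral. It is a toy model under stated assumptions, not a description of how the CLR decides: the reference graph, the set of shareable assemblies, and the function name `blocking_dependencies` are hypothetical, and treating netstandard as the blocker simply mirrors the unconfirmed suspicion above.

```python
from typing import Dict, List, Set


def blocking_dependencies(
    root: str,
    references: Dict[str, List[str]],
    domain_neutral_capable: Set[str],
) -> List[str]:
    """Return the transitive dependencies of `root` that cannot be shared.

    An empty result means that, in this toy model, every dependency of
    `root` can be loaded domain-neutral.
    """
    blockers: List[str] = []
    seen: Set[str] = {root}
    stack: List[str] = list(references.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited via another reference path
        seen.add(name)
        if name not in domain_neutral_capable:
            blockers.append(name)
        stack.extend(references.get(name, []))
    return sorted(blockers)


# Hypothetical reference graph, for illustration only.
references = {
    "Elastic.Apm.Profiler.Managed": ["Elastic.Apm", "mscorlib"],
    "Elastic.Apm": ["netstandard", "mscorlib"],
    "netstandard": ["mscorlib"],
}
# Assumption for the example: netstandard is the one assembly that cannot be shared.
shareable = {"Elastic.Apm", "mscorlib"}

print(blocking_dependencies("Elastic.Apm.Profiler.Managed", references, shareable))
```

The point of the model is that a single non-shareable assembly anywhere in the transitive closure is enough to block the root, even when every direct dependency looks fine.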
## Current investigation
### Setup
1. Run AspNetFullFrameworkSampleApp in local IIS, in the Default Application Pool.
2. Ensure AspNetFullFrameworkSampleApp is the only application running in the Default Application Pool.
3. Configure profiler auto instrumentation by setting the following environment variables for the Default Application Pool (only possible in IIS 10+):
```
COR_ENABLE_PROFILING="1"
COR_PROFILER_PATH="<path-to-repo>\target\debug\elastic_apm_profiler.dll"
COR_PROFILER="{FA65FE15-F085-4681-9B20-95E04F6C03CC}"
ELASTIC_APM_PROFILER_HOME="<path-to-repo>\src\Elastic.Apm.Profiler.Managed\bin\Release"
ELASTIC_APM_PROFILER_INTEGRATIONS="<path-to-repo>\src\Elastic.Apm.Profiler.Managed\integrations.yml"
ELASTIC_APM_PROFILER_LOG_DIR="<path-to-repo>\logs"
ELASTIC_APM_PROFILER_LOG="trace"
ELASTIC_APM_PROFILER_LOG_IL="1"
```
4. Run the application.
5. Confirm that the default page opens successfully.
6. Hit the `/Database` page to trigger the instrumentation of System.Data.SQLite.
7. Observe that a `FileNotFoundException` is thrown:
```
Server Error in '/AspNetFullFrameworkSampleApp' Application.
Could not load file or assembly 'Elastic.Apm.Profiler.Managed, Version=1.11.0.0, Culture=neutral, PublicKeyToken=ae7400d2c189cf22' or one of its dependencies. The system cannot find the file specified.
```
### Evaluation
The log files captured in `<path-to-repo>\logs` provide details of what happens.
In Elastic.Apm.Profiler.Managed.Loader*.log, the Elastic.Apm.Profiler.Managed assembly fails to be loaded as domain-neutral by the Elastic.Apm.Profiler.Managed.Loader assembly shim:
```
[2021-09-07T16:04:49.0383054+10:00] [ERROR] Error loading managed assemblies.
System.IO.FileNotFoundException: Could not load file or assembly 'Elastic.Apm.Profiler.Managed, Version=1.11.0.0, Culture=neutral, PublicKeyToken=ae7400d2c189cf22' or one of its dependencies. The system cannot find the file specified.
File name: 'Elastic.Apm.Profiler.Managed, Version=1.11.0.0, Culture=neutral, PublicKeyToken=ae7400d2c189cf22'
at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
at System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean forIntrospection)
at System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
at System.Reflection.Assembly.Load(String assemblyString)
at Elastic.Apm.Profiler.Managed.Loader.Startup.TryLoadManagedAssembly()
```
[`System.Reflection.RuntimeAssembly._nLoad` is an internal call](https://referencesource.microsoft.com/#mscorlib/system/reflection/assembly.cs,1911). I believe the `FileNotFoundException` might be misleading, and not the real underlying cause of the failure, because elastic_apm_profiler*.log indicates that the Elastic.Apm.Profiler.Managed assembly does get loaded, but into the local AppDomain rather than the domain-neutral AppDomain. Because it is loaded into the local AppDomain, it cannot instrument System.Data.SQLite; that requires the assembly to be loaded into the domain-neutral AppDomain.

However, the grant permissions of Elastic.Apm.Profiler.Managed when loaded into the domain-neutral AppDomain would need to match those it has when loaded into the local AppDomain, which I believe is the underlying issue behind the logged error. Further evidence to support this is that if `Startup.TryLoadManagedAssembly` is changed to explicitly load the Elastic.Apm.Profiler.Managed assembly that we know exists on disk, a `FileNotFoundException` is still thrown when hitting the `/Database` endpoint.

The fundamental question, though, is why the Elastic.Apm.Profiler.Managed assembly is loaded into the local AppDomain to begin with, and not the domain-neutral AppDomain. My suspicion is that it might be related to Elastic.Apm's use of HttpClient in net461, which is defined in netstandard.dll. Investigating this requires removing the dependency from net461, for example by using `HttpWebRequest` instead of `HttpClient`.
## Intermediate workaround
Assemblies can be _forced_ not to be loaded as domain-neutral by using `LoaderOptimization.SingleDomain`. This can be achieved with one of the following:
**Environment variable** (preferable option)
`COMPlus_LoaderOptimization=1`
or
**Registry settings** (_less preferable option_)
Under `HKEY_LOCAL_MACHINE\Software\Microsoft\.NETFramework`, create a DWORD named `LoaderOptimization` with value `1`
Under `HKEY_LOCAL_MACHINE\Software\WOW6432Node\Microsoft\.NETFramework`, create a DWORD named `LoaderOptimization` with value `1`
This resolves the issue altogether, with two caveats. The setting has no effect on mscorlib, which is always loaded domain-neutral, so the workaround does not help if mscorlib is a target for instrumentation. In addition, when many AppDomains run in one process, as is commonly the case when multiple applications share an Application Pool in IIS, assemblies are not shared, so each AppDomain consumes more memory and resources.