Our team is developing an e-commerce application.
The application writes its logs to the file system, and they are displayed in the application's admin panel, so any admin user can open the admin panel and view the logs they are interested in.
At the moment, we already have a couple of hundred instances of our application up and running.
Right now we would like to move our log storage from the file system to some other place and extend the admin log view with search and filtering.
So the end result we want to achieve is: store a large number of logs (the logs of all instances of our application) in a new repository that lets us search and filter them efficiently, and display the results in the admin panel of our application.
As one of the possible options, we consider Elastic Observability.
During our implementation attempts we were able to use the OpenTelemetry Collector to send our logs to the Elastic APM server, and we can now see them under Observability -> Logs -> Stream (as described in the documentation).
But unfortunately, part of our goal remains unfulfilled: getting these logs back into our application with the ability to search and filter, as the Elasticsearch REST API allows.
Is there any way to get the saved logs in Elastic Observability back to our application with the ability to search and filter logs? Or maybe in our case it would be better to consider other Elastic solutions (for example, use only Elasticsearch with its REST API)?
The private HTTP API used by the Observability Logs UI cannot offer the stability guarantees that would be appropriate for consumption by your application.
Your idea of using the Elasticsearch REST API directly sounds plausible. But there might be other options depending on your deployment. Is the application in which you want to offer the filtering capabilities itself a web app? Could it link to Kibana or embed it in an iframe?
Yes, the admin panel is part of our application. And in general, this is a web application.
Unfortunately, no. We are not considering inserting a link or an iframe into our application; either would change the user experience, and we would like to avoid that.
If you want to retain full control over the UI instead of embedding Kibana, then querying the log indices in Elasticsearch sounds like the way to go. While I wouldn't recommend using the private Logs UI API, you can look at how it constructs its queries under the hood.
A few thoughts to help read that code:
Kibana applies heuristics to reconstruct the message of certain known document formats, which you probably don't need. So you could ignore everything about messageFormattingRules.
Kibana queries the logs in chunks using the search_after query parameter. In combination with _doc as the secondary sorting criterion, this allows the UI to page through arbitrary chunks of log entries even when timestamps are identical (see the sketch after this list).
We use the _async_search feature of Elasticsearch to support queries that might take longer than the normal request timeout. This might happen when querying documents that live in indices that are not in the hot data tier anymore.
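To make the chunking pattern concrete, here is a minimal sketch, assuming the Elasticsearch 8.x Python client; the URL, index pattern, and chunk size are placeholders rather than what the Logs UI actually uses, and it omits _async_search for brevity:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your deployment

def fetch_log_chunks(index_pattern="logs-apm*", chunk_size=500):
    """Yield log entries in stable chunks, oldest first."""
    search_after = None
    while True:
        resp = es.search(
            index=index_pattern,
            size=chunk_size,
            # _doc as the secondary sort criterion breaks ties between
            # entries that share the exact same timestamp
            sort=[{"@timestamp": "asc"}, {"_doc": "asc"}],
            search_after=search_after,
        )
        hits = resp["hits"]["hits"]
        if not hits:
            break
        yield hits
        # resume the next chunk right after the last hit of this one
        search_after = hits[-1]["sort"]

for chunk in fetch_log_chunks():
    for hit in chunk:
        print(hit["_source"].get("message"))
```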
Hope that helps somewhat. I'd be happy to answer questions about the queries that might arise.
So, if I understand you correctly, in order to achieve our goal we should do something like this: instead of sending data to the APM server, we should index the logs directly into a custom Elasticsearch index through the REST API, and then search and filter the logs through the same API?
Please correct me if I am wrong, as we are just getting started with Elastic solutions.
No worries, the flexibility unfortunately begets some complexity too. You can continue to send your logs via the APM server, via a separate shipper like the Elastic Agent, or directly via the Elasticsearch HTTP API. If shipping them via APM works for you, then that's a great start.
Regardless of the ingestion pathway, the log entries will end up in an Elasticsearch data stream (a concept that builds on indices) that you can query via the Elasticsearch search APIs. The name of this data stream will probably start with logs-apm if ingested via the APM server. If you look at the detail fly-out for a log entry in Kibana's Logs UI, you'll see the index name mentioned at the top of the fly-out. You can then use an index name pattern like logs-apm* in any custom query via the Elasticsearch REST API to retrieve the log entries. Does that make sense?
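For illustration, a minimal query against that pattern might look like this (a sketch, assuming the Elasticsearch 8.x Python client; the service.name value is a placeholder for whatever your setup reports):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your deployment

# Fetch the last hour of log entries for one service from the
# APM-ingested data stream, newest first.
resp = es.search(
    index="logs-apm*",
    query={
        "bool": {
            "filter": [
                {"term": {"service.name": "my-ecommerce-app"}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ]
        }
    },
    sort=[{"@timestamp": "desc"}],
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"].get("message"))
```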
As I mentioned earlier, we currently have several hundred instances of our application up and running (one instance per customer), and as far as I understand, if we send our logs through APM, all logs will be written to one data stream, which will be split across several indices.
Would it be efficient to keep the logs of all application instances in the same data stream and then query with the required application_instance_id filter (to show the logs for a specific application instance), or would it be better to create a separate index for each application instance using an index name pattern such as logs-{application_instance_id} and then query that specific index?
You are correct in that a data stream will be backed by several indices that are rolled over by time. In general I'd recommend splitting the logs into separate data streams per app because it allows for easier optimization of the queries by Elasticsearch's query rewriter. It also makes data management easier because you can apply different permissions and retention policies to the logs of different applications. I'm currently researching if the APM server can be configured to send the logs to different data streams based on the app. I'll get back to you as soon as I have learned more.
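To make the difference concrete, here is a rough sketch of the two query shapes (assuming the 8.x Python client; the field labels.application_instance_id and the stream name logs-myapp-customer-42 are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Option A: one shared data stream, narrowed down by a filter. Every backing
# index of the shared stream is a candidate for this query.
es.search(
    index="logs-apm*",
    query={"term": {"labels.application_instance_id": "customer-42"}},
)

# Option B: a dedicated data stream per instance. The query only ever touches
# the backing indices of that one stream, which is easier to optimize.
es.search(
    index="logs-myapp-customer-42",
    query={"match_all": {}},
)
```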
Maybe it would be nice to manually create a data stream or index for each instance of our application and send the data directly via the Elasticsearch API, without the APM server as an intermediary?
With this approach, we could describe our own log model instead of using the one created by the APM server. This is a clear plus for us, because otherwise we have to work with two log models:
- the OpenTelemetry model, which is created in our application and eventually sent to the APM server;
- the model that the APM server converts the OpenTelemetry model into and stores in the index, which we then read back from the index and display in our application.
It looks like the APM server was designed specifically for Elastic Observability and is a bit redundant in our case; please correct me if I'm wrong. What do you think about this? Maybe we missed something?
According to my research, there is currently no way to configure the log index creation when shipping the log entries via the APM server OTEL integration.
So directly indexing the log entries (or shipping them via Elastic Agent or Filebeat) sounds like a reasonable approach for you. It would definitely allow more flexibility for your custom use-case. If you still want to keep the option of correlating the log entries with APM traces, you just have to make sure to include a few pieces of metadata, as described in the log correlation APM docs (sketched below).
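As a rough illustration of what such an enriched, directly-indexed entry could look like (a sketch, assuming the 8.x Python client; the index name, IDs, and the labels field are placeholders, so please treat the log correlation docs as the authoritative field list):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# An ECS-style log entry that stays correlatable with APM traces because it
# carries the trace/transaction identifiers of the active span at log time.
doc = {
    "@timestamp": "2024-01-01T12:00:00.000Z",
    "message": "order 1234 submitted",
    "log": {"level": "info"},
    "service": {"name": "my-ecommerce-app"},
    "trace": {"id": "0af7651916cd43dd8448eb211c80319c"},   # from the agent
    "transaction": {"id": "b7ad6b7169203331"},             # from the agent
    "labels": {"application_instance_id": "customer-42"},  # hypothetical
}
es.index(index="logs-myapp-customer-42", document=doc)
```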
Regardless of which ingestion pathway you choose there are a few recommendations that might be relevant:
When choosing the data stream names, adhere to the integrations index naming scheme. This is the result of a long process of internal trial and error and achieves a good compromise between flexibility and predictability IMHO.
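For example, a per-instance setup following that scheme could be bootstrapped with an index template along these lines (a sketch, assuming the 8.x Python client; logs-myapp-* and the mappings are placeholder choices, with "myapp" as the dataset and the customer id as the namespace):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Create every name matching logs-myapp-* as a data stream, e.g.
# logs-myapp-customer-42 ({type}-{dataset}-{namespace}).
es.indices.put_index_template(
    name="logs-myapp",
    index_patterns=["logs-myapp-*"],
    data_stream={},  # matching names become data streams, not plain indices
    priority=200,    # high enough to win over more generic templates
    template={
        "mappings": {
            "properties": {
                "message": {"type": "text"},
            }
        }
    },
)
# The first document indexed into logs-myapp-customer-42 then auto-creates
# that data stream.
```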
I hope this is somewhat helpful. Please let us know if you want to dig deeper into any specific topic.
Thanks for your support!
So far, we've been able to implement our own per-customer log data streams (following the guidelines you suggested).
In your last comment you mentioned that there is no way to send logs to different data streams via the APM server OTEL integration, but perhaps there is a way to achieve this without the OTEL integration? Maybe there is some kind of workaround?
There is a very big chance that in the future we are going to implement traces and metrics for our clients, and we would rather use a ready-made solution for these problems than come up with our own. That's why we are trying to consider all possible solutions.
The workaround for the log data stream limitation of the APM server would be to use one of the APM agents in your application (depending on which languages your apps are written in) to automatically enrich the log entries with the attributes required for correlation in Elasticsearch. The downside is that you'd have to run a separate shipper (such as the Elastic Agent or Filebeat) to pick up these enriched log files and ship them to Elasticsearch. The advantage, on the other hand, is that standardizing on ECS fields should make it easy to switch to a different log shipping method later. I don't know if that qualifies as a "ready-made solution" for you.
I'll point our APM devs to this thread. Maybe they can offer more qualified advice.
There's currently no reasonable workaround for APM Server to split logs into multiple data streams. When you run APM Server with Elastic Agent and Fleet, APM Server is given only limited privileges and cannot write to arbitrary logs data streams. I wouldn't really recommend this, but you could run APM Server in "legacy", AKA standalone mode: https://www.elastic.co/guide/en/apm/guide/current/install-and-run.html. If you configure APM Server's Elasticsearch output with sufficient privileges (e.g. to write to any data stream), then you could modify the ingest pipelines to use a script ingest processor that changes the target data stream by setting the _index metadata field (see the Script processor page in the Elasticsearch Guide).
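For illustration only, such a rerouting pipeline could look roughly like this (a sketch, assuming the 8.x Python client, standalone APM Server, and a sufficiently privileged Elasticsearch output; the field labels.application_instance_id and the target naming are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# An ingest pipeline whose script processor reroutes each document to a
# per-instance data stream by rewriting the _index metadata field.
es.ingest.put_pipeline(
    id="route-logs-per-instance",
    description="Route log entries to per-instance data streams",
    processors=[
        {
            "script": {
                "source": """
                  if (ctx.labels != null && ctx.labels.application_instance_id != null) {
                    ctx._index = 'logs-myapp-' + ctx.labels.application_instance_id;
                  }
                """
            }
        }
    ],
)
```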
Originally we intended to produce application-specific log data streams. For example, say you had logs for Kibana and Elasticsearch: we would index these into two data streams, like logs-apm.app.kibana-<namespace> and logs-apm.app.elasticsearch-<namespace>. This turned out to create too much load on Elasticsearch for users with hundreds or thousands of unique services, which is why we currently send everything to one data stream per type of data.
Based on this topic I would like to summarize the possible options for integrating the telemetry data of our application instances with Elastic, with the ability to have a separate data stream for logs and metrics for each application instance:
APM Server in "legacy", AKA standalone mode.
Implement our own telemetry data integration. More specifically, create our own index templates for data streams of logs and metrics, something like logs-my.app-* (taking into account the Elastic Common Schema) and directly index data into these data streams per instance of the application.
We have already implemented our own log streams for each application instance and integrated them with Kibana UI, I mean the Observability->Logs->Stream section. I think the same can be done with metrics and APM sections, am I right?
Use Elastic APM integration: set up, for example, one fleet server and many Elastic agents (with APM integration and a unique data_stream.namespace that will point to a specific application instance), one Elastic agent for each instance of our application.
Last but not least, wait for the possible implementation of the "Dynamic Data Stream Namespaces" feature. But I perfectly understand that this is a difficult one, far from a priority task for you, and may not be realized at all.
Perhaps there are some other options that I missed.
We would be very happy if you expressed your opinion about the options that I described and would advise us which option is most preferable to choose.
@michael_kot sorry for the delay in responding; I was on vacation and then missed the notification among the many emails that had accumulated.
We have already implemented our own log streams for each application instance and integrated them with Kibana UI, I mean the Observability->Logs->Stream section. I think the same can be done with metrics and APM sections, am I right?
Yes, but again with the caveat that APM Server running under Fleet can only write to certain data streams. So to route to arbitrary streams, you'll need to run in legacy mode with increased privileges.
For an immediate solution, which option you go with really depends on your appetite for managing custom ingest pipelines or running multiple Elastic Agents. If you have just a handful of applications, then running multiple Elastic Agents with different namespaces may be the easiest option for now.