Schema - Field Conflicts - Index Design Considerations - Log Aggregation

Hi, we have around 30 applications, each averaging 80 microservices.

I have been asked to review our log aggregation solution, which is:

Apps (microservices) -> Apigee -> ELK (6.8.0)

All data from Apigee (Event { Request, Response, KeyValueData }) is sent to Logstash, which then pushes everything into one index.

The problem I am having is that some applications/services push data to the same fields with different data types, e.g. HttpStatusCode: 200 vs HttpStatusCode: "OK", and this affects many other fields which we use for aggregations, sums, etc. So some apps post strings for a field whilst others post integers, e.g. PaypalStatus.
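
To make the conflict concrete, two services can log the same event like this (illustrative payloads, trimmed down to the conflicting fields):

    { "HttpStatusCode": 200,  "PaypalStatus": 0 }
    { "HttpStatusCode": "OK", "PaypalStatus": "Completed" }

Whichever shape Elasticsearch sees first wins the dynamic mapping for that index; documents of the other shape are then either rejected or mapped differently in another index, which is what Kibana surfaces as a field conflict.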

Since the logging comes from Apigee, we cannot control how the applications send logs; the applications are not even aware of the logging, because all traffic goes through Apigee.

The result is that Kibana reports and Discover will not work for all fields.

  1. We have Dynamic Mapping turned on.
  2. Roughly 45 fields are in conflict.

Due to this, we sometimes cannot filter data on a field at all, because of the mapping conflict.
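
The stop-gap I am thinking about is pinning the known offenders in our index template so dynamic mapping never gets to guess their type, while leaving everything else dynamic. A rough sketch for 6.8 (the template name, index pattern and the "doc" mapping type are placeholders for whatever our Logstash output actually creates):

    PUT _template/apigee-logs
    {
      "index_patterns": ["apigee-logs-*"],
      "mappings": {
        "doc": {
          "properties": {
            "HttpStatusCode": { "type": "integer", "ignore_malformed": true },
            "PaypalStatus":   { "type": "keyword" }
          }
        }
      }
    }

With ignore_malformed, documents that still send "OK" are indexed, they just lose that one field, and a keyword field accepts both numbers and strings. It does mean maintaining a list of ~45 fields by hand.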

What is best practice around this?

  • Should I create an index per application scope (a collection of microservices that use similar code)? This would reduce the conflicts per index, and it would equate to roughly 20 indices (20 logical groups).
  • If we go with the above, all the dashboards and visualizations would need to be redone, as they are coupled to one index alias. Reuse of visualisations across dashboards goes out the window, and each team would need to design their own dashboards, unless we keep a shared alias over the per-application indices (rough sketch below).
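
For the alias concern, each per-application index template could add its indices to one shared alias, so the existing dashboards keep pointing at a single name while the per-app mappings stop clashing inside a single index. Sketch with hypothetical names:

    PUT _template/logs-app-payments
    {
      "index_patterns": ["logs-app-payments-*"],
      "aliases": {
        "logs-all": {}
      }
    }

A Kibana index pattern over logs-all (or logs-app-*) would still see everything; only fields whose types genuinely differ between applications would remain in conflict in that shared pattern.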

Is the current solution a good pattern or an anti-pattern? (There is no Filebeat on the app servers etc.; it is all done via an API gateway, Apigee.)

Any advice on this is greatly appreciated.

Thx

Hello,

This is a known problem: multiple applications send data to the same field but with different data types. To avoid such conflicts we created something called the Elastic Common Schema (ECS). It prevents the conflict by defining a template for your applications. Logstash does not ship with ECS, so you would need to define it yourself. You can have a look at the ECS specification here: https://www.elastic.co/guide/en/ecs/current/ecs-reference.html
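
To give an idea of what "defining it yourself" can look like, an index template that maps an HTTP status onto the ECS field name and type could start like this (a minimal sketch for 6.x using the legacy _template API and a single "doc" mapping type; http.response.status_code is the ECS field for the HTTP status code):

    PUT _template/ecs-logs
    {
      "index_patterns": ["logs-*"],
      "mappings": {
        "doc": {
          "properties": {
            "http": {
              "properties": {
                "response": {
                  "properties": {
                    "status_code": { "type": "long" }
                  }
                }
              }
            }
          }
        }
      }
    }

Your Logstash filter would then rename the gateway fields onto those ECS names (for example HttpStatusCode into [http][response][status_code]) so every application ends up writing to the same, consistently typed field.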


Hi,

Thank you, I am aware of ECS, and it is a great initiative. In fact, I found a bug in Microsoft Analytics Streaming Engine: it does not support serializing JSON with field names that differ only by case, which ECS recommends for migrations.

So, given this common problem, what recommendations do you have when dealing with hundreds of apps? Do we enforce schema type safety with a custom pip/NuGet/npm logging package, or do we do some sort of transformation at the edges (something like the Logstash sketch below)?
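
By "transformation at the edges" I mean something like this in our Logstash filter stage, using the example fields from my first post (illustrative only, not what we run today):

    filter {
      if [HttpStatusCode] {
        # make the check safe whether the app sent 200, "200" or "OK"
        mutate { convert => { "HttpStatusCode" => "string" } }
        if [HttpStatusCode] =~ /^[0-9]+$/ {
          mutate { convert => { "HttpStatusCode" => "integer" } }
        } else {
          # non-numeric values such as "OK" go to a separate field
          mutate { rename => { "HttpStatusCode" => "HttpStatusText" } }
        }
      }
      # keep fields that are textual in some apps as strings everywhere
      mutate { convert => { "PaypalStatus" => "string" } }
    }

It works, but someone has to maintain the list of conflicting fields for every application, which is why I am also asking about a shared logging package.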

If I were designing this from scratch, I do not think Apigee would be the right place to push from, as it cannot restrict the schema; it just deals with raw HTTP. I think a better solution is getting the apps to log to a correct schema, but this is not always practical when you have 20 to 30 development teams. It is definitely a challenge in a distributed microservices environment.
