How to properly detect and handle transport errors in Elasticsearch logging (latest versions)

Hi everyone! 👋

I'm new to Elasticsearch and still learning, so I’d really appreciate some guidance.

I’m trying to understand the best way to detect and handle transport errors when sending logs to Elasticsearch (using the latest versions). My current issue is that when my app sends logs to ELK and Elasticsearch is down, the write attempts cause my microservice to crash.

I’d like to implement a detection or fallback mechanism so that, if an error occurs during log transport, the logs are sent instead to the OpenShift console, where I already have Filebeat configured to enqueue them.

For context, I’m doing all of this from a pivot microservice using Winston.
What would you recommend as the most reliable and modern approach to handle this scenario?

Thanks a lot for your help and patience as I learn!

Hi @Guido_hernan_Gagliar,
How are you sending the logs to Elasticsearch using Winston? Are you using the ECS approach with Winston and Filebeat as documented here?
If you could share the code, that would be amazing. Also, for redundancy you could send the logs to both OpenShift and Elasticsearch using either Logstash or an OTel collector. Alternatively, you can use the Red Hat OpenShift Logging Operator as discussed here.

import { createLogger, transports, config, LoggerOptions } from 'winston';
import { indexPrefix, indexSuffixPattern, esTransformer, client } from './esConfig';
import { ElasticsearchTransport, ElasticsearchTransportOptions } from 'winston-elasticsearch';
import { esFormat, httpConsoleFormat, debuggerFormat } from './formats';
import { environment } from './environment';

const inTesting = process.env.NODE_ENV === 'test';

const ElasticSearch: { [key: string]: ElasticsearchTransportOptions } = {
    logger: {
        level: 'info',
        client,
        indexPrefix: indexPrefix.logger,
        indexSuffixPattern,
        format: esFormat,
        transformer: esTransformer.logger,
        // ensureIndexTemplate: false,
        silent: environment.SILENT_ELK_LOGS,
    },
    debugger: {
        level: 'debug',
        client,
        indexPrefix: indexPrefix.debugger,
        indexSuffixPattern,
        format: esFormat,
        transformer: esTransformer.debugger,
        // ensureIndexTemplate: false,
        silent: environment.SILENT_ELK_LOGS_DEBUG,
    },
};
const Console: { [key: string]: LoggerOptions } = {
    logger: {
        level: 'info',
        format: httpConsoleFormat,
        silent: environment.SILENT_STDOUT_LOGS,
    },
    debugger: {
        level: 'debug',
        format: debuggerFormat,
        silent: environment.SILENT_STDOUT_LOGS_DEBUG,
    },
};

const loggerConsoleTransport = new transports.Console(Console.logger);
const debuggerTransport = new transports.Console(Console.debugger);
const loggerESTransport = new ElasticsearchTransport(ElasticSearch.logger);
const debuggerESTransport = new ElasticsearchTransport(ElasticSearch.debugger);

const loggerHttp = createLogger({
    exitOnError: false,
    handleExceptions: true,
    transports: [loggerESTransport, loggerConsoleTransport],
    levels: config.npm.levels,
    silent: inTesting,
});

const logger = createLogger({
    transports: [debuggerTransport, debuggerESTransport],
    levels: config.npm.levels,
    silent: inTesting,
});

// Only loggerHttp has an 'error' listener; `logger` has none.
loggerHttp.on('error', (error) => {
    // eslint-disable-next-line no-console
    console.error('loggerHttp error caught', error);
});
// The ES transports are subscribed to 'warning' only, not to 'error'.
loggerESTransport.on('warning', (error: any) => {
    // eslint-disable-next-line no-console
    console.error('Elastic Search Transport Error caught', error);
});
debuggerESTransport.on('warning', (error: any) => {
    // eslint-disable-next-line no-console
    console.error('Elastic Search Transport Error caught', error);
});
export { loggerHttp, logger, loggerESTransport, debuggerESTransport };

Yes, this is the code I am currently using for the transport. I inherited all of it; it was implemented several years ago.
I have now updated everything to the latest versions, and I will follow the documentation you recommended for setting up the transport with Winston.
I would prefer not to change how the system currently operates, since many other systems are connected to it and I don't want to disrupt the environment.
Thank you very much for your help.
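
One possible cause of the crash, going only by the code posted above: in Node.js, an EventEmitter that emits 'error' with no 'error' listener attached throws, and Winston loggers and transports are Node streams. In the posted code only loggerHttp has an 'error' listener; logger and the two Elasticsearch transports do not. Below is a minimal sketch of a guard that subscribes to both 'error' and 'warning' and writes a JSON line to the console instead, where Filebeat can pick it up. The names guardTransport, fakeTransport and captured are hypothetical and not part of winston or winston-elasticsearch, and a plain EventEmitter stands in for the real transport so the snippet runs on its own.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical helper (not a winston or winston-elasticsearch API):
// attaches listeners so a transport failure is reported through `fallback`
// instead of surfacing as an unhandled 'error' event.
export function guardTransport(
    name: string,
    transport: EventEmitter,
    // Default fallback writes to stderr, which ends up in the container log.
    fallback: (line: string) => void = (line) => console.error(line),
): void {
    for (const event of ['error', 'warning']) {
        transport.on(event, (err: unknown) => {
            const message = err instanceof Error ? err.message : String(err);
            fallback(JSON.stringify({ level: 'error', transport: name, event, message }));
        });
    }
}

// Demonstration: a plain EventEmitter stands in for an Elasticsearch transport.
const fakeTransport = new EventEmitter();
const captured: string[] = [];
guardTransport('logger-es', fakeTransport, (line) => captured.push(line));

// With the guard attached, this emit is handled and does not throw.
fakeTransport.emit('error', new Error('connect ECONNREFUSED'));
```

Assuming this is indeed the cause, the same idea applied to the posted code would be calling guardTransport('logger-es', loggerESTransport) and guardTransport('debugger-es', debuggerESTransport), plus adding a logger.on('error', ...) handler like the one loggerHttp already has. Note also that both loggers in the posted configuration already include a Console transport, so unless the SILENT_STDOUT_LOGS flags are set, every entry should reach the OpenShift console regardless of whether Elasticsearch is up.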