Logstash + Elasticsearch Architecture Tips

Hey, I'm very new to LS and ES, and to ELK in general, so while setting up my system I have a few questions about the advised architecture, and I'm looking for recommendations and tips for the best implementation/use.

The plan is to use log4net and dispatch messages to LS and ES. The logs come from a single server instance serving multiple clients. The plan is to keep logs for 12 - 36 months, and their total size is around 500 GB.

First question is about indices: what is recommended in terms of index architecture in the above scenario?

  • A single index: client-logs?
  • A daily index: client-logs-%{+YYYY.MM.dd}?
  • A per-client index: client-%{clientId}-logs?
  • A per-client daily index: client-%{clientId}-logs-%{+YYYY.MM.dd}?
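For context, each of these options would be expressed via the index setting of the Logstash elasticsearch output; a sketch of option 4 (the host and the clientId field are assumptions):

```conf
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # Per-client daily index; assumes a clientId field has already
    # been parsed out of the event by an earlier filter.
    index => "client-%{clientId}-logs-%{+YYYY.MM.dd}"
  }
}
```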

Coming from a SQL background, it feels more natural to have the data partitioned rather than kept in a single giant client-logs index. However, searching across multiple indices for the same time range might be an issue. Any tips are welcome.
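Worth noting that Elasticsearch can query many indices at once with wildcards, so splitting the data does not prevent cross-index searches; a sketch (the index names and date are hypothetical):

```
# Search all clients' logs for one day across per-client daily indices
GET /client-*-logs-2017.01.15/_search
{
  "query": {
    "match": { "message": "error" }
  }
}
```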

Second question is about Logstash and its config. The default setup is that it listens on a UDP port and ships messages out to ES. Client and server logs have different layout patterns. All clients share the same layout pattern, so it makes perfect sense to configure one pattern and one UDP appender and dispatch everything to the same UDP port. Since server logs have a different layout, the question is:

  • Should LS listen for both client and server messages on a single UDP port?
  • Should LS listen for client messages on one port and for server messages on another?

Assuming a single-UDP-port setup, the filtering can be done on the logger name. In terms of performance, which setup is better?
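For reference, the single-port variant would branch on a field in the filter section; a rough sketch, where the [logger] field, port, and grok pattern names are all assumptions, not real predefined patterns:

```conf
input {
  udp { port => 5000 }
}

filter {
  # Assumes the logger name was already extracted into a [logger] field.
  if [logger] =~ /^Server/ {
    # Hypothetical pattern for the server log layout
    grok { match => { "message" => "%{TIMESTAMP_ISO8601:ts} SERVER %{GREEDYDATA:msg}" } }
  } else {
    # Hypothetical pattern for the shared client log layout
    grok { match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{GREEDYDATA:msg}" } }
  }
}
```

The two-port variant avoids the conditional entirely, since each input can be paired with its own filter via tags or the type field.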

From what you wrote, I would choose client-%{clientId}-logs-%{+YYYY.MM.dd}.

Also, depending on network load, I would avoid UDP: it's less reliable and you can lose data. I would choose TCP as the protocol.
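Switching is just a matter of swapping the input plugin; a minimal sketch, where the port and codec are assumptions about how the appender sends events:

```conf
input {
  tcp {
    port  => 5044
    codec => json_lines   # assumes the appender emits one JSON event per line
  }
}
```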

Not sure if you are using log4net just to generate the logs or also to send the data. It might be worth looking into Beats on the servers to read the log4net log files and send the data to LS.
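If you go that route, a minimal Filebeat config reading the log files and shipping them to Logstash might look like this (the paths and the Logstash host/port are assumptions for illustration):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/myapp/logs/*.log   # assumed location of the log4net output files

output.logstash:
  hosts: ["logstash-host:5044"] # assumed Logstash beats input port
```

On the Logstash side this pairs with a beats input listening on the same port.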

Use time-based indices, possibly divided by client, but make sure the shards are not too small. Given your reasonably long retention period and that volume of data, you may want to consider monthly rather than daily indices.
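For monthly indices, the date pattern in the elasticsearch output simply drops the day component; a sketch:

```conf
output {
  elasticsearch {
    # One index per month instead of per day
    index => "client-logs-%{+YYYY.MM}"
  }
}
```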

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.