Hi there.
I'm very new to Elasticsearch, so please forgive me if this is a question that has been answered elsewhere.
In our application, we find it important to be able to look up logs by a specific user ID. However, the user might only be identified much later in a request (each request is assigned a unique correlation ID), so a number of log messages would already have been recorded without the user ID attached to them.
What would be the optimal way to structure an Elasticsearch document that allows for easy searching of log records by a user ID? My proposed document structure is provided below:
```json
{
  "correlation_id": "111",
  "users": ["000", "222", "333"],
  "records": [
    { "message": "something happened",    "logged_at": "2018-01-01T00:00:00Z" },
    { "message": "some other event",      "logged_at": "2018-01-01T00:00:00Z" },
    { "message": "user identified:222",   "logged_at": "2018-01-01T00:00:00Z" },
    { "message": "some other event",      "logged_at": "2018-01-01T00:00:00Z" }
  ]
}
```
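With a structure like this, my understanding is that a simple term query on the users field would bring back the whole document for a matching user (a sketch only; it assumes an index called logs and that users is mapped as keyword):

```json
GET logs/_search
{
  "query": {
    "term": { "users": "222" }
  }
}
```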
So, when I search for any log records that involve user 222, I want to get back all the records under each matching correlation_id.
However, in order to create a structure like this, I'll need to use upserts from within Logstash. My concern is that appending records to the records property might start to cause hot spots: is that even a concern with Elasticsearch?
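To make the question concrete, this is roughly the kind of Logstash output I had in mind (untested sketch; the host, index name, and the exact shape of the event passed to the script are assumptions on my part):

```
output {
  elasticsearch {
    hosts           => ["localhost:9200"]
    index           => "logs"
    # use the correlation ID as the document ID so all events
    # for one request update the same document
    document_id     => "%{correlation_id}"
    action          => "update"
    scripted_upsert => true
    script_type     => "inline"
    script_lang     => "painless"
    # create the array on first write, then append each event
    script          => "if (ctx._source.records == null) { ctx._source.records = [] } ctx._source.records.add(params.event)"
  }
}
```

Every event for the same correlation ID would then hit the same document, which is exactly why I'm worried about hot spots.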
If appending to the same array is not the correct way to go about it, what would the recommendation be for structuring and linking multiple log documents together (i.e., ensuring that searching for a specific user will bring back all the documents that share the same correlation_id)?
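For reference, the flatter alternative I can think of would be one document per log line, with the user ID only on the lines where it's known (field names are just illustrative):

```json
{
  "correlation_id": "111",
  "user_id": "222",
  "message": "user identified:222",
  "logged_at": "2018-01-01T00:00:00Z"
}
```

But then searching seems to become a two-step process: first find the correlation_id values that mention the user, then run a second terms query on correlation_id to fetch all the related documents. I'm not sure if that's the idiomatic approach.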
Apologies for the wall of text. As mentioned, I'm very new to Elasticsearch, and would really appreciate any input or guidance that can be given!