Hi, we are indexing log events from an Apache server using the ELK stack. Currently we have three types of documents:
Request   : (ID, USER_NAME, URL, TIMESTAMP)
Exception : (ID, EXCEPTION_NAME, TIMESTAMP)
Response  : (ID, RESPONSE_CODE, RESP_BODY, TIMESTAMP)
The ID field is unique per user per request and is the main key linking all three document types. We would like to run aggregate queries like:
URL  | ExceptionName | Number
/a/b | A             | 1
/a/b | B             | 34
/a/c | A             | 45
and:

User  | Exception | Number
John  | A         | 34
John  | B         | 45
Smith | A         | 1
Jane  | C         | 23
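Assuming we manage to denormalize as described further down (each exception document carrying the request's URL and user), I believe the first table would map to a nested terms aggregation along these lines (the field names are my own guesses, and the string fields would need to be not_analyzed or queried via a .raw subfield):

```
GET /logs/_search
{
  "size": 0,
  "query": { "term": { "type": "exception" } },
  "aggs": {
    "by_url": {
      "terms": { "field": "Request_URL" },
      "aggs": {
        "by_exception": {
          "terms": { "field": "Exception_Name" }
        }
      }
    }
  }
}
```

The second table would be the same shape, just with User as the outer terms field.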
We also need to know the average response time, which can be derived by comparing the timestamps of a request event and its corresponding response event.
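Assuming the response events similarly get the request's timestamp copied in at index time, we could compute a duration field in the pipeline (say response_time_ms, a name I'm making up), after which the average is a trivial avg aggregation:

```
GET /logs/_search
{
  "size": 0,
  "query": { "term": { "type": "response" } },
  "aggs": {
    "avg_response_time": { "avg": { "field": "response_time_ms" } }
  }
}
```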
In all these cases we need to relate documents of different types, something like a join in SQL. I figured one way to achieve the aggregate results would be to include data from the request event in the exception and response events while indexing them into ES. This seems in line with the NoSQL approach of keeping related data together (denormalization). But I am having a hard time achieving this in Logstash. More precisely: how do I enrich an event with data from a previous event and then index it into ES?
Scenario:
- Request event generated -> indexed in ES with ID A1234
- ....
- ....
- Exception event generated. We need to pull the Request event with ID A1234 from ES, add some of its fields to this exception event, and then index it into ES. The final exception event may look like:
{ "ID": "A1234", "Exception_Name": "XYZ", "Request_URL": "/a/b", "User": "John" }
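The closest I have found is the Logstash elasticsearch filter plugin, which queries ES and copies fields from the matched document into the current event. A sketch of what I have been trying (hosts and index are placeholders, field names match my guesses above):

```
filter {
  if [type] == "exception" {
    elasticsearch {
      # placeholders - our actual hosts/index differ
      hosts => ["localhost:9200"]
      index => "logstash-*"
      # look up the request event sharing this ID
      query => "type:request AND ID:%{[ID]}"
      # copy these fields from the matched request into this event
      fields => { "URL" => "Request_URL"
                  "USER_NAME" => "User" }
    }
  }
}
```

Two things I am unsure about: whether the request document is already searchable (index refresh interval) by the time its exception event arrives, and whether an ES round-trip per event holds up at our rate. I also came across the aggregate filter, which keeps state in memory keyed by a task_id instead of querying ES, but it appears to require running with a single pipeline worker (-w 1).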
Is this a reasonable way to do it in ES? Are there other approaches? Or is ES not suited for this kind of use case?
Our log event stream is moderately dense, about 80-100 events/sec.