I've been given the task of rewriting an application that searches for audit data in a SQL Server database. The data is currently queried with regular SQL queries, but that setup is performing poorly, so I would like to move to a better approach using Logstash and Elasticsearch.
The database is currently about 1 TB in size, and the relevant data is stored in 5 tables (normalized, but with no foreign key constraints). Every day, around 10,000 records are added to these tables.
My main concern is the processing of audit events, stored in the Events table. The actual event payload is stored in a PAYLOAD varchar field, but its content is actually XML. The "problem" is that there are around 150 types of payloads, each with a different XML format, and full-text search must be possible on the payload. Mapping all the different XML formats would be a big task, so my question is: can I just store the payload field as-is in the index and still enable full-text search on it?
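To make the question concrete, here is a minimal mapping sketch of what I had in mind (index and field names are placeholders I made up, not the real schema). The raw XML would go into a `text` field, so Elasticsearch analyzes and tokenizes it for full-text search without me having to map the 150 individual XML schemas. The body below would be sent as a `PUT` to the index:

```json
{
  "mappings": {
    "properties": {
      "event_id":   { "type": "keyword" },
      "event_type": { "type": "keyword" },
      "timestamp":  { "type": "date" },
      "payload":    { "type": "text" }
    }
  }
}
```

My understanding is that the standard analyzer would split the XML (tags and values alike) into searchable tokens, which may be good enough for full-text search even though the markup itself also gets indexed.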
My idea is to denormalize the different tables and store just one event document per event (containing the data from the 5 relevant tables). Or should I create a second index containing just the XML payload data?
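For the denormalized option, a single event document would look roughly like this (all field names and values are invented for illustration, joined from the 5 tables at ingest time, with the raw XML kept in `payload`):

```json
{
  "event_id": "12345",
  "event_type": "UserLogin",
  "timestamp": "2024-01-15T09:30:00Z",
  "user": { "id": "u-42", "name": "jdoe" },
  "source_system": "billing",
  "payload": "<UserLoginEvent><User>jdoe</User><Result>Success</Result></UserLoginEvent>"
}
```

The join itself would presumably happen in the Logstash JDBC input query (one SQL statement joining the 5 tables), so each row already arrives as a complete event.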
I don't have extensive Elasticsearch experience, so maybe I'm missing something.