In general, if you use a newer 8.x release and the proper data stream naming convention, logs-<datastream.dataset>-<datastream.namespace>, many of the issues above have a better outcome.
Ah, I think this is related to the idea of placing documents into distinct indices to avoid conflicts. Since we don't have information on the incoming documents to identify their schema (unless we look at the actual fields used), I don't think we could make use of the dataset.
I cannot comment on the time frame, but I understand real design work is going on behind the scenes to solve the mapper parsing rejection. I cannot say more at this time, but I believe it is being actively worked on and will be a significant plus when released.
Well, that's exciting!
Part of why I encourage my company to stick with Elastic as a platform is the proven history of feature improvements and the expectation that various pain points we have will keep getting ironed out. Field conflicts have been a thorn in our side for a while, and it's causing grumblings about the choice of data store.
I have no idea how common our scenario of logs + mixed schema + conflicts is among Elastic users, but thinking about what led us here I have to imagine there must be a sizable contingent of users in the same boat, and so Elastic must care about our use case and be interested in making it work better. It would have been good to get a sense of how much Elastic cares here, especially as internal detractors have been growing more vocal. Maybe a product manager could chime in.
But it does indeed sound, from what you're saying, like this particular issue of conflicts is getting more love. I'm absolutely delighted to hear it.
Meanwhile, we're now implementing (via the Logstash DLQ) a subset of what I expect Elastic is working on.
Our Solution, Phase 1
As a first pass we'll simply take any docs with conflicts and stuff their entire JSON source into message. This keeps docs from getting dropped, and the contents remain searchable. We lose structure, of course, so we can't point at fields specifically or use them according to their types (numerics, keywords).
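A rough sketch of that first pass, in Python rather than the Logstash pipeline we actually run. The function name and the choice to carry @timestamp over unchanged are assumptions for illustration only:

```python
import json


def to_message_fallback(doc, keep=("@timestamp",)):
    """Collapse a rejected document into a single text field.

    `keep` is a hypothetical list of fields assumed safe to copy as-is.
    """
    fallback = {k: doc[k] for k in keep if k in doc}
    # The whole original source becomes one string, so nothing in it can
    # conflict with the index mapping any more.
    fallback["message"] = json.dumps(doc, sort_keys=True)
    return fallback


rejected = {"@timestamp": "2024-01-01T00:00:00Z", "status": {"code": 200}}
fallback_doc = to_message_fallback(rejected)
```

The original fields are still recoverable with json.loads(fallback_doc["message"]), but to Elasticsearch they are just text.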
An alternative here is to convert the entire doc to a flattened type and shove it into, say, _document_flattened, which should make the individual fields available for searching or querying distinctly and preserves an array of operations. However, our customers wouldn't be able to use those fields in their original locations and wouldn't be able to use all query types / aggregations.
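Sketched the same way, the flattened variant might look like this. The mapping fragment uses Elasticsearch's flattened field type; the helper name and the kept @timestamp field are assumptions, and _document_flattened is just the example name from above:

```python
# Mapping fragment for the target index (assumed layout, not our real template).
FLATTENED_MAPPING = {
    "mappings": {
        "properties": {
            "_document_flattened": {"type": "flattened"},
        }
    }
}


def to_flattened_fallback(doc, keep=("@timestamp",), target="_document_flattened"):
    """Move everything except `keep` under one flattened object field."""
    fallback = {k: doc[k] for k in keep if k in doc}
    # Leaf values under a flattened field are indexed as keywords, so the
    # original types no longer matter and cannot conflict.
    fallback[target] = {k: v for k, v in doc.items() if k not in keep}
    return fallback


rejected = {"@timestamp": "2024-01-01T00:00:00Z", "status": "OK", "bytes": 512}
flattened_doc = to_flattened_fallback(rejected)
```

A field that used to be status would then be addressed as _document_flattened.status, which is the "not in their original locations" cost mentioned above.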
Our Solution, Phase 2
This will be a more "articulated" form of reprocessing where we preserve all the original structure except the conflicts, and move the conflicts either to new fields (a conflicting {"status":"OK"} might be shunted to status_text, e.g.) or append them to a common dumping field for conflicts.
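A minimal sketch of the Phase 2 idea, handling top-level fields only. The function name, the suffix table, and the assumption that the set of conflicting field names has already been extracted from the rejection error are all hypothetical:

```python
# Hypothetical suffixes chosen by the Python type of the conflicting value.
SUFFIX_BY_TYPE = {
    str: "_text",
    bool: "_bool",
    int: "_number",
    float: "_number",
    dict: "_object",
    list: "_list",
}


def shunt_conflicts(doc, conflicting, dump_field=None):
    """Keep non-conflicting fields in place and relocate the rest.

    With dump_field=None each conflicting field is renamed with a type
    suffix; otherwise all conflicts are gathered under `dump_field`.
    """
    out = {}
    dumped = {}
    for key, value in doc.items():
        if key not in conflicting:
            out[key] = value
        elif dump_field is not None:
            dumped[key] = value
        else:
            out[key + SUFFIX_BY_TYPE.get(type(value), "_other")] = value
    if dumped:
        out[dump_field] = dumped
    return out


doc = {"status": "OK", "host": "web-1"}
renamed = shunt_conflicts(doc, {"status"})
gathered = shunt_conflicts(doc, {"status"}, dump_field="conflicts")
```

Here renamed carries the value under status_text while host stays where it was, and gathered puts it under a single conflicts object instead. Nested fields would need a path-aware version of the same loop.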
I would love to put in a feature request for automatic handling of conflicts via index-level config (settings.index.mapping.conflicts.rename?) ... but only if something like this isn't already in the works.