I'm indexing documents that occasionally contain very large tokens (more than 300,000 characters). I don't care about searching against these large tokens, but my pattern_replace char filter makes such documents take more than 30 minutes to index.
I've tried adding a length token filter, but that gets applied after the char_filter, since token filters run last in the analysis chain. Is there any way to prevent the char_filter from processing these super-long tokens?
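For reference, here's a minimal sketch of the kind of analyzer I mean; the index name, filter names, regex, and 1000-character cap are placeholders, not my real settings:

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_pattern_replace": {
          "type": "pattern_replace",
          "pattern": "\\d+",
          "replacement": ""
        }
      },
      "filter": {
        "my_length_limit": {
          "type": "length",
          "min": 0,
          "max": 1000
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["my_pattern_replace"],
          "tokenizer": "standard",
          "filter": ["my_length_limit"]
        }
      }
    }
  }
}
```

As I understand it, the analysis order is char_filter → tokenizer → token filter, so the length filter only sees tokens after the pattern_replace filter has already run over the entire character stream, huge tokens included.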