Performance of doc_values field vs analysed field

OK, I've discovered that it should be possible to write a plugin that contains a custom tokenizer.
As far as I can tell the process isn't documented anywhere, and I'm hitting problems; see my other question, Building a custom tokenizer: "Could not find suitable constructor".