I am looking into storing probabilistic lattices/confusion networks rather than one-best sequences of words. In a lattice/confusion network, each word position in the document is really a set of possible words with associated probabilities that sum to 100%. As far as I can tell, there is no low-level support for lattices/confusion networks in Lucene, nor higher-level support in Elasticsearch. Is that correct?
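For concreteness, here is a minimal sketch of the data structure I mean: each slot in the document is a distribution over candidate words whose probabilities sum to 1. The names and representation are purely illustrative, not anything from the Lucene/Elasticsearch API:

```python
# Illustrative confusion network: one dict per word position, mapping
# each candidate word to its probability. Probabilities in a slot sum to 1.0.
confusion_network = [
    {"the": 1.0},
    {"quick": 0.6, "quack": 0.4},
    {"brown": 0.7, "braun": 0.3},
    {"fox": 1.0},
]

def validate(cn):
    """Check that every slot's probabilities sum to ~1.0."""
    return all(abs(sum(slot.values()) - 1.0) < 1e-9 for slot in cn)
```

A one-best index would keep only the highest-probability word per slot; the question is whether the alternatives (and ideally their probabilities, perhaps via payloads) can be indexed too.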
Another major consideration is that I need to preserve phrase matching. My somewhat hack-ish idea, after looking through the documentation, is to exploit the "position" values in the output of the analyzers. If I could force multiple tokens to share the same position, I would essentially create an index from a lattice while potentially preserving phrase matching, since phrase queries depend only on position values increasing monotonically. Is this reasonable? And if so, how can I accomplish it? A custom analyzer?
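To illustrate why I think this could work: stacking a slot's alternatives at one position is the same trick Lucene's SynonymFilter uses (a position increment of 0 after the first token), and phrase matching then only needs consecutive positions. Here is a pure-Python simulation of that idea, assuming a toy positional index; none of this is real Lucene code:

```python
# Simulation of the "same position" trick: every alternative in a slot
# is emitted as a token at that slot's position, the way stacked synonyms
# share a position in Lucene.

def tokens_from_lattice(cn):
    """Yield (term, position) pairs; alternatives in a slot share one position."""
    for pos, slot in enumerate(cn):
        for term in slot:
            yield term, pos

def build_index(cn):
    """Toy postings: map each term to the set of positions it occurs at."""
    index = {}
    for term, pos in tokens_from_lattice(cn):
        index.setdefault(term, set()).add(pos)
    return index

def phrase_match(index, phrase):
    """True if the phrase's words occur at consecutive positions."""
    words = phrase.split()
    if any(w not in index for w in words):
        return False
    return any(
        all(start + i in index[w] for i, w in enumerate(words[1:], 1))
        for start in index[words[0]]
    )

cn = [{"the": 1.0}, {"quick": 0.6, "quack": 0.4}, {"fox": 1.0}]
index = build_index(cn)
# phrase_match(index, "quick fox") -> True  (quick at 1, fox at 2)
# phrase_match(index, "quack fox") -> True  (quack shares position 1)
# phrase_match(index, "the fox")   -> False (positions 0 and 2 not adjacent)
```

If this simulation matches how Lucene's phrase queries actually behave, then in real terms the implementation would presumably be a custom TokenFilter that sets PositionIncrementAttribute to 0 for the non-first alternatives in each slot, though I'd appreciate confirmation that phrase queries tolerate stacked tokens this way.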