Just thinking aloud, have not tried to implement anything, but I
probably will soon...
I am currently using span queries for the bulk of my queries.
Unfortunately, span queries only support term queries, which means no
analysis happens on the query terms. My current approach uses
a Lucene analyzer to analyze the terms passed to the SpanTermQueries.
Using both a custom Lucene analyzer and an ElasticSearch analyzer (via
elasticsearch.yml) has several problems: two systems to support, a
potential mismatch between them, duplicated effort, etc. The analysis
API is useful, but a network hop for every term would cost too much.
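
One way to soften the per-term cost, separate from the embedded-node idea, would be to memoize analysis results so that each distinct term is only analyzed once. Below is a minimal sketch of that, not something I have tried against a real cluster. The class name CachingTermAnalyzer and the Function-based delegate are hypothetical; the delegate stands in for whatever actually performs the analysis (an analyze API call or a local analyzer), and the lowercase/whitespace lambda in main is only a placeholder for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: caches the tokens produced for each raw query term. */
class CachingTermAnalyzer {
    // Stand-in for the real analysis call (remote analyze API or local analyzer).
    private final Function<String, List<String>> delegate;
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    // Counts how many times the delegate was actually invoked.
    private final AtomicInteger delegateCalls = new AtomicInteger();

    CachingTermAnalyzer(Function<String, List<String>> delegate) {
        this.delegate = delegate;
    }

    /** Returns the analyzed tokens, calling the delegate only on a cache miss. */
    List<String> analyze(String rawTerm) {
        return cache.computeIfAbsent(rawTerm, term -> {
            delegateCalls.incrementAndGet();
            // Defensive copy so cached results cannot be modified by callers.
            return Collections.unmodifiableList(new ArrayList<>(delegate.apply(term)));
        });
    }

    int delegateCalls() {
        return delegateCalls.get();
    }

    public static void main(String[] args) {
        // Placeholder analysis: trim, lowercase, split on whitespace.
        CachingTermAnalyzer analyzer = new CachingTermAnalyzer(
            term -> Arrays.asList(term.trim().toLowerCase(Locale.ROOT).split("\\s+")));

        System.out.println(analyzer.analyze("Quick Fox"));
        System.out.println(analyzer.analyze("Quick Fox")); // served from the cache
        System.out.println("delegate calls: " + analyzer.delegateCalls());
    }
}
```

The cache here is unbounded, so a real version would need some eviction policy, and it does nothing for terms that are only ever seen once.
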
My current thinking is to create a local embedded ElasticSearch
instance whose sole purpose would be to fulfill analyze requests. The
existing TransportClient would continue to communicate with the actual
cluster, while this new NodeClient would only exist within the JVM.
The embedded node would use the same analyzer definitions found in the
cluster elasticsearch.yml file ("include" config files would be
I have never created an embedded ElasticSearch server, but I assume
I can construct one that does not interfere with the existing cluster
and/or the other middle boxes in the network. Correct?
Performance-wise, I expect that using the analyze API locally would
perform the same as using a Lucene analyzer directly.
How heavyweight is an embedded ElasticSearch instance?