There isn't an option to control the operator based on a field name. However, you can create a specific mapping type (check the attachment plugin for an example), say an md5 type. In that type you control the query that is built when the field is queried through the query parser, and for "term" queries in general. So you can construct your boolean query there and break the value up yourself.
If you have a specific type, you won't need a special tokenizer: you can construct the string to be indexed yourself and then use something like the whitespace analyzer on it (a bit simpler, and at least an option).
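To make the idea concrete, here is a minimal sketch in plain Java with no Elasticsearch or Lucene dependency. It assumes the chunk_length of 4 and the prefixes ABCDEFGH from the configuration quoted below. The class and method names (Md5FieldSketch, toIndexedText, toRequiredQuery) are hypothetical, and the "+" marker is the Lucene query-string syntax for a required term. A real mapping type would build the boolean query through the Lucene API instead of a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch: shows the text a custom
// md5 type could index and the all-chunks-required query it could build.
final class Md5FieldSketch {
    static final int CHUNK_LENGTH = 4;         // assumed, from the quoted config
    static final String PREFIXES = "ABCDEFGH"; // assumed, from the quoted config

    // Cut the hash into fixed-size chunks, each prefixed by its position marker.
    static List<String> chunks(String hash) {
        List<String> out = new ArrayList<>();
        for (int start = 0, pos = 0; start < hash.length(); start += CHUNK_LENGTH, pos++) {
            if (pos >= PREFIXES.length()) {
                throw new IllegalArgumentException("value too long for the configured prefixes");
            }
            int end = Math.min(start + CHUNK_LENGTH, hash.length());
            out.add(PREFIXES.charAt(pos) + hash.substring(start, end));
        }
        return out;
    }

    // Space-separated chunks, ready for a whitespace analyzer at index time.
    static String toIndexedText(String hash) {
        return String.join(" ", chunks(hash));
    }

    // Every chunk marked as required ("+"), so all of them must match.
    static String toRequiredQuery(String field, String hash) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : chunks(hash)) {
            clauses.add("+" + field + ":" + chunk);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String md5 = "d41d8cd98f00b204e9800998ecf8427e";
        System.out.println(toIndexedText(md5));
        System.out.println(toRequiredQuery("md5", md5));
    }
}
```

Because every clause is required, the query behaves like default_operator=AND for this field only, whatever operator the rest of the request uses.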
Still, I would check whether you really need this, so that you are not optimizing before you have to.
On Friday, April 29, 2011 at 8:27 PM, ofavre wrote:
I've written a HashSplitter Tokenizer (and TokenFilter), which splits any hash (or value) into tokens of the same size, each with a prefix that indicates its position.
For instance, with a chunk length of 3, and prefixes set to "0123456789", the value "foobarbaz" becomes ["0foo","1bar","2baz"].
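The splitting rule can be sketched in a few lines of plain Java. This is only an illustration of the rule described above, not the plugin's actual Tokenizer code. The class name HashSplitSketch is hypothetical, and letting a shorter final chunk through is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: the chunk-and-prefix rule, without any Lucene classes.
final class HashSplitSketch {
    // Returns the value cut into chunkLength-sized pieces, each prefixed with
    // the character of `prefixes` found at the piece's position.
    static List<String> split(String value, int chunkLength, String prefixes) {
        if (chunkLength <= 0) {
            throw new IllegalArgumentException("chunkLength must be positive");
        }
        List<String> tokens = new ArrayList<>();
        for (int start = 0, pos = 0; start < value.length(); start += chunkLength, pos++) {
            if (pos >= prefixes.length()) {
                throw new IllegalArgumentException("not enough prefixes for this value");
            }
            // Assumption: a trailing chunk shorter than chunkLength is kept as is.
            int end = Math.min(start + chunkLength, value.length());
            tokens.add(prefixes.charAt(pos) + value.substring(start, end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("foobarbaz", 3, "0123456789")); // [0foo, 1bar, 2baz]
    }
}
```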
This works well. I took inspiration from NGramTokenizer and NGramTokenFilter for the implementation, and from the ICU plugin for packaging it as a plugin.
The values for chunk_length and prefixes are configured in config/elasticsearch.yml:
index:
  analysis:
    analyzer:
      md5_hashsplitter:
        type: custom
        tokenizer: md5_hashsplitter_tokenizer
    tokenizer:
      md5_hashsplitter_tokenizer:
        type: hash_splitter
        chunk_length: 4
        prefixes: ABCDEFGH
Indexing and querying now work transparently, which is great, but I have a problem:
Since there are many collisions on each chunk, I get many results for a query like q=md5:d41d8cd98f00b204e9800998ecf8427e ...
If I add &default_operator=AND then I get only one result, as expected.
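The difference between the two operators can be reproduced with a toy in-memory index. This sketch is only a model of the behaviour, not how Elasticsearch evaluates queries. The class OperatorSketch, the document ids, and the second hash are made up for the illustration. Chunk length 4 and prefixes ABCDEFGH are taken from the configuration above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: token -> ids of the documents containing it.
final class OperatorSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Assumes a 32-character hash: 8 chunks of 4 characters, prefixed A..H.
    static List<String> chunks(String hash) {
        String prefixes = "ABCDEFGH";
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i * 4 < hash.length(); i++) {
            int end = Math.min(i * 4 + 4, hash.length());
            tokens.add(prefixes.charAt(i) + hash.substring(i * 4, end));
        }
        return tokens;
    }

    void add(String docId, String hash) {
        for (String token : chunks(hash)) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // OR semantics: a document matches if it shares at least one chunk.
    Set<String> matchAny(String hash) {
        Set<String> hits = new TreeSet<>();
        for (String token : chunks(hash)) {
            hits.addAll(postings.getOrDefault(token, Collections.emptySet()));
        }
        return hits;
    }

    // AND semantics: a document matches only if it contains every chunk.
    Set<String> matchAll(String hash) {
        Set<String> hits = null;
        for (String token : chunks(hash)) {
            Set<String> docs = postings.getOrDefault(token, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(docs);
            } else {
                hits.retainAll(docs);
            }
        }
        return hits == null ? new TreeSet<>() : hits;
    }

    public static void main(String[] args) {
        OperatorSketch index = new OperatorSketch();
        index.add("empty", "d41d8cd98f00b204e9800998ecf8427e");
        // Made-up hash that shares only its first chunk with the one above.
        index.add("other", "d41d0000000000000000000000000000");

        String query = "d41d8cd98f00b204e9800998ecf8427e";
        System.out.println(index.matchAny(query)); // [empty, other]
        System.out.println(index.matchAll(query)); // [empty]
    }
}
```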
My question is:
Is there a way to apply the default_operator parameter automatically for a specific field? (I think not, because this concerns querying rather than indexing, so no configuration seems possible.)
Or, at least, is it possible to make AND the default_operator for a specific field inside the query in a clean way?
Or, even better, is it possible to modify something in the code (in my TokenFilter or Tokenizer) to make sure it does an AND?
Maybe a TokenFilter/Tokenizer for the query (not the indexing), that adds a "+" before the output token...
Thank you for your help!
Enjoy your weekend.