I have documents whose fields are themselves regular expressions.
For example:
doc1: regexp:1111.*01011
doc2: regexp: 111.*01011
So a query with regexp:1111010111101011 should return doc1 and doc2, while a query with regexp:111011101011 should return only doc2. Is this type of query possible with Elastic? If not, is there an alternative way to achieve this with Elastic?
Yes, this is possible! You can use the percolator for that (one of my favorite Elasticsearch features!). The percolator allows you to index queries, and then later ask Elasticsearch if a given document matches those indexed queries.
To use the percolator, you first need to define a field of type percolator in the index's mapping. In my example that is a field my_query of that type, alongside a field my_expression that the regular expressions are matched against.
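The setup described above can be sketched roughly as follows. This is a minimal sketch in Python that only builds the request bodies as dictionaries, so it runs without a cluster. The index name my-index, the keyword type for my_expression, the document IDs, and the helper names percolate_body and simulate_matches are assumptions made for illustration. The simulate_matches helper is not Elasticsearch; it is a local stand-in that uses Python's re.fullmatch to mimic the fact that a regexp query has to match the whole term.

```python
import json
import re

# Mapping body, to be sent as: PUT /my-index  (index name is hypothetical)
mapping_body = {
    "mappings": {
        "properties": {
            "my_query": {"type": "percolator"},    # holds the stored queries
            "my_expression": {"type": "keyword"},  # assumed type for the matched value
        }
    }
}

# The regular expressions from the question, keyed by document ID.
stored_patterns = {"doc1": "1111.*01011", "doc2": "111.*01011"}

# Documents to index, each as: PUT /my-index/_doc/<id>
# Every document stores a regexp query in its percolator field.
stored_docs = {
    doc_id: {"my_query": {"regexp": {"my_expression": pattern}}}
    for doc_id, pattern in stored_patterns.items()
}


def percolate_body(value):
    """Search body (GET /my-index/_search) asking which stored queries match value."""
    return {
        "query": {
            "percolate": {
                "field": "my_query",
                "document": {"my_expression": value},
            }
        }
    }


def simulate_matches(value):
    """Local stand-in for the percolator: whole-string match against each pattern."""
    return sorted(
        doc_id
        for doc_id, pattern in stored_patterns.items()
        if re.fullmatch(pattern, value)
    )


if __name__ == "__main__":
    print(json.dumps(mapping_body, indent=2))
    print(json.dumps(stored_docs, indent=2))
    for value in ("1111010111101011", "111011101011"):
        print(json.dumps(percolate_body(value)))
        print(value, "->", simulate_matches(value))
```

With the two inputs from the question, the local simulation reports doc1 and doc2 for the first string and only doc2 for the second, which is the behaviour asked for. Against a real cluster, the matching stored queries would come back as hits of the percolate search.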
Andon, thanks for the reply. I think this should solve my use case. What would the performance be like with, say, a million documents, each having a percolator field?
Good question. The Percolator doesn't scale quite the same way as the other queries in Elasticsearch. The response time will basically grow linearly with the number of stored percolator queries (although there are some optimizations, as detailed in the documentation).
The Percolator is one of the few cases where it may be better to have more shards. That's because each shard holds a subset of the stored percolator queries, so with multiple shards a document can be percolated against those subsets in parallel.
You probably need to do some testing with different numbers of documents and shards to see what an optimum for your cluster would be.
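As a starting point for that kind of testing, here is a minimal sketch that builds index-creation bodies with different primary shard counts and estimates how many stored queries each shard would hold. The candidate shard counts, the field names, and the helpers index_body and queries_per_shard are assumptions for illustration, and the per-shard figure is only a rough estimate because documents are not routed perfectly evenly.

```python
import math


def index_body(num_shards):
    """Index-creation body with a given number of primary shards (hypothetical helper)."""
    return {
        "settings": {"index": {"number_of_shards": num_shards}},
        "mappings": {
            "properties": {
                "my_query": {"type": "percolator"},
                "my_expression": {"type": "keyword"},  # assumed type
            }
        },
    }


def queries_per_shard(total_queries, num_shards):
    """Rough upper estimate of stored queries per shard, assuming even routing."""
    return math.ceil(total_queries / num_shards)


if __name__ == "__main__":
    total = 1_000_000  # the figure from the question
    for shards in (1, 2, 4, 8):  # arbitrary candidates to benchmark
        body = index_body(shards)
        print(shards, "shards:", queries_per_shard(total, shards), "queries per shard")
```

Each body would be used to create a separate test index, which can then be loaded with the same stored queries and percolated with the same documents to compare response times.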