Java Regex Search


(dean-2) #1

Does ElasticSearch support Java Regex search?
For example, there is a field value "2008-12-03 some information here ....".
I want to use "\d{4}-\d{2}-\d{2}" to search for it.
Can I do this with ElasticSearch?

--


(Radu Gheorghe) #2

Hello,

I think I got lost in escaping so I couldn't do your exact expression
but something like this works:

curl -XPOST 'localhost:9200/testindex/testtype/_search?pretty=true' -d '{
  "filter" : {
    "script" : {
      "script" : "_source.fieldname ~= '"'2008.*'"'"
    }
  }
}'

However, that would be very slow compared to "normal" search
operations, because it has to go through all the documents. You might
not feel the difference with small amounts of data, but with bigger
datasets it will get inconvenient.

Instead, you might want to structure your documents so that your date
ends up in a separate field. If some document doesn't contain a date,
that's fine, don't specify it. In the end you can filter documents
that have a date using an "exists" filter:
http://www.elasticsearch.org/guide/reference/query-dsl/exists-filter.html

Or the ones without a date, using a "missing" filter:
http://www.elasticsearch.org/guide/reference/query-dsl/missing-filter.html

That would be hugely faster than doing a regex. Plus, you can do all
the nice stuff like filter with date ranges, etc.
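
As a rough sketch of the request bodies this implies, here they are built as Python dicts so they can be serialized with json. The field name "date" is an assumption, and the top-level "filter" key simply mirrors the script-filter example above, so check the shape against the Elasticsearch version you run:

```python
import json

# Hypothetical field name: use whatever field holds the extracted date.
DATE_FIELD = "date"


def exists_body(field):
    """Search body selecting documents that have a value for `field`."""
    return {"filter": {"exists": {"field": field}}}


def missing_body(field):
    """Search body selecting documents that have no value for `field`."""
    return {"filter": {"missing": {"field": field}}}


if __name__ == "__main__":
    # Print the JSON that would be passed to curl with -d.
    print(json.dumps(exists_body(DATE_FIELD), indent=2))
    print(json.dumps(missing_body(DATE_FIELD), indent=2))
```

Either body could be sent to the same _search endpoint as the curl command above.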

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--


(phill) #3

On 10/18/2012 5:49 AM, Radu Gheorghe wrote:

However, that would be very slow compared to "normal" search
operations, because it has to go through all the documents. You might
not feel the difference with small amounts of data, but with bigger
datasets it will get inconvenient.

I definitely have to agree with the caveat mentioned.

Running an RE against a field means loading EVERY value of that field in
the index and trying the RE on it.
Always pre-calculate and save whatever you can before search time!
Build something that anticipates the question.

As Radu suggested, do the RE, but do it AS YOU INDEX, and put the result
in another field. Run an RE for the "some information here" part as well
and put that in another field. Keep the body of the text if you need to
search that also.
Using separate fields may also let you extract other useful
information, or even multiple occurrences of the same date and text
pattern in your text.
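
A minimal sketch of that index-time extraction, assuming the client that builds the documents is written in Python. The field names (dates, info, body) are made up for illustration, and the pattern is the one from the original question:

```python
import re

# Same pattern as in the question: four digits, two digits, two digits.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_document(text):
    """Split raw text into the fields to index.

    The field names (dates, info, body) are hypothetical.
    """
    dates = DATE_RE.findall(text)      # every date-like match, in order
    info = DATE_RE.sub(" ", text)      # the text with the dates removed
    info = " ".join(info.split())      # collapse leftover whitespace
    doc = {"body": text, "info": info}
    if dates:                          # leave the field out when no date is found
        doc["dates"] = dates
    return doc


if __name__ == "__main__":
    print(build_document("2008-12-03 some information here"))
```

The pattern only checks the shape of the string, so something like "2008-99-99" would also be picked up. Validate the matches before indexing them into a date field.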

The date(s) can even go in a date type field. Then you can do date range
operations on it. The alternative to a date field might be a long
integer representing a typical time stamp.
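
If you go the long-integer route, a conversion along these lines could run at index time. This is only a sketch: it assumes the extracted string is in YYYY-MM-DD form and treats it as midnight UTC, which may not be the right time zone for your data:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(date_str):
    """Convert 'YYYY-MM-DD' to milliseconds since the Unix epoch.

    Assumption: the date is interpreted as midnight UTC.
    Raises ValueError if the string is not a real calendar date.
    """
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(to_epoch_millis("2008-12-03"))
```

As a side benefit, strptime rejects strings that match the pattern but are not real dates.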

Once you have various fields with the values you'll be searching, you
can specify a query which involves the date, the "some information", and
even the rest of the text. Each will involve all the indexing that ES
and Lucene can provide instead of a string operation on free-form text
at the last second.
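
To make that concrete, here is one way such a combined request body might look, built as a Python dict. The field names (dates, info, body) are hypothetical, and the "filtered" query, the "range" filter with from/to, and the "query_string" clauses reflect my understanding of the query DSL of that era, so verify them against the version you run:

```python
import json


def combined_body(date_from, date_to, info_terms, body_terms):
    """Search body: a date-range filter plus full-text clauses on two fields.

    Field names (dates, info, body) are hypothetical. The DSL shape is an
    assumption about the Elasticsearch version in use; check your docs.
    """
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "must": [
                            {"query_string": {"default_field": "info",
                                              "query": info_terms}},
                            {"query_string": {"default_field": "body",
                                              "query": body_terms}},
                        ]
                    }
                },
                # Range over the pre-extracted date field, not a regex scan.
                "filter": {
                    "range": {"dates": {"from": date_from, "to": date_to}}
                },
            }
        }
    }


if __name__ == "__main__":
    body = combined_body("2008-12-01", "2008-12-31", "information", "some")
    print(json.dumps(body, indent=2))
```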

Good luck,

-Paul

--

