Luiz, thanks for responding!
I had forgotten to mention that I tried not_analyzed as well. The analyzer, it
turns out, wasn't my problem.
I had two problems. First, the ES/Lucene regexp query/filter doesn't support
"\d" for indicating digits, so I had to replace each one with the [0-9]
character class. Once I changed my regex to
"http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/" it worked!
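For anyone hitting the same thing, here is a rough sketch of that rewrite in Python. The helper name to_lucene_regexp is made up for illustration, and the replacement is deliberately naive (it would mangle a "\d" inside a character class or after an escaped backslash). It also drops a trailing "$", on the understanding that Lucene regexps always have to match the whole term and do not treat "$" as an anchor. Python's re.fullmatch is only used as a local approximation of that whole-term matching, not as a stand-in for Lucene itself.

```python
import re


def to_lucene_regexp(pattern):
    """Naively adapt a Perl-style pattern for the ES/Lucene regexp query."""
    # Lucene has no \d shorthand, so spell the digit class out.
    adapted = pattern.replace(r"\d", "[0-9]")
    # Lucene matches the whole term, so an unescaped trailing "$" is dropped.
    if adapted.endswith("$") and not adapted.endswith(r"\$"):
        adapted = adapted[:-1]
    return adapted


original = r"http://example.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
adapted = to_lucene_regexp(original)
print(adapted)

# Approximate Lucene's whole-term matching locally with re.fullmatch.
url = "http://example.com/2014/04/15/foo-bar-baz/"
print(bool(re.fullmatch(adapted, url)))
```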
My second problem is that the Python library appears to have a bug. When I run
the following Python using elasticsearch-py:
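from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node on the default port 9200

query = {
    "query": {
        "regexp": {
            "url": "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"
        }
    }
}
es.search(index="regex-test", doc_type="test1", body=query)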
I get:
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5}, u'hits':
{u'hits': [], u'max_score': None, u'total': 0}, u'timed_out': False,
u'took': 11}
However, when I do this query on the command line:
curl -XPOST "http://localhost:9200/regex-test/type1/_search" -d'
{
  "query": {
    "regexp": {
      "url": "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"
    }
  }
}'
{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"regex-test","_type":"type1","_id":"doc1","_score":1.0,
"_source" : {"url":"http://example.com/2014/04/15/foo-bar-baz/"}}]}}
So I guess the issue lies with elasticsearch-py?
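One thing that may be worth ruling out before blaming the client: the Python call above passes doc_type="test1", while the curl request searches /regex-test/type1/_search and the hit it returns has _type "type1". If that difference is not just a typo in this email, it could by itself explain the empty result. A minimal sketch of building the Python request so that it mirrors the curl one follows; build_regexp_search is a hypothetical helper, and the final es.search call is left as a comment because it needs a running node.

```python
def build_regexp_search(index, doc_type, field, pattern):
    """Return keyword arguments for es.search() describing a regexp query."""
    return {
        "index": index,
        "doc_type": doc_type,
        "body": {"query": {"regexp": {field: pattern}}},
    }


PATTERN = "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"

# Same index and type as the curl URL /regex-test/type1/_search
params = build_regexp_search("regex-test", "type1", "url", PATTERN)
print(params)

# With a client connected to the cluster, this would be sent as:
#   es.search(**params)
```

If the query still returns no hits with the type name matching, that would point more firmly at the client.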
On Tue, Apr 15, 2014 at 5:59 PM, Luiz Guilherme Pais dos Santos <luizgpsantos@gmail.com> wrote:
Hi Matt,
If you mark your field as not_analyzed:
{
  "mappings": {
    "type1": {
      "properties": {
        "url": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
You could use a regexp query:
POST _search
{
  "query": {
    "regexp": {
      "url": "http://example.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
    }
  }
}
On Tue, Apr 15, 2014 at 5:57 PM, matt burton mcburton@gmail.com wrote:
I have a field in my documents that consists of a URL.
{...
"url":"http://example.com/2014/04/15/foo-bar-baz/"
...}
I would like to use a regexp query/filter to find documents in my index
with urls matching a regex pattern.
For example: "http://example.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
I'm a bit stumped about how to configure an analyzer in the document
_mapping to enable a regexp search (like above) for the url field. I've
tried the standard and keyword analyzer, but they didn't work.
I'm not even sure if this is possible to do. If not, I can do it
outside of ES, but I thought I'd ask here to see if y'all had any guidance.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/62e05ecc-500f-474e-a5e6-220a9eb86eb3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Luiz Guilherme P. Santos