Hello,
We are trying to search for URLs within the text of a field. For example,
we have a field called content with the value "use http://elasticsearch.org
for search and email kimchy@elasticsearch.org for questions."
We are using the following configuration in our elasticsearch.yml:
index:
  analysis:
    analyzer:
      default_index:
        tokenizer: uax_url_email
        filter: [standard, lowercase]
      default_search:
        tokenizer: uax_url_email
        filter: [standard, lowercase]
With these settings we are able to search for kimchy@elasticsearch.org
and get a result, but when we search for http://elasticsearch.org we
get no results.
We have used curl to check the output of analyze and see that
http://elasticsearch.org and kimchy@elasticsearch.org each generate a
single token:
curl http://localhost:9200/elastic_searchable/_analyze?text=http://elasticsearch.org
{"tokens":[{"token":"http://elasticsearch.org","start_offset":0,"end_offset":24,"type":"","position":1}]}
curl http://localhost:9200/elastic_searchable/_analyze?text=kimchy@elasticsearch.org
{"tokens":[{"token":"kimchy@elasticsearch.org","start_offset":0,"end_offset":24,"type":"","position":1}]}
We aren't sure why the url search doesn't work. Any ideas?
Thanks
On Thu 08 Mar 2012 03:12:22 CET, Ian Carvell wrote:
> We aren't sure why the url search doesn't work. Any ideas?
Probably something to do with how you are searching. I suggest gisting
a small recreation of the issue: Elasticsearch Platform — Find real-time answers at scale | Elastic
Here is a gist showing what we are doing: Elasticsearch url search · GitHub
On Mar 7, 11:51 pm, Clinton Gormley cl...@traveljury.com wrote:
> Probably something to do with how you are searching. I suggest gisting
> a small recreation of the issue
On Thu, 2012-03-08 at 11:46 -0800, Ian Carvell wrote:
> Here is a gist showing what we are doing: Elasticsearch url search · GitHub
The problem is that your search string is being run through the Lucene
query parser, which interprets 'http:' as "search in the field 'http'".
This query works:
curl http://localhost:9200/pages/_search?q=http\\://elasticsearch.org
Alternatively, just use a 'text' query:
curl -XGET 'http://127.0.0.1:9200/pages/_search?pretty=1' -d '
{
  "query" : {
    "text" : {
      "_all" : "http://elasticsearch.org"
    }
  }
}
'
clint
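For anyone scripting the first approach, here is a minimal sketch of the escaping step in plain sh. The helper name escape_lucene_colons is hypothetical (it is not part of Elasticsearch or curl), and it only handles the backslash and the colon discussed in this thread; the Lucene query syntax has other special characters that it does not touch. The commented curl line assumes the same pages index and localhost address used above.

```shell
#!/bin/sh
# Hypothetical helper: backslash-escape backslashes and colons so the
# Lucene query parser treats "http:" as literal text, not as a field name.
escape_lucene_colons() {
  # Escape existing backslashes first, then colons.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g'
}

# Show the escaped form of the URL from this thread.
printf '%s\n' "$(escape_lucene_colons 'http://elasticsearch.org')"

# Assumed usage (not run here): let curl URL-encode the escaped string.
# curl -G 'http://localhost:9200/pages/_search' \
#   --data-urlencode "q=$(escape_lucene_colons 'http://elasticsearch.org')"
```

Quoting the argument in single quotes keeps the shell from interpreting the backslashes, which is why only one backslash is needed here rather than the two in the unquoted curl command above.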
That was our problem, thanks very much for your help with this.
On Friday, March 9, 2012 2:57:28 AM UTC-8, Clinton Gormley wrote:
> The problem is that you are using the Lucene query parser on the search
> string. And it interprets 'http:' as: search in field 'http'