Search for URL returning no results

Hello,

We are trying to search for URLs within the text of a field. For example,
we have a field called content that has the value "use http://elasticsearch.org
for search and email kimchy@elasticsearch.org for questions."

We are using the following configuration for our elasticsearch.yml:

index:
  analysis:
    analyzer:
      default_index:
        filter: [standard, lowercase]
        tokenizer: uax_url_email
      default_search:
        tokenizer: uax_url_email
        filter: [standard, lowercase]

With these settings we are able to search for kimchy@elasticsearch.org
and get a result but when we search for http://elasticsearch.org we
get no results.

We have used curl to verify the results of analyze and see that
http://elasticsearch.org and kimchy@elasticsearch.org each generate a
single token:

curl http://localhost:9200/elastic_searchable/_analyze?text=http://elasticsearch.org
{"tokens":[{"token":"http://elasticsearch.org","start_offset":
0,"end_offset":24,"type":"","position":1}]}

curl http://localhost:9200/elastic_searchable/_analyze?text=kimchy@elasticsearch.org
{"tokens":[{"token":"kimchy@elasticsearch.org","start_offset":
0,"end_offset":24,"type":"","position":1}]}

We aren't sure why the url search doesn't work. Any ideas?

Thanks

On Thu 08 Mar 2012 03:12:22 CET, Ian Carvell wrote:

[...]

Probably something to do with how you are searching. I suggest gisting
a small recreation of the issue: Elasticsearch Platform — Find real-time answers at scale | Elastic

Here is a gist showing what we are doing: Elasticsearch url search · GitHub

On Mar 7, 11:51 pm, Clinton Gormley cl...@traveljury.com wrote:

[...]

On Thu, 2012-03-08 at 11:46 -0800, Ian Carvell wrote:

Here is a gist showing what we are doing: Elasticsearch url search · GitHub

The problem is that you are using the Lucene query parser on the search
string, and it interprets 'http:' as a field name: search in field 'http'.

This query works:

curl http://localhost:9200/pages/_search?q=http\\://elasticsearch.org
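If you need to do this for arbitrary search strings, the escaping can be scripted. Below is a minimal sketch, not an official tool: escape_query is a hypothetical helper name, and the character list is the commonly documented set of Lucene query parser special characters (check it against the Lucene version you run). It puts a backslash in front of each such character so the parser treats the URL as a plain term.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch or Lucene):
# prefix each Lucene query-syntax character with a backslash.
# Assumed special characters: + - & | ! ( ) { } [ ] ^ " ~ * ? : \
escape_query() {
  printf '%s' "$1" | sed 's/[][+!(){}^"~*?:\\&|-]/\\&/g'
}

# The colon after 'http' is escaped, so it no longer reads as a field name.
escape_query 'http://elasticsearch.org'
echo

# Possible usage against the index from this thread (needs a running node):
# curl -G 'http://localhost:9200/pages/_search' \
#      --data-urlencode "q=$(escape_query 'http://elasticsearch.org')"
```

The email address needs no escaping because it contains none of these characters, which is why that search already worked.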

Alternatively, just use a 'text' query:

curl -XGET 'http://127.0.0.1:9200/pages/_search?pretty=1' -d '
{
   "query" : {
      "text" : {
         "_all" : "http://elasticsearch.org"
      }
   }
}
'
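For search strings that come from user input, the request body can be built by a small script. This is only a sketch under assumptions: text_query_body is a hypothetical helper name, it targets the _all field as in the example above, and it escapes only backslashes and double quotes (control characters are not handled).

```shell
#!/bin/sh
# Hypothetical helper: wrap a search string in the 'text' query body.
# Backslashes and double quotes are escaped so the string stays valid JSON.
text_query_body() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"query":{"text":{"_all":"%s"}}}\n' "$escaped"
}

# Print the body for the URL from this thread.
text_query_body 'http://elasticsearch.org'

# Possible usage (needs a running node):
# curl -XGET 'http://127.0.0.1:9200/pages/_search?pretty=1' \
#      -d "$(text_query_body 'http://elasticsearch.org')"
```

Because the string travels as a JSON value and not through the query parser, the colon needs no Lucene escaping here.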

clint

That was our problem, thanks very much for your help with this.

On Friday, March 9, 2012 2:57:28 AM UTC-8, Clinton Gormley wrote:

[...]