Search partial URL

(Davide) #1


I have a problem searching partial URLs in a text field. I'm using a word_delimiter filter to split possible URLs. Here is the mapping:

                    "tokenizer": "whitespace",
                        "english_possessive_stemmer", "english_plural_stemmer", 
    "mappings": {
    	"test": {
    		"properties": {
    			"body": {
    				"type": "text",
    				"analyzer": "custom_analyzer"

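
As a side note, the tokens an analyzer actually emits can be inspected with the `_analyze` API (assuming the index is called `test` and the analyzer `custom_analyzer`, as in the mapping above):

```
POST test/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "www.google.co.uk hello com"
}
```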
When running a search or aggregating data, this leads to unexpected results.
Let's consider the following document:

    "body": " hello com"

It generates 6 tokens: www, google, co, uk, hello, com.
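
To illustrate why, here is a rough Python simulation of that analysis chain (not the real Elasticsearch filter, just an approximation of its splitting behaviour on URLs):

```python
import re

def whitespace_word_delimiter(text):
    """Approximate a whitespace tokenizer followed by word_delimiter:
    split on whitespace, then split each token on non-alphanumeric
    characters, which is roughly what word_delimiter does to a URL."""
    tokens = []
    for word in text.split():
        tokens.extend(t for t in re.split(r"[^A-Za-z0-9]+", word) if t)
    return tokens

print(whitespace_word_delimiter("www.google.co.uk hello com"))
# → ['www', 'google', 'co', 'uk', 'hello', 'com']
```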

If a user searches "", ES returns the document above. That is technically correct, but it is not what you would expect.

So I was thinking of implementing a filter to parse URLs.
The filter is supposed to generate the following tokens for the token "": (original), , google.
Then at query time, I would use a simple analyser that doesn't tokenise data. So if the user searches "google" or "", he will get proper results.
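
A minimal sketch of the behaviour such a filter could have, written in plain Python rather than as an actual Elasticsearch plugin (the suffix-stripping rules here are assumptions, chosen only to illustrate the idea):

```python
def url_filter_tokens(token):
    """Hypothetical URL token filter: keep the original token, then emit
    the host without a leading "www" label, and finally the bare name a
    user is likely to search for. The treatment of the last two labels
    as a public suffix (e.g. "co.uk") is an assumption."""
    parts = token.split(".")
    tokens = [token]  # always keep the original token
    if parts and parts[0] == "www":
        parts = parts[1:]
        tokens.append(".".join(parts))  # host without the "www" label
    if len(parts) >= 3:
        # assume the trailing labels are a suffix like "co.uk", so the
        # first remaining label is the searchable name
        tokens.append(parts[0])
    return tokens

print(url_filter_tokens("www.google.co.uk"))
# → ['www.google.co.uk', 'google.co.uk', 'google']
```

With tokens like these in the index, an exact whitespace-level query for either the full URL or just "google" would match, without the spurious matches that splitting on every dot produces.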

What do you think?

(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.