Wildcards in Phrase and Proximity Searches


(Christian von Wendt-Jensen) #1

Hi,

I was wondering whether it is possible to use wildcards in a phrase or
proximity search, such as:

"elastic* searchengine" which would search for any two words where the
first starts with "elastic" and the other is "searchengine".

For proximity (NEAR) a search could look like this:

"elastic* searchengine"^5 which would return documents where two words
exist within 5 words of each other and the first starts with
"elastic".


(Christian von Wendt-Jensen) #2

No suggestions, anyone?


(Clinton Gormley) #3

> For proximity (NEAR) a search could look like this:
>
> "elastic* searchengine"^5 which would return documents where two words
> exist within 5 words of each other and the first starts with
> "elastic".

Actually, that syntax boosts the phrase; it doesn't imply a proximity
search. See:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/queryparsersyntax.html

What you want is: (elastic* searchengine)^5

And yes, it does support wildcards.
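
To make the difference between the two suffixes concrete, here is a minimal sketch in Python. It is not Lucene's parser; `phrase_modifier` is a hypothetical helper that only recognises a quoted phrase followed by `^N` (boost) or `~N` (proximity/slop), as described in the query parser syntax page linked above.

```python
import re

# Hypothetical helper, not part of Lucene or Elasticsearch.
# Recognises only:  "some phrase"^N   or   "some phrase"~N
_SUFFIX = re.compile(
    r'^"(?P<phrase>[^"]*)"(?P<op>[\^~])(?P<num>\d+(?:\.\d+)?)$'
)


def phrase_modifier(query):
    """Return (kind, phrase, number) for a quoted phrase with a suffix.

    kind is "boost" for ^ and "proximity" for ~.
    Returns None if the query is not a quoted phrase with such a suffix.
    """
    m = _SUFFIX.match(query.strip())
    if m is None:
        return None
    kind = "boost" if m.group("op") == "^" else "proximity"
    return kind, m.group("phrase"), float(m.group("num"))


print(phrase_modifier('"elastic searchengine"^5'))  # a boost of 5
print(phrase_modifier('"elastic searchengine"~5'))  # a proximity (slop) of 5
```

The two queries differ by a single character, which is why they are easy to confuse.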

clint


(Christian von Wendt-Jensen) #4

Hi,

Well, I made a mistake by using the wrong operator and confused everybody
with my example. I do not want to boost the query. Rather, I want to find
documents where two or more terms are in proximity of each other, and where
one or more of the terms are wildcard terms. My example should have been
something like:

"elastic* searchengine"~5

Is this possible?
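
The behaviour being asked for can be sketched in plain Python. This is only an illustration of the intended semantics, under simplifying assumptions: tokens are split on whitespace, terms may appear in any order, and "proximity" means the matched token positions span at most `max_distance`. It is not how Lucene computes slop, and `near_match` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from itertools import product


def near_match(text, patterns, max_distance):
    """Return True if every pattern matches a distinct token and all
    matched tokens lie within max_distance positions of each other.

    Patterns use shell-style wildcards, e.g. "elastic*".
    """
    tokens = text.lower().split()
    # For each pattern, collect the positions of all tokens it matches.
    positions = [
        [i for i, tok in enumerate(tokens) if fnmatchcase(tok, pat.lower())]
        for pat in patterns
    ]
    # Try every combination of one position per pattern. If any pattern
    # matched nothing, product() yields no combinations and any() is False.
    return any(
        len(set(combo)) == len(combo)
        and max(combo) - min(combo) <= max_distance
        for combo in product(*positions)
    )


doc = "an elasticsearch based searchengine"
print(near_match(doc, ["elastic*", "searchengine"], 5))  # True
```

Trying every combination is fine for an illustration, but it grows quickly with the number of matches, so it is not meant for real indexes.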

Regards,

Christian


(Clinton Gormley) #5

> Well, I made a mistake by using the wrong operator and confused everybody
> with my example. I do not want to boost the query. Rather, I want to
> find documents where two or more terms are in proximity of each other,
> and where one or more of the terms are wildcard terms. My example
> should have been something like:
>
> "elastic* searchengine"~5

Sorry, yes, that's what I meant too.

> Is this possible?

Did you try it?

clint


(Christian von Wendt-Jensen) #6

Hi again,

Yes, I've tried both versions. With this version:

(elastic* searchengine)~5

I get a ParseException, while this version:

"elastic* searchengine"~5

simply ignores the wildcard part.


(Clinton Gormley) #7

On Wed, 2012-03-14 at 05:05 -0700, Christian von Wendt-Jensen wrote:

> Hi again,
>
> Yes, I've tried both versions. With this version:
>
> (elastic* searchengine)~5
>
> I get a ParseException, while this version:
>
> "elastic* searchengine"~5
>
> simply ignores the wildcard part.

Yes, as the docs say:

    Lucene supports single and multiple character wildcard searches
    within single terms (not within phrase queries).

http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/queryparsersyntax.html#Wildcard%20Searches

So it looks like you are out of luck with the query parser syntax.

I've tried to think of a way you could achieve this using ngrams or edge
ngrams, but you couldn't wildcard a single word in the phrase.
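
One avenue not explored above is the span query family in the Elasticsearch query DSL, where a `span_multi` clause can wrap a prefix query inside a `span_near`. The sketch below only builds the request body as a Python dict; the field name `body` and the helper name are assumptions for illustration, and whether the Elasticsearch version discussed in this thread supports these queries is not established here.

```python
import json


def span_near_with_prefix(field, prefix, term, slop, in_order=True):
    """Build a span_near query body: a prefix-matched term near an exact term.

    Hypothetical helper; it only constructs the JSON body and sends nothing.
    """
    return {
        "query": {
            "span_near": {
                "clauses": [
                    # span_multi wraps a multi-term query (here: prefix)
                    # so it can take part in a span query.
                    {"span_multi": {"match": {"prefix": {field: {"value": prefix}}}}},
                    # Exact term; should match the indexed (analyzed) form.
                    {"span_term": {field: term}},
                ],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }


# "body" is an assumed field name.
print(json.dumps(span_near_with_prefix("body", "elastic", "searchengine", 5), indent=2))
```

Prefix and span term clauses are not analyzed, so the values would need to match the tokens as they are stored in the index (for example, lowercased).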

clint

