Having trouble querying / boosting, learning curve I suspect...? Please help!? :)


(Joshua Rountree) #1

I'm new to Elasticsearch...
First off, I love this so far (I think)....

The documentation confuses me.

So I've indexed some GeoNames data from a MySQL table.
It has a field that contains things like:
"Miami FL Florida USA US United States"

Whatever...

So, for the most part with MOST CITIES it returns perfectly accurate
results.

But if I search "Miami FL" or "Miami Florida" I'm not getting MIAMI FL, I'm
getting Miami Gardens, FL, etc. Miami FL shows up like #10 in the results.

So I thought to myself... I need to boost based on POPULATION somehow...

Is this possible?

I tried looking into "boosting" "boost_field" etc and simply do not
understand how to approach this.

I'd like to combine this with the search term...
Ideally, the higher the population, the higher the relevance...?

Any clues?


(Joshua Rountree) #2

Also, I'd love it to work similar to this...
http://ws.geonames.org/searchJSON?q=Miami%20FL&maxRows=10&callback=getLocation&noCacheIE=1331244227337

Except mine is just doing cities, only difference.

This returns:
1.) Miami
2.) Miami Beach

That is exactly how mine should return... if anyone could possibly help!
I'm pretty desperate now... :P


(Radu Gheorghe) #3

Hi Joshua,

The text query should do the trick:
curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{
  "query" : {
    "text" : {
      "message" : "Miami FL"
    }
  }
}'

First results would be those that contain both Miami and FL.

You can also try a bool query:

http://www.elasticsearch.org/guide/reference/query-dsl/bool-query.html

Then you can put some queries as "must" (a document needs to match them
to be returned as a hit) and some as "should" (a document is still
returned if it doesn't match them, but it scores higher if it does).
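
As a rough illustration of the must/should split, the request body below requires one term to match and uses a second clause only to raise the score. It is a sketch under assumptions, not a tested recipe: the field name `message` is taken from the curl example above, the choice of "Florida" as the optional term is made up, and the `text` query used here was renamed `match` in later Elasticsearch releases.

```python
import json


def build_bool_query(field, required_text, optional_text):
    """Build a bool query body: `must` clauses filter, `should` clauses only add score."""
    return {
        "query": {
            "bool": {
                # A document has to match this clause to be returned at all.
                "must": [{"text": {field: required_text}}],
                # A document that also matches this clause ranks higher.
                "should": [{"text": {field: optional_text}}],
            }
        }
    }


# Hypothetical values; "message" is the field name from the curl example.
body = build_bool_query("message", "Miami", "Florida")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with curl -d in the same way as the text query.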

Hope this helps. I'm also on a steep ES learning curve :)

Best regards,
Radu


(Joshua Rountree) #4

Worked like a charm... Thank you Thank you!
Miami Beach still isn't SECOND, but I think I just need to weight population
somehow...
Any ideas on how to weight based on a dynamic integer?
Somehow boost the rank by population?
Some equation?

Or should I create a new field in my database that stores a predefined
"boost" value?

Sincerely,
Joshua F. Rountree


(egaumer) #5

You could try sorting by population (just index a population field) but
that might be too rigid (very easy and quick to test though).

Have a look at the custom score query. You could factor the population
into the scoring.

http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query.html
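
Both suggestions can be sketched as request bodies. This is only an illustration under assumptions: it presumes a numeric `population` field has been indexed, reuses the `message` field from earlier in the thread, and the script formula (relevance multiplied by a log-damped population factor) is one possible choice rather than a recommended one. The `log10` script function and the `doc['population'].value` accessor are written from memory of the scripting docs of that era, so check them against your version; later Elasticsearch releases replaced `custom_score` with `function_score`.

```python
import json
import math


def sort_by_population(text_query):
    """Order hits strictly by population, ignoring the relevance score."""
    return {"query": text_query, "sort": [{"population": {"order": "desc"}}]}


def boost_by_population(text_query):
    """Wrap a query in custom_score so population scales the relevance score."""
    return {
        "query": {
            "custom_score": {
                "query": text_query,
                # Assumed formula: log10 damps the pull of very large cities,
                # and the leading 1 keeps zero-population places from scoring 0.
                "script": "_score * (1 + log10(doc['population'].value + 1))",
            }
        }
    }


def local_score(relevance, population):
    """Plain-Python mirror of the script above, for trying out the formula."""
    return relevance * (1 + math.log10(population + 1))


base = {"text": {"message": "Miami FL"}}
print(json.dumps(sort_by_population(base), indent=2))
print(json.dumps(boost_by_population(base), indent=2))
```

Sorting discards relevance entirely, which is the rigidity mentioned above; the scripted version keeps the text match in play and lets population act as a multiplier.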

-Eric


(Joshua Rountree) #6

Okay, and how would I pick up "partial" matches?

Like as they type "Miam"?
Keep in mind, I'd like all of this wrapped into one query. Is that possible?


(egaumer) #7

What do you mean by partial matches? Like an auto-suggest, type-ahead
feature?

-Eric


(Joshua Rountree) #8

Yeah, I was just about to reply again and say I found how to do it in the
docs...
Is "phrase_prefix" an acceptable way to handle it?


(Joshua Rountree) #9

Blah, never mind...
Now things like "Miami Florida" bomb out...
;(
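
One likely reason, assuming the query was switched to the `phrase_prefix` type with its default settings: a phrase query needs the terms to sit next to each other in the field, in order, and only the last term is treated as a prefix. In the indexed value "Miami FL Florida USA US United States", the token "FL" sits between "Miami" and "Florida". The plain-Python sketch below imitates that rule on whitespace tokens; it is a simplification for illustration, not how Elasticsearch implements it.

```python
def phrase_prefix_match(field_text, query):
    """Return True if the query terms appear consecutively and in order,
    with the last query term allowed to be a prefix of a token."""
    doc = field_text.lower().split()
    terms = query.lower().split()
    if not terms:
        return False
    *exact, last = terms
    width = len(terms)
    for start in range(len(doc) - width + 1):
        window = doc[start:start + width]
        # All terms but the last must match exactly; the last one by prefix.
        if window[:-1] == exact and window[-1].startswith(last):
            return True
    return False


field = "Miami FL Florida USA US United States"
for q in ("Miam", "Miami FL", "Miami Florida"):
    print(q, "->", phrase_prefix_match(field, q))
```

If that is the cause, a non-zero `slop` on the phrase query is one thing to try, since it lets terms match with other tokens in between.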


(egaumer) #10

Executing this type of search requires a more efficient indexing strategy.
Generally speaking, you need to use an n-gram analyzer to break each token
down. Once you have this, you can do fast prefix searches across the
content. For example, here is a demo we did for a customer:

http://youtu.be/yFTdh0ahx90

Clinton wrote a small tutorial on Stack Overflow that outlines some of the
techniques and procedures. Have a look:

http://stackoverflow.com/questions/9421358/filename-search-with-elasticsearch

If your document count is small enough, you might get by with a simpler
solution.
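
To make the n-gram idea concrete, here is a minimal sketch. It uses edge n-grams, a variant that keeps only the leading substrings of each token, since that is what prefix matching needs. The analyzer and filter names (`autocomplete`, `autocomplete_filter`) and the gram sizes are made up for illustration, and the settings are an assumption about how such an index could be configured, not the setup used in the demo.

```python
import json


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of a token, shortest first."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# Hypothetical index settings; the names and gram sizes are illustrative only.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edgeNGram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    }
}

# What the filter would store for one token of the city field.
print(edge_ngrams("miami"))
print(json.dumps(index_settings, indent=2))
```

Because every prefix is stored at index time, a partial input such as "miam" becomes an ordinary term lookup. The search text is usually analyzed without the n-gram filter so that the typed prefix itself is not split further.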

-Eric


(Joshua Rountree) #11

How about I use two separate fields and then combine the scores
somehow?

My main goal is to get this:
Miami FL could be found in one field or Miami Florida could be found in the
other...?
That's as advanced as I need...

If both are the same record then combine the scores maybe?

Maybe I'm being a total n00b?

