Search count discrepancy

I just started experimenting with ES and I have the following issue:

From a MySQL DB table, I created an index of records using the
Ruby Tire gem.

The index (checked with curl and ruby console) shows the same
count of indexed items as there are DB rows. Good so far.

When I do a simple single-word query (with Ruby or curl) e.g.

'{ "query" : { "term" : { "title" : "buddha" }}}'

I get fewer hits compared to

select count(*) from resources where title like '%buddha%';

The latter returns 423; ES returns 293.

What would account for this? All suggestions appreciated :)

There are no errors being reported to stdout (running with -f flag)
BTW. Platform: Mac OS 10.7.5, ES 0.19.12.

--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
http://about.me/hassanschroeder
twitter: @hassan

--

The search you're performing should be equivalent to:

select count(*) from resources where title like 'buddha';

If you want '%buddha%' you would use a wildcard search with '*buddha*'
(it's not ideal to use wildcards as prefixes though - there are ways to
optimize for that).
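
For reference, here is a minimal sketch of the two request bodies being compared, built in Python purely for illustration. The field name `title` and the word "buddha" come from the original post; the `*buddha*` pattern is an assumption about what a '%buddha%'-style match would look like as an ES wildcard query. Note that a wildcard query matches against individual indexed terms, not the whole field text, so it is only a rough analogue of SQL LIKE.

```python
import json

# Term query from the original post: matches documents whose "title"
# field contains the exact indexed term "buddha".
term_query = {"query": {"term": {"title": "buddha"}}}

# Wildcard query (assumed pattern): matches any indexed term in "title"
# that contains "buddha", such as "buddhas" or "buddhahood". The leading
# "*" means many terms have to be checked, which is why it can be slow.
wildcard_query = {"query": {"wildcard": {"title": "*buddha*"}}}

# Print the JSON bodies that would be sent to the _search endpoint.
print(json.dumps(term_query))
print(json.dumps(wildcard_query))
```

Either body can be passed to curl with -d in the same way as the query in the original post.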

Also, the type of analyzer you are using is important. I believe by default
ES will use the standard analyzer which lowercases everything, which is
probably what you want. If however you set a different analyzer it's
possible that a search for 'buddha' does not find 'Buddha'.
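
To make the analyzer point concrete, here is a toy sketch in Python. It is not how ES is implemented: `analyze`, `build_index` and `term_lookup` are hypothetical helpers, the titles are made up, and the tokenizer is only a crude stand-in for a lowercasing analyzer. It shows that the text is lowercased when it is indexed, while a term-style lookup uses the search word exactly as given.

```python
import re
from collections import defaultdict

def analyze(text):
    """Crude stand-in for a lowercasing analyzer: split on non-alphanumerics."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs, analyzer):
    """Build a tiny inverted index mapping token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in analyzer(text):
            index[token].add(doc_id)
    return index

def term_lookup(index, term):
    """Look up a term as-is, the way a term query does (no analysis applied)."""
    return sorted(index.get(term, set()))

# Hypothetical sample titles, not taken from the real dataset.
titles = ["Buddha in the Garden", "Life of the buddha", "Zen Stories"]
index = build_index(titles, analyze)

print(term_lookup(index, "buddha"))  # [0, 1]
print(term_lookup(index, "Buddha"))  # [] because only lowercase tokens were indexed
```

With an analyzer that does not lowercase, the situation flips: "Buddha" would be indexed as-is and a lookup for "buddha" would miss it.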

I've found ElasticSearch Head to be a big help in doing quick queries and
looking at the documents in the index (Luke is good too, but I only pull
that out when things get dirty).
http://mobz.github.com/elasticsearch-head/

Hope that helps,
Anil

--

Hmm... ignore my SQL remark. The point I was trying to make, but failed,
was that I don't think the default analyzer does any stemming. So if your
document contains "The buddhas are happy." your SQL query will find the
document whereas your ES query will not.
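
A small Python sketch of that difference, under loud assumptions: the titles are made up apart from "The buddhas are happy.", and the `tokens` helper is a hypothetical, simplified tokenizer rather than the real standard analyzer. It compares a substring match (roughly what LIKE '%buddha%' does) with an exact-token match (roughly what the term query does when there is no stemming).

```python
import re

# Hypothetical sample titles; only the second one comes from this thread.
titles = [
    "Buddha",
    "The buddhas are happy.",
    "Teachings of the Buddha",
    "Buddhahood explained",
    "Zen Stories",
]

def tokens(text):
    """Simplified tokenizer: lowercase words, no stemming."""
    return {t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)}

# Substring match, similar in spirit to: title LIKE '%buddha%'
like_hits = [t for t in titles if "buddha" in t.lower()]

# Exact token match, similar in spirit to a term query on "buddha"
term_hits = [t for t in titles if "buddha" in tokens(t)]

print(len(like_hits))  # 4
print(len(term_hits))  # 2 ("buddhas" and "buddhahood" are different tokens)
```

The same kind of gap, on a larger scale, would be consistent with 423 rows from MySQL against 293 hits from ES.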

--

On Thu, Dec 6, 2012 at 4:49 PM, Anil Rhemtulla anil.rhemtulla@gmail.com wrote:

The search you're performing should be equivalent to:

select count(*) from resources where title like 'buddha';

? In MySQL, with this dataset, that results in 2 (case-insensitive) "exact"
matches, same as title = 'buddha' .

If you want '%buddha%' you would use a wildcard search with '*buddha*' (it's
not ideal to use wildcards as prefixes though - there are ways to optimize
for that).

Are you referring to the ES or MySQL search query?

Also, the type of analyzer you are using is important. I believe by default
ES will use the standard analyzer which lowercases everything, which is
probably what you want. If however you set a different analyzer it's
possible that a search for 'buddha' does not find 'Buddha'.

Didn't even know there was such a thing as an "analyzer" :) So for
sure I didn't configure a non-default one.

I've found ElasticSearch Head to be a big help

Thanks for the tip, checking it out now.

--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
http://about.me/hassanschroeder
twitter: @hassan

--

On Thu, Dec 6, 2012 at 4:54 PM, Anil Rhemtulla anil.rhemtulla@gmail.com wrote:

Hmm... ignore my SQL remark. The point I was trying to make, but failed, was
that I don't think the default analyzer does any stemming. So if your
document contains "The buddhas are happy." your SQL query will find the
document whereas your ES query will not.

Awesome -- I didn't think about stemming in this context and that's
probably it. Thanks. Off to learn about analyzers :)

--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
http://about.me/hassanschroeder
twitter: @hassan

--