Specifying analyzer for search facet


(Jonathan Ness) #1

I'd like the whitespace analyzer to be applied for facet
operations. The facet seems to be using the default analyzer, not the
analyzer I specify.

I create a new index with a number of fields:

    create :mappings => {
      :xplico => {
        :properties => {
          ...
          :contenttype => { :type => 'string', :analyzer => 'whitespace' },
          ...

I then store a number of records, populating this :contenttype
property with values such as:

application/x-gzip; charset=binary
image/png; charset=binary
image/gif; charset=binary
...

Now I want a listing of the number of records for each contenttype:
image/png, image/gif, application/x-gzip, etc.

    s = Tire.search 'xplico' do
      query { all }
      facet 'contenttype', :global => true do
        terms :contenttype
      end
    end

    s.results.facets['contenttype']['terms'].each do |f|
      puts "#{f['term'].ljust(20)} #{f['count']}"
    end

I would expect

application/x-gzip 45
image/png; 29
image/gif; 12

Instead I get

charset 162
binary 159
image 114
jpeg 70
x 45
gzip 45
application 45
png 29
gif 12
text 3

Why didn't it use the whitespace analyzer?

Thanks.

Jonathan


(Shay Banon) #2

Your mapping probably did not apply. You can easily check what the mapping
is for a specific type using the get mapping API. If it's not what you
set, then either your Tire config is wrong, or maybe you tried to change
the mapping on an index that already exists and already has a mapping for
that field?
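
A minimal sketch of the check suggested above, in Ruby. It assumes Elasticsearch is reachable at http://localhost:9200 and that the mapping response nests each type's fields under a "properties" key. The helper names `fetch_mapping` and `field_mappings` are hypothetical, not part of Tire or Elasticsearch.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: fetch the mapping of an index with the get mapping API.
# The host is an assumption; adjust it to your setup.
def fetch_mapping(index, host = 'http://localhost:9200')
  JSON.parse(Net::HTTP.get(URI("#{host}/#{index}/_mapping")))
end

# Hypothetical helper: walk a parsed mapping response and collect every
# place where `field` is defined, keyed by the path leading to it.
# A field can be mapped under more than one type, so all hits are returned.
def field_mappings(node, field, path = [], found = {})
  return found unless node.is_a?(Hash)

  props = node['properties']
  found[path.join('/')] = props[field] if props.is_a?(Hash) && props.key?(field)

  node.each do |key, child|
    field_mappings(child, field, path + [key], found)
  end
  found
end
```

Calling `field_mappings(fetch_mapping('xplico'), 'contenttype')` should then list each type under which the field is mapped, together with its settings. If no entry carries `"analyzer" => "whitespace"`, the mapping did not apply to the documents being faceted.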

On Tue, May 8, 2012 at 4:28 AM, Jonathan Ness jness123456789@gmail.com wrote:



(Jonathan Ness) #3

Hi Shay,

To test your suggestion, I created a simpler example, and I really do
think that the analyzer is not working correctly in conjunction with
facets. Here is the complete example to demonstrate it:

    require 'rubygems'
    require 'tire'

    Tire.index 'demo' do
      delete

      create :mappings => {
        :demo => {
          :properties => {
            :demostring => { :type => 'string', :analyzer => 'whitespace' }
          }
        }
      }

      store :demostring => 'One/Two'
      store :demostring => 'One Two'

      refresh
    end

    s = Tire.search 'demo' do
      query { all }
      facet 'demostring', :global => true do
        terms :demostring
      end
    end

    puts "Found #{s.results.count} entries: #{s.results.map(&:demostring).join(', ')}"
    puts "Counts by demostring:", "-"*25
    s.results.facets['demostring']['terms'].each do |f|
      puts "#{f['term'].ljust(20)} #{f['count']}"
    end

With a whitespace analyzer, I would expect one instance of the string
'One', one instance of the string 'Two', and one instance of the
string 'One/Two'. Instead, I got the following:

Counts by demostring:

two 2
one 2
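
To make the difference between the expected and the observed counts concrete, here is a small pure-Ruby sketch that needs no Elasticsearch. `whitespace_tokens` mimics the whitespace analyzer (split on whitespace only, case preserved). `standard_like_tokens` is only a rough stand-in for the default analyzer (lowercase, split on non-alphanumeric characters, no stop words), so treat it as an approximation. All function names are hypothetical.

```ruby
# Whitespace analyzer: split on whitespace only; case and punctuation are kept.
def whitespace_tokens(text)
  text.split
end

# Rough approximation of the default analyzer: lowercase the text and
# keep only runs of letters and digits. Stop-word handling is ignored.
def standard_like_tokens(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

# Count, for each term, the number of documents that contain it,
# which is what a terms facet reports.
def facet_counts(docs)
  counts = Hash.new(0)
  docs.each do |doc|
    yield(doc).uniq.each { |term| counts[term] += 1 }
  end
  counts
end

docs = ['One/Two', 'One Two']
whitespace_counts = facet_counts(docs) { |d| whitespace_tokens(d) }
standard_counts   = facet_counts(docs) { |d| standard_like_tokens(d) }

p whitespace_counts
p standard_counts
```

The whitespace version yields "One/Two", "One" and "Two" once each, which is the expected result. The standard-like version yields "one" and "two" twice each, which matches the observed output. Lowercased terms split on "/" therefore suggest that the field was not analyzed with the whitespace analyzer at index time.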

Any suggestions for what I might be doing wrong here?

Thanks!!

Jonathan

On May 9, 2:22 am, Shay Banon kim...@gmail.com wrote:



(Shay Banon) #4

A curl recreation is best (and I think you can turn on a flag in Tire to
log its operations as curl commands). Here is one that works:
https://gist.github.com/2688072.
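
For readers who cannot open the gist, the following Ruby sketch prints a set of curl commands that recreate the 'demo' example. It is not the content of the gist above. It assumes Elasticsearch at http://localhost:9200, indexes the documents explicitly under the type `demo` so that they fall under the mapping, and uses arbitrary document IDs 1 and 2. The `curl` helper is hypothetical.

```ruby
require 'json'

host = 'http://localhost:9200' # assumption: local single-node setup

# Index creation body: the same mapping as in the Tire example.
create_body = {
  mappings: {
    demo: {
      properties: {
        demostring: { type: 'string', analyzer: 'whitespace' }
      }
    }
  }
}

# Search body: match everything and request a global terms facet.
search_body = {
  query: { match_all: {} },
  facets: {
    demostring: { terms: { field: 'demostring' }, global: true }
  }
}

# Hypothetical helper: format one curl command line.
def curl(method, url, body = nil)
  cmd = "curl -X#{method} '#{url}'"
  cmd += " -d '#{JSON.generate(body)}'" if body
  cmd
end

commands = [
  curl('DELETE', "#{host}/demo"),
  curl('PUT',    "#{host}/demo", create_body),
  curl('PUT',    "#{host}/demo/demo/1", { demostring: 'One/Two' }),
  curl('PUT',    "#{host}/demo/demo/2", { demostring: 'One Two' }),
  curl('POST',   "#{host}/demo/_refresh"),
  curl('POST',   "#{host}/demo/_search?pretty=true", search_body)
]

puts commands
```

Running the printed commands in order against a test node gives a reproduction that does not depend on Tire, which makes it easier to tell whether the problem lies in the client configuration or in Elasticsearch itself.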

On Fri, May 11, 2012 at 4:48 AM, Jonathan Ness jness123456789@gmail.com wrote:


