Synonym configuration


(jessejlt) #1

Greetings!

I'm new to ElasticSearch and could use some quick validation of my
configuration.

I installed ES via brew on Lion, so ES is installed at:
/usr/local/Cellar/elasticsearch/0.17.1

My config file lives at:
/usr/local/Cellar/elasticsearch/0.17.1/config/elasticsearch.yml

And I'm launching ES manually via:
elasticsearch -f -D es.config=/usr/local/Cellar/elasticsearch/0.17.1/config/elasticsearch.yml

Now in my config file (elasticsearch.yml), I have the following index
definition:

index :
  analysis :
    analyzer :
      standard :
        type : standard

    tokenizer :
      myTokenizer1 :
        type : standard

    filter :
      myTokenFilter1 :
        type : synonym
        synonyms_path : synonym.txt

In the same directory as elasticsearch.yml I have created a file named synonym.txt.
A sample of the data in synonym.txt is:

field type, scan type, field dominance

upper field first, uff

After reading the documentation at:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/synonym-tokenfilter.html
It seems to suggest that:
field type == scan type == field dominance
and
upper field first == uff

as in, they're all considered the same thing.
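
To make that reading of the file concrete, here is a toy model in Python of the comma-separated format, where every term on a line is treated as equivalent to every other term on that line. It only illustrates the intended semantics; it is not how the synonym token filter is implemented, and the names `parse_synonyms` and `SAMPLE` are made up for this sketch.

```python
def parse_synonyms(text):
    """Map each term on a comma-separated line to the full set of terms on that line."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        group = {term.strip().lower() for term in line.split(",") if term.strip()}
        for term in group:
            table.setdefault(term, set()).update(group)
    return table


# The two sample lines from synonym.txt quoted above.
SAMPLE = """\
field type, scan type, field dominance

upper field first, uff
"""

table = parse_synonyms(SAMPLE)
print(sorted(table["uff"]))
print(sorted(table["scan type"]))
```

Under this model a lookup for "uff" also yields "upper field first", which is the behaviour the search further down expects.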

So to test this, I added a new document:

{ "scan type" : "upper field first" }

and then searched for it using:

curl -XGET 'http://localhost:9200/test/records/_search?q=uff'

I received no hits. I expected uff to be translated to
"upper field first", which should have produced a hit.

So I must be doing something wrong, and this is where I need your
help.

Much thanks!
-jesse


(jessejlt) #2

After doing some searches in this group I see that most folks use
gists for this sort of thing, so I've created a gist:



(Clinton Gormley) #3

Hi Jesse

You had a few issues:

  1. You weren't specifying the analyzer correctly - it should
    be a custom analyzer, which includes in its definition your
    synonyms token filter

  2. You weren't applying that analyzer to any field

  3. You were searching on the _all field which has its own
    analyzer (standard, unless you change that) so even
    if your field is analyzed with synonyms, the _all field
    wouldn't be.
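
As a rough illustration of those three points (this is a sketch, not the contents of the gist), the index body and search URL could be put together as below. The names `synonym_analyzer`, `my_synonyms` and `scan_type` are hypothetical; the index `test`, the type `records` and the file `synonym.txt` come from the original question. The Python code only builds and prints the JSON, it does not talk to a cluster.

```python
import json

ANALYZER = "synonym_analyzer"  # hypothetical analyzer name
FILTER = "my_synonyms"         # hypothetical token filter name
FIELD = "scan_type"            # hypothetical field name

index_body = {
    "settings": {
        "analysis": {
            # 1. a custom analyzer whose filter chain includes the synonym filter
            "analyzer": {
                ANALYZER: {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", FILTER],
                }
            },
            "filter": {
                FILTER: {"type": "synonym", "synonyms_path": "synonym.txt"}
            },
        }
    },
    # 2. the analyzer is applied to a specific field of the "records" type
    "mappings": {
        "records": {
            "properties": {
                FIELD: {"type": "string", "analyzer": ANALYZER}
            }
        }
    },
}

# 3. query the field explicitly instead of relying on _all
search_url = "http://localhost:9200/test/records/_search?q={}:uff".format(FIELD)

print(json.dumps(index_body, indent=2))
print(search_url)
```

The printed body would be sent when creating the `test` index, and the URL replaces the bare `q=uff` search.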

I've gisted a demo so that you can see how to do this correctly:

https://gist.github.com/4193a7d63af534d3b0ef

clint


(jessejlt) #4

Hey thanks, Clinton!

Does this mean that the _all field cannot have the synonym analyzer
attached to it?

On Aug 20, 3:17 am, Clinton Gormley cl...@traveljury.com wrote:

I've gisted a demo so that you can see how to do this correctly:

https://gist.github.com/4193a7d63af534d3b0ef


(Clinton Gormley) #5

Hi Jesse

The _all field uses the standard analyzer by default, but there is no
reason that you can't change it to use whatever analyzer you like.

However, it will be applied to all terms included in the _all field,
which probably isn't what you want.
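
A minimal sketch of what such a mapping might look like, built as a Python dict so it can be printed as JSON. The analyzer name `synonym_analyzer` is hypothetical and would have to be defined in the index settings first. That `_all` accepts an `analyzer` setting is an assumption based on the mapping documentation of this era, so check it against your version.

```python
import json

# Mapping for the "records" type that swaps the analyzer used by _all.
# "synonym_analyzer" is a hypothetical custom analyzer defined elsewhere.
all_mapping = {
    "records": {
        "_all": {"analyzer": "synonym_analyzer"}
    }
}

print(json.dumps(all_mapping, indent=2))
```

With this in place, synonym expansion would apply to every value copied into `_all`, not just to the one field it was meant for.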

clint


(jessejlt) #6

Thanks again!

So synonym filters are capable of identifying synonyms for a field's
value. Do you know if it's possible to declare synonyms at the field
level? I can do this server-side before touching ES, but before doing
so I figured it was worth seeing if this functionality is already in ES.

So an example of a record:

{
"field-type" : "Upper Field First"
}

Where "field-type" is a synonym for "field-order" and "scan-type".



(Clinton Gormley) #7

Hiya

I don't quite follow what it is you are trying to achieve here.

What ES does have is "multi-fields"
http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html

i.e. you provide the field 'field-type' and in ES, that field actually
consists of several "sub-fields", e.g. 'field-type.analyzed',
'field-type.not_analyzed', etc.

If you search on 'field-type', you would actually be searching on
'field-type.field-type'.

Also, you can set an index_name on any field (including those
sub-fields), which would allow you to refer to e.g. 'field-type.foo' as
field 'foo'.

http://www.elasticsearch.org/guide/reference/mapping/core-types.html
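
A hedged sketch of a multi-field mapping along those lines, again as a Python dict printed as JSON. The sub-field names `not_analyzed` and `foo` follow the examples in the text, the type name `records` comes from earlier in the thread, and the use of `index_name` on a sub-field is taken from the description above rather than verified against a running node.

```python
import json

multi_field_mapping = {
    "records": {
        "properties": {
            "field-type": {
                "type": "multi_field",
                "fields": {
                    # default sub-field: same name as the parent, so a search
                    # on 'field-type' really hits 'field-type.field-type'
                    "field-type": {"type": "string", "index": "analyzed"},
                    # exact-value variant, reachable as 'field-type.not_analyzed'
                    "not_analyzed": {"type": "string", "index": "not_analyzed"},
                    # assumption: index_name exposes 'field-type.foo' as 'foo'
                    "foo": {"type": "string", "index_name": "foo"},
                },
            }
        }
    }
}

print(json.dumps(multi_field_mapping, indent=2))
```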

clint

