Synonym token filter


(Alexander P.) #1

Hi,

I'm trying to install a synonym token filter for an existing index and
having a hard time understanding how this should be done. I've created a
synonym.txt file, but I can't understand how to implement the config
described in the doc:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html.
Is this a file? If so, should it go into the config directory? Or is this
supposed to be PUT via curl? None of the things I've tried so far worked.
Please help!

Thanks a lot,
Alex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d89026ea-aad1-4537-8dac-8ea18a0c6b13%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Rafał Kuć) #2

Hello!

This is part of the index settings you send to Elasticsearch, for example during index creation. The synonyms_path property is relative to the config directory, so if your file is synonym.txt, it should go into $ES_HOME/config (on every node in the cluster), and you could send the following command to create an index:

curl -XPOST 'localhost:9200/test' -d '
{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "synonym" : {
            "tokenizer" : "whitespace",
            "filter" : ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "synonyms_path" : "synonym.txt"
          }
        }
      }
    }
  },
  "mappings" : {
    "test" : {
      "properties" : {
        "name" : { "type" : "string", "index" : "analyzed", "analyzer" : "synonym" }
      }
    }
  }
}'
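Typing nested JSON by hand inside a curl -d argument is error-prone: one missing brace and the request fails. As an illustration only, the same index-creation body can be built as a Python dict and serialized with the standard json module. This sketch assumes Python 3 and a node at localhost:9200; build_index_body and create_index are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_index_body(synonyms_path="synonym.txt"):
    """Return the index-creation body from the curl example above as a dict."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        # Custom analyzer: whitespace tokenizer followed by the synonym filter.
                        "synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}
                    },
                    "filter": {
                        # synonyms_path is resolved relative to $ES_HOME/config.
                        "synonym": {"type": "synonym", "synonyms_path": synonyms_path}
                    },
                }
            }
        },
        "mappings": {
            "test": {
                "properties": {
                    "name": {"type": "string", "index": "analyzed", "analyzer": "synonym"}
                }
            }
        },
    }


def create_index(name="test", host="http://localhost:9200"):
    """Send the body with the same method and path as the curl command (needs a live node)."""
    data = json.dumps(build_index_body()).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/{name}",
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Dry run: print the JSON that would be sent. Call create_index() only against a live node.
    print(json.dumps(build_index_body(), indent=2))
```

Printing the serialized body first is a cheap way to confirm the JSON is well formed before anything is sent to the cluster.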

My synonym.txt file had the following contents:

aaa=>bbb

Now to test it, just run the following command:

curl -XGET 'localhost:9200/test/_analyze?analyzer=synonym&text=aaa+test&pretty=true'
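The query string of that _analyze call is ordinary URL encoding: the + in text=aaa+test stands for the space in "aaa test". A small sketch, assuming Python 3 and using only urllib.parse from the standard library, that rebuilds the same URL (analyze_url is a hypothetical helper name):

```python
from urllib.parse import urlencode


def analyze_url(text, analyzer="synonym", index="test", host="localhost:9200"):
    """Build the _analyze URL used in the curl command above."""
    # urlencode() quotes values with quote_plus by default, so spaces become '+'.
    query = urlencode({"analyzer": analyzer, "text": text, "pretty": "true"})
    return f"{host}/{index}/_analyze?{query}"


print(analyze_url("aaa test"))
# → localhost:9200/test/_analyze?analyzer=synonym&text=aaa+test&pretty=true
```

Encoding the text this way matters once it contains characters such as & or =, which would otherwise be read as query-string separators.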

And you should get something like this:

{
  "tokens" : [ {
    "token" : "bbb",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "SYNONYM",
    "position" : 1
  }, {
    "token" : "test",
    "start_offset" : 4,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}
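To see where each field of that response comes from, here is a toy Python model of the analyzer: split on whitespace, then replace any token matched by an explicit rule such as aaa=>bbb while keeping the original offsets. It is an illustration under simplifying assumptions (single-word rules only, positions counted from 1 as in the response above), not Lucene's synonym filter; parse_rules and analyze are hypothetical names.

```python
def parse_rules(text):
    """Parse synonym lines into {term: [output terms]} (single-word terms only)."""
    rules = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by the right-hand terms.
            left, right = line.split("=>", 1)
            outputs = [t.strip() for t in right.split(",") if t.strip()]
            for term in left.split(","):
                if term.strip():
                    rules[term.strip()] = outputs
        else:
            # Equivalent synonyms: each term expands to the whole group.
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                rules[term] = group
    return rules


def analyze(text, rules):
    """Whitespace-tokenize text and apply the rules, returning token dicts."""
    tokens = []
    cursor = 0
    for position, word in enumerate(text.split(), start=1):
        start = text.index(word, cursor)  # character offset of this token
        end = start + len(word)
        cursor = end
        for output in rules.get(word, [word]):
            tokens.append({
                "token": output,
                "start_offset": start,
                "end_offset": end,
                # Tokens introduced by a rule are typed SYNONYM; originals stay "word".
                "type": "word" if output == word else "SYNONYM",
                "position": position,
            })
    return tokens


for token in analyze("aaa test", parse_rules("aaa=>bbb")):
    print(token)
```

The printed tokens line up with the response above, which makes the model a quick way to reason about what a new rule will do before editing synonym.txt on the server.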

So, as you can see, it works. Can you check whether it works for you?

--
Regards,
Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Alexander P.) #3

Rafal, thanks for the quick reply! I think I already understand how to do
this for a new index. The issue is how to do it for an existing
index. Am I supposed to do something like this?

curl -XPOST 'http://localhost:9200/my_twitter_river/settings/' -d '
{
  "analysis" : {
    "filter" : {
      "synonym" : {
        "type" : "synonym",
        "synonyms_path" : "synonym.txt"
      }
    }
  }
}
'

Also, some posts seem to indicate that if I run a query on the _all field,
the synonyms won't be taken into account anyway. Is this true?

Thanks!



(Rafał Kuć) #4

Hello!

You can't update the analysis settings (including the synonyms) of an open index; you need to close it first. You should also set the synonym analyzer on every field where you want synonyms to apply, and re-index your data if the synonym filter is part of index-time analysis, because documents that are already indexed won't have the synonyms applied. So, if you can't delete and recreate the index, first close it, then update the settings and mapping, and reopen it. For example (assuming the test index already exists):

  1. Close the index:

    curl -XPOST 'http://localhost:9200/test/_close'

  2. Update the settings:

    curl -XPUT 'localhost:9200/test/_settings' -d '{
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "synonym" : {
              "tokenizer" : "whitespace",
              "filter" : ["synonym"]
            }
          },
          "filter" : {
            "synonym" : {
              "type" : "synonym",
              "synonyms_path" : "synonym.txt"
            }
          }
        }
      }
    }'

  3. Update the mapping (the name field is included in case you want the synonym analyzer on individual fields and not only on the _all field):

    curl -XPUT 'localhost:9200/test/doc/_mapping' -d '{
      "doc" : {
        "_all" : {
          "enabled" : true,
          "analyzer" : "synonym"
        },
        "properties" : {
          "name" : { "type" : "string", "index" : "analyzed", "analyzer" : "synonym" }
        }
      }
    }'

  4. Open the index:

    curl -XPOST 'http://localhost:9200/test/_open'
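The order of these calls is what matters: the analysis settings can only change while the index is closed. A minimal Python sketch, assuming Python 3 and a node at localhost:9200, that encodes the four steps as (method, path, body) tuples and sends them in order with urllib; reconfigure_plan and run_plan are hypothetical helper names, and the bodies are copied from the curl commands above.

```python
import json
import urllib.request

SETTINGS_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {"synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}},
            "filter": {"synonym": {"type": "synonym", "synonyms_path": "synonym.txt"}},
        }
    }
}

MAPPING_BODY = {
    "doc": {
        "_all": {"enabled": True, "analyzer": "synonym"},
        "properties": {
            "name": {"type": "string", "index": "analyzed", "analyzer": "synonym"}
        },
    }
}


def reconfigure_plan(index="test", doc_type="doc"):
    """Return the close / settings / mapping / open steps in the required order."""
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", SETTINGS_BODY),
        ("PUT", f"/{index}/{doc_type}/_mapping", MAPPING_BODY),
        ("POST", f"/{index}/_open", None),
    ]


def run_plan(plan, host="http://localhost:9200"):
    """Send each step in order; urlopen raises HTTPError on a 4xx/5xx reply, which stops the run."""
    for method, path, body in plan:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            host + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            print(method, path, response.status)


if __name__ == "__main__":
    for method, path, _ in reconfigure_plan():
        print(method, path)  # dry run; call run_plan(reconfigure_plan()) against a live node
```

Because the run stops at the first failing step, a rejected settings or mapping update leaves the index closed, so you would need to fix the body and reopen the index yourself.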

After that, the filter should appear in the index settings and the analyzer in the mapping. You can check both by running:

curl -XGET 'http://localhost:9200/test/_settings?pretty'

curl -XGET 'http://localhost:9200/test/_mapping?pretty'

Now to test it, just index a new document:

curl -XPOST 'localhost:9200/test/doc/1' -d '{"name":"aaa test"}'

And now test the search:

curl -XGET 'localhost:9200/test/_search?pretty' -d '{
  "query" : {
    "match" : {
      "_all" : "bbb"
    }
  }
}'

And it should be working:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.625,
    "hits" : [ {
      "_index" : "test",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 0.625,
      "_source" : {"name":"aaa test"}
    } ]
  }
}

However, remember that data indexed before the change has to be re-indexed :)
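Why re-indexing matters can be seen with a toy inverted index: terms are stored at index time, so a document indexed before the synonym analyzer was added still contains aaa and never bbb. The sketch below is a simplified Python model (exact term lookup, no scoring), not how Elasticsearch stores data; ToyIndex, plain_analyzer and synonym_analyzer are hypothetical names, and it assumes the synonym analyzer is also used at search time, as with the match query above.

```python
def plain_analyzer(text):
    """Whitespace tokenizer only: the analysis in place before the change."""
    return text.split()


def synonym_analyzer(text):
    """Whitespace tokenizer plus the aaa=>bbb rule from synonym.txt."""
    return ["bbb" if token == "aaa" else token for token in text.split()]


class ToyIndex:
    """Maps each stored term to the ids of the documents that contain it."""

    def __init__(self):
        self.postings = {}
        self.sources = {}

    def index(self, doc_id, text, analyzer):
        self.sources[doc_id] = text
        for term in analyzer(text):
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query, analyzer):
        """Return ids of documents containing any analyzed query term."""
        hits = set()
        for term in analyzer(query):
            hits |= self.postings.get(term, set())
        return hits

    def reindex(self, analyzer):
        """Rebuild the postings from the stored source using the new analyzer."""
        sources = dict(self.sources)
        self.postings = {}
        for doc_id, text in sources.items():
            self.index(doc_id, text, analyzer)


idx = ToyIndex()
idx.index("1", "aaa test", plain_analyzer)   # indexed before the change
print(idx.search("bbb", synonym_analyzer))    # → set()
idx.reindex(synonym_analyzer)
print(idx.search("bbb", synonym_analyzer))    # → {'1'}
```

The model also shows a less obvious effect: with an explicit aaa=>bbb rule applied at search time, even a query for aaa stops matching the old documents, because the query itself is rewritten to bbb.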

--
Regards,
Rafał Kuć


(Alexander P.) #5

Great, got it! Thanks a lot for your help!
