Custom analyzers and mappings won't work


(CarlJo) #1

I am trying to set up some common analyzers and mappings via Java, but I am having trouble.
The exact JSON I use is below.

Essentially, I want to get shingles for a field called "text". I
had this working, but the usual stop words showed up as the top unigrams
when running a facet search, which is not very useful. So I tried to add a
few stop word lists to the mix. Because I am dealing with English and
German web content, I need a German and an English stop word list, plus a
custom one containing terms like "www" and "http".
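
A minimal plain-Java sketch of the intended effect: stop words are dropped first, then shingles of 2 to 5 words are emitted together with the unigrams. It does not use Lucene or Elasticsearch. The class name, method names and the tiny stop list are made up for illustration, and the sketch ignores the position gaps left by removed stop words, which the real filters may handle differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ShingleSketch {

    // Hypothetical stop list: a few English, German and web-specific words.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "and", "a", "der", "die", "das", "und", "www", "http"));

    // Lowercase the text, split on anything that is not a letter or digit,
    // and drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Emit unigrams (if requested) plus every word n-gram with min <= n <= max.
    static List<String> shingles(List<String> tokens, int min, int max, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            for (int n = Math.max(min, 2); n <= max && start + n <= tokens.size(); n++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("The quick fox and the lazy dog");
        System.out.println(shingles(tokens, 2, 5, true));
    }
}
```

With the sample sentence, the stop words "the" and "and" are removed, so the unigrams that remain are "quick", "fox", "lazy" and "dog", and the longest shingle is "quick fox lazy dog".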

First question: Can someone see whether the below JSON should work?

On the Java side, I am using the following code to pass the JSON to Elasticsearch:

// Analysis settings can only be changed while the index is closed.
client.admin().indices().prepareClose(INDEX_NAME).execute().actionGet();
client.admin().indices().prepareUpdateSettings(INDEX_NAME).setSettings(settingString).execute().actionGet();
// Reopen the index, then register the mapping for type "default".
client.admin().indices().prepareOpen(INDEX_NAME).execute().actionGet();
client.admin().indices().preparePutMapping(INDEX_NAME).setType("default").setSource(mappingString).execute().actionGet();

Second question: Is this correct? Also, when should or can I run
this code? After the index has been created? Will it still work after some
content has already been added to the index? Do I need to give ES some
time after issuing the above commands? If so, how do I know when it is
ready again?

Many Thanks!

Settings:
{
  "analysis":{
    "analyzer":{
      "analyzer_shingle":{
        "tokenizer":"standard",
        "filter":["standard", "lowercase"]
      },
      "title" : {
        "type" : "string",
        "index": "not_analyzed"
      },
      "analyzer_shingle_tf":{
        "tokenizer":"standard",
        "filter":["standard", "lowercase", "filter_english", "filter_german", "filter_www", "filter_stop", "filter_shingle"]
      }
    },
    "filter":{
      "filter_shingle":{
        "type":"shingle",
        "max_shingle_size":5,
        "min_shingle_size":2,
        "output_unigrams":"true"
      },
      "filter_stop":{
        "type":"stop",
        "enable_position_increments":"true"
      },
      "filter_english":{
        "type":"stop",
        "stopwords":"english"
      },
      "filter_german":{
        "type":"stop",
        "stopwords":"german"
      },
      "filter_www":{
        "type":"stop",
        "stopwords_path":"stopwords_www.txt"
      }
    }
  }
}

Mapping:
default : {
  "properties" : {
    "coordinates" : {
      "type" : "geo_point",
    }
    "text":{
      "search_analyzer":"analyzer_shingle_tf",
      "index_analyzer":"analyzer_shingle_tf",
      "type":"string"
    }
  }
}
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1387457760.574.YahooMailNeo%40web28802.mail.ir2.yahoo.com.
For more options, visit https://groups.google.com/groups/opt_out.


(system) #3