I am trying to set up some common analyzers and mappings via Java, but I am having trouble.
You can find the exact JSON I use below.
Essentially, I want to get shingles for a field called "text". I
had this working, but the usual stop-word unigrams ended up at the top
when running a facet search, which is not very useful. So I tried to add
a few stop word lists to the mix. Because I am dealing with English and
German web content, I need a German and an English stop word list, plus a
custom one containing terms such as "www" and "http".
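To show the effect described above, here is a simplified stand-alone Java sketch. It is not Lucene's actual ShingleFilter. It lowercases and splits the text, drops stop words, and then emits unigrams plus shingles of minSize to maxSize words. The class and method names are hypothetical. One difference to keep in mind: with position increments enabled, the real shingle filter may insert a filler token (such as "_") where a stop word was removed. This sketch simply joins the remaining tokens instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ShingleSketch {

    // Lowercase, split on runs of non-letter/non-digit characters, drop stop words,
    // then emit each unigram followed by the shingles starting at that token.
    static List<String> shingles(String text, Set<String> stopWords, int minSize, int maxSize) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i)); // unigram, as with output_unigrams = true
            for (int n = minSize; n <= maxSize && i + n <= tokens.size(); n++) {
                out.add(String.join(" ", tokens.subList(i, i + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stop list mixing common English words and the custom web terms
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of", "www", "http"));
        System.out.println(shingles("The state of the art", new HashSet<>(), 2, 3));
        System.out.println(shingles("The state of the art", stop, 2, 3));
    }
}
```

The first list still contains unigrams such as "the" and "of", which are the terms that otherwise crowd the top of a terms facet. The second prints [state, state art, art].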
First question: can someone tell whether the JSON below should work?
On the Java side, I am using the following code to pass the JSON to Elasticsearch:
// Close the index so that its analysis settings can be updated
client.admin().indices().prepareClose(INDEX_NAME).execute().actionGet();
client.admin().indices().prepareUpdateSettings(INDEX_NAME).setSettings(settingString).execute().actionGet();
// Reopen the index, then add the field mappings for type "default"
client.admin().indices().prepareOpen(INDEX_NAME).execute().actionGet();
client.admin().indices().preparePutMapping(INDEX_NAME).setType("default").setSource(mappingString).execute().actionGet();
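The close/update/open sequence above has one weakness. If the settings update throws, the index stays closed. Below is a minimal sketch of guarding against that with try/finally. IndexOps is a hypothetical interface that stands in for the four admin calls, so the sketch runs without an Elasticsearch client. In real code, each method would wrap the corresponding prepareClose, prepareUpdateSettings, prepareOpen, or preparePutMapping call.

```java
import java.util.ArrayList;
import java.util.List;

public class ReopenSketch {

    // Hypothetical stand-in for the four admin calls; not an Elasticsearch API.
    interface IndexOps {
        void close(String index);
        void updateSettings(String index, String settings);
        void open(String index);
        void putMapping(String index, String type, String mapping);
    }

    // Apply settings while the index is closed; the finally block reopens it
    // even if the settings update fails. The mapping is only put afterwards.
    static void applyAnalysisAndMapping(IndexOps ops, String index, String type,
                                        String settings, String mapping) {
        ops.close(index);
        try {
            ops.updateSettings(index, settings);
        } finally {
            ops.open(index);
        }
        ops.putMapping(index, type, mapping);
    }

    // Runs the sequence against a recorder; failSettings simulates a rejected update.
    static List<String> demo(boolean failSettings) {
        List<String> calls = new ArrayList<>();
        IndexOps recorder = new IndexOps() {
            public void close(String i) { calls.add("close " + i); }
            public void updateSettings(String i, String s) {
                calls.add("settings " + i);
                if (failSettings) {
                    throw new IllegalStateException("settings rejected");
                }
            }
            public void open(String i) { calls.add("open " + i); }
            public void putMapping(String i, String t, String m) { calls.add("mapping " + i + "/" + t); }
        };
        try {
            applyAnalysisAndMapping(recorder, "myindex", "default", "{}", "{}");
        } catch (IllegalStateException e) {
            calls.add("failed");
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

In the failing case the recorder still shows "open myindex" before "failed", so the index is not left closed. The mapping step is skipped because the exception propagates.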
Second question: is this correct? Also, when should/can I run
this code? After the index has been created? Will it still work after some
content has already been added to the index? Do I need to give ES some
time after issuing the above commands? If so, how do I know when it is
ready again?
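One common pattern for the "how do I know when it is ready" question is to poll a readiness check with a timeout. Below is a generic, standard-library-only sketch. The readiness check is a hypothetical BooleanSupplier. In a real client it might wrap a cluster health request for the index that reports at least yellow status, but that wiring is an assumption and is not shown.

```java
import java.util.function.BooleanSupplier;

public class WaitUntilReady {

    // Polls check every intervalMillis until it returns true or timeoutMillis elapses.
    // Returns true if the check succeeded in time, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier check, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) { // overflow-safe comparison
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical readiness check that becomes true on the third poll
        int[] polls = {0};
        boolean ready = waitUntil(() -> ++polls[0] >= 3, 1_000, 10);
        System.out.println(ready + " after " + polls[0] + " polls");
    }
}
```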
Many thanks!
Settings:
{
  "analysis": {
    "analyzer": {
      "analyzer_shingle": {
        "tokenizer": "standard",
        "filter": ["standard", "lowercase"]
      },
      "title": {
        "type": "string",
        "index": "not_analyzed"
      },
      "analyzer_shingle_tf": {
        "tokenizer": "standard",
        "filter": ["standard", "lowercase", "filter_english", "filter_german", "filter_www", "filter_stop", "filter_shingle"]
      }
    },
    "filter": {
      "filter_shingle": {
        "type": "shingle",
        "max_shingle_size": 5,
        "min_shingle_size": 2,
        "output_unigrams": "true"
      },
      "filter_stop": {
        "type": "stop",
        "enable_position_increments": "true"
      },
      "filter_english": {
        "type": "stop",
        "stopwords": "english"
      },
      "filter_german": {
        "type": "stop",
        "stopwords": "german"
      },
      "filter_www": {
        "type": "stop",
        "stopwords_path": "stopwords_www.txt"
      }
    }
  }
}
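The analyzer_shingle_tf chain applies several stop filters one after another. A token is dropped if any of the lists contains it, so the combined effect on which tokens survive is the union of the lists. The sketch below shows that. It also reads a custom list in a one-word-per-line layout, which is assumed here to be what stopwords_path expects. The class name and all sample words are illustrative assumptions, and the sketch reads from a string rather than from stopwords_www.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class StopListSketch {

    // Reads a word list with one word per line, trimming whitespace
    // and skipping blank lines.
    static Set<String> readWordList(Reader reader) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Stop filters applied in sequence remove a token if any list contains it,
    // which (ignoring position handling) equals filtering once against the union.
    @SafeVarargs
    static Set<String> union(Set<String>... lists) {
        Set<String> all = new LinkedHashSet<>();
        for (Set<String> list : lists) {
            all.addAll(list);
        }
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical contents of stopwords_www.txt
        Set<String> www = readWordList(new StringReader("www\nhttp\n\n  https  \n"));
        // Tiny illustrative stand-ins for the English and German lists
        Set<String> english = new LinkedHashSet<>(Arrays.asList("the", "of"));
        Set<String> german = new LinkedHashSet<>(Arrays.asList("der", "die"));
        System.out.println(union(english, german, www));
    }
}
```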
Mapping:
{
  "default": {
    "properties": {
      "coordinates": {
        "type": "geo_point"
      },
      "text": {
        "search_analyzer": "analyzer_shingle_tf",
        "index_analyzer": "analyzer_shingle_tf",
        "type": "string"
      }
    }
  }
}
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1387457760.574.YahooMailNeo%40web28802.mail.ir2.yahoo.com.