Hey,
I have done some more research and can now provide a good example of the problem. I thought I had found the pattern for when a search works and when it doesn't, but I hadn't. Here is my Java test with some examples:
@Test
public void testCopyTo() throws ExecutionException, InterruptedException, IOException {
    // Custom analysis settings (currently not applied, see prepareCreate below)
    XContentBuilder settings = jsonBuilder()
        .startObject()
            .startObject("analysis")
                .startObject("filter")
                    .startObject("german_stop")
                        .field("type", "stop")
                        .field("stopwords", "_german_")
                    .endObject()
                    .startObject("german_keywords")
                        .field("type", "keyword_marker")
                        // pass the keywords as a real array, not as the string "[\"der\"]"
                        .array("keywords", "der")
                    .endObject()
                    .startObject("german_stemmer")
                        .field("type", "stemmer")
                        // german, german2, light_german, minimal_german
                        .field("language", "light_german")
                    .endObject()
                .endObject()
                .startObject("analyzer")
                    .startObject("my_analyzer")
                        .field("tokenizer", "ngram")
                        .array("filter", "lowercase", "german_stop", "german_keywords", "german_normalization", "german_stemmer")
                    .endObject()
                .endObject()
            .endObject()
        .endObject();

    // "name" is copied into "search", which uses the built-in "german" analyzer
    XContentBuilder mapping = jsonBuilder()
        .startObject()
            .startObject("properties")
                .startObject("name")
                    .field("type", "text")
                    .field("copy_to", "search")
                .endObject()
                .startObject("search")
                    .field("type", "text")
                    .field("analyzer", "german")
                    .startObject("fields")
                        .startObject("keyword")
                            .field("type", "keyword")
                            .field("ignore_above", 256)
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
        .endObject();

    elasticClient.admin().indices().prepareCreate("test_index")
        // Doesn't seem to work combined with a *...* search
        // .setSettings(settings)
        .addMapping("test_type", mapping).execute().get();

    Map<String, Object> source = new HashMap<>();

    // 3 characters, everything is ok
    String name = "Güe"; // =)
    // String name = "nne"; // =)

    // 4 characters: last character can't be 'e'
    // String name = "Güet"; // =)
    // String name = "nnen"; // =)
    // String name = "Güne"; // =(
    // String name = "nnne"; // =(

    // 5 characters: very different results...
    // String name = "Güneh"; // =)
    // String name = "nnnen"; // =(
    // String name = "Günte"; // =(
    // String name = "nnnne"; // =(

    // 6 characters: last two characters can't be 'e'
    // String name = "Günehr"; // =)
    // String name = "nnnenn"; // =)
    // String name = "Günter"; // =(
    // String name = "nnnnen"; // =(

    // 7 characters: same
    // String name = "Günteir"; // =)
    // String name = "nnnnenn"; // =)
    // String name = "Günther"; // =(
    // String name = "nnnnnen"; // =(

    // 8 characters: magically, the next to last character can be 'e' again, but "Günthrel" != "nnnnnnen"???
    // String name = "Günthrel"; // =)
    // String name = "nnnnnnen"; // =(
    // String name = "Günthrle"; // =(
    // String name = "nnnnnen"; // =(

    // Index one document and refresh immediately so it is searchable
    source.put("name", name);
    elasticClient.prepareIndex("test_index", "test_type", "1")
        .setSource(source)
        .setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE)
        .execute().get();

    // Everything is OK
    // String searchString = name;
    // Not entering the last two characters even works with the "=(" commented names
    // String searchString = "*" + name.substring(0, name.length() - 2) + "*";
    // Both fail with "=(" commented names
    // String searchString = "*" + name.substring(0, name.length() - 1) + "*";
    String searchString = "*" + name + "*";

    SearchResponse response = elasticClient.prepareSearch("test_index")
        .setQuery(queryStringQuery(searchString).defaultField("search"))
        .execute().get();
    assertEquals(1, response.getHits().getTotalHits());
}
As I wrote in the code comments, I thought the problem came from a combination of the word length, the letter "e", wrapping the search string in "*", and how many characters of the name are typed into the search. But that doesn't seem to be 100% correct. For example, "Güneh" (5 letters) can be found, but "nnnen" (also 5 letters) can't. It is really confusing.
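One possible explanation (an assumption, not something confirmed above) is that the "german" analyzer stems tokens at index time, while the term inside a "*...*" wildcard is not stemmed at search time. The sketch below is a toy model of that idea in plain Java. toyIndexToken is a hypothetical stand-in for the stemmer, not the real Lucene code, and wildcardMatches simply treats "*term*" as a substring check against the indexed token. Only the ASCII sample names are used; umlaut handling is left out.

```java
import java.util.Locale;

class WildcardStemSketch {

    // Hypothetical toy stemmer (an assumption, NOT the real analyzer):
    // strips a trailing "en"/"er" or a trailing "e" as long as
    // at least 3 characters remain afterwards.
    static String toyIndexToken(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() >= 5 && (w.endsWith("en") || w.endsWith("er"))) {
            return w.substring(0, w.length() - 2);
        }
        if (w.length() >= 4 && w.endsWith("e")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Models a "*term*" wildcard whose term is lower-cased but not stemmed:
    // it matches only if the raw term is a substring of the indexed token.
    static boolean wildcardMatches(String indexedToken, String searchTerm) {
        return indexedToken.contains(searchTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] names = {"nne", "nnen", "nnne", "nnnen", "nnnenn", "nnnnnnen"};
        for (String name : names) {
            String token = toyIndexToken(name);
            System.out.println(name + " -> indexed as \"" + token + "\", *"
                    + name + "* matches: " + wildcardMatches(token, name));
        }
    }
}
```

Under these assumptions, "nnne" and "nnnen" are both indexed as "nnn", so "*nnne*" and "*nnnen*" find nothing, while "nnnenn" is left untouched and still matches. Looking at the real tokens with the _analyze API would confirm or refute this.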
I also tried an "ngram" tokenizer, but that seems to make things even worse when using wildcards.
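A possible reason for the ngram result, again as an assumption: with its default settings (min_gram 1, max_gram 2) the "ngram" tokenizer only emits tokens of one or two characters, so a longer "*...*" term cannot be contained in any single token. The sketch below uses a hypothetical ngrams helper in plain Java to illustrate this; it is not the Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.List;

class NgramSketch {

    // Hypothetical helper: emits every substring of text whose length
    // lies between minGram and maxGram (inclusive).
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    // A "*term*" wildcard is modelled as: some single token contains the term.
    static boolean anyTokenContains(List<String> tokens, String term) {
        for (String token : tokens) {
            if (token.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed defaults: min_gram = 1, max_gram = 2
        List<String> tokens = ngrams("nnnen", 1, 2);
        System.out.println("tokens: " + tokens);
        System.out.println("*nnnen* matches: " + anyTokenContains(tokens, "nnnen"));
        System.out.println("*ne* matches: " + anyTokenContains(tokens, "ne"));
    }
}
```

If this is what happens, raising max_gram or searching without wildcards would be the things to try, since every token in this model is at most two characters long.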
You can try this out yourself and check whether the names marked "=)" are found and the names marked "=(" are not. Or you can try a different searchString and see whether the result matches my comments.