ES 1.6 to ES 5.1.2 upgrade - script scoring



We have started upgrading our ES 1.6 production cluster to ES 5.1.2. At the same time, we want to improve the performance of some of our searches, which unfortunately rely heavily on script-based scoring.

We have a requirement to score documents higher when the first words in a matching field are the same as the search input. Some consultants we used earlier seem to have "solved" this with a rather suspicious Groovy script (see below), which is reused through function scoring all over our Mustache-based search templates. Our goal is to get rid of most script-based scoring to improve performance, and at the very least to convert from Groovy to Painless so that we don't rely on a script engine that might be removed in upcoming ES versions.

Does anyone have a performant way to do this kind of scoring without the cost of scripts?

The script is currently used in many function score queries (match_phrase_prefix, multi_match, fuzzy, etc.) on several fields, including stemmed variations of those fields. In one of our search templates it is used more than 10 times on various fields (in combination with other scripts), with a different weight depending on the input field passed to the script. We have one template with more than 30 script functions. Running a broad search with around 100 000 hits costs a lot of CPU in the scoring phase and takes several seconds to return, hence this request for advice.

An alternative that has come to mind, without investigating it much yet, is to drop the script and instead use a prefix query on some dedicated keyword fields, giving matches an appropriate weight.
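A minimal Java sketch of the per-document boost this alternative would produce, assuming the field is indexed as a single lowercased token (for example a custom analyzer with the keyword tokenizer and a lowercase filter) so that the whole value can be matched from its start. The class, method and weight names (`PrefixBoostSketch`, `prefixBoost`, `EXACT_WEIGHT`, `FIRST_WORDS_WEIGHT`) are hypothetical. In query terms this roughly corresponds to two filtered `function_score` functions with a weight: a `term` filter for an exact match and a `prefix` filter on the input followed by a space. Term-level queries such as `term` and `prefix` don't analyze their input, so the template would have to lowercase it first.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: the constant boost a prefix-based function would add,
 * assuming the field is indexed as one lowercased token (whole value intact).
 */
class PrefixBoostSketch {

    // Hypothetical weights; real values would need tuning against the other functions.
    static final double EXACT_WEIGHT = 3.0;
    static final double FIRST_WORDS_WEIGHT = 2.0;

    /** Trims and lowercases, standing in for index-time analysis (assumption). */
    static String normalize(String s) {
        return s == null ? "" : s.trim().toLowerCase(Locale.ROOT);
    }

    /**
     * Returns EXACT_WEIGHT if the whole value equals the input,
     * FIRST_WORDS_WEIGHT if the value starts with the input followed by a space
     * (the input matches the leading whole words), and 0 otherwise.
     */
    static double prefixBoost(String fieldValue, String input) {
        String value = normalize(fieldValue);
        String query = normalize(input);
        if (query.isEmpty()) {
            return 0.0;
        }
        if (value.equals(query)) {
            return EXACT_WEIGHT;
        }
        // Requiring a trailing space keeps "acme" from matching "acmeco".
        if (value.startsWith(query + " ")) {
            return FIRST_WORDS_WEIGHT;
        }
        return 0.0;
    }

    public static void main(String[] args) {
        System.out.println(prefixBoost("Acme", "acme"));            // 3.0
        System.out.println(prefixBoost("Acme Holding AS", "acme")); // 2.0
        System.out.println(prefixBoost("Acmeco AS", "acme"));       // 0.0
        System.out.println(prefixBoost("Nordic Acme AS", "acme"));  // 0.0
    }
}
```

The two conditions are mutually exclusive (a value equal to the input cannot also start with the input plus a space), so at most one weight applies per document, and no per-document string processing happens at query time.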

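```groovy
import org.apache.lucene.analysis.core.KeywordTokenizer
import org.apache.lucene.analysis.miscellaneous.ScandinavianNormalizationFilter
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute

// Script entry point: `field` and `nameRaw` are passed in as script params.
return score(doc[field].value, nameRaw);

// 1.0 minus penalties for words before the match, total words and total length.
public static double score(String fieldValue, String nameRaw) throws IOException {
    fieldValue = fieldValue ?: "";
    nameRaw = normalizeScandinavianName(nameRaw ?: "");

    final int index = fieldValue.indexOf(nameRaw);
    // index is -1 when nameRaw is not found, which also yields an empty textBefore.
    // substring(0, index - 1) drops the separator just before the match.
    final String textBefore = index <= 0 ? "" : fieldValue.substring(0, index - 1);

    final double scoreByWordsBeforeName = (double) normWordCount(textBefore) / 100.0d;
    final double scoreByWordsTotal = (double) normWordCount(fieldValue) / 1000.0d;
    final double scoreByLengthTotal = (double) fieldValue.length() / 100000.0d;

    return 1.0 - scoreByWordsBeforeName - scoreByWordsTotal - scoreByLengthTotal;
}

// Lowercases the input, then normalizes interchangeable Scandinavian spellings
// with Lucene's ScandinavianNormalizationFilter (input kept as one token).
@groovy.transform.CompileStatic
public static String normalizeScandinavianName(String name) throws IOException {
    ScandinavianNormalizationFilter scandinavianFilter = new ScandinavianNormalizationFilter(new KeywordTokenizer(new StringReader(name.toLowerCase())));
    scandinavianFilter.reset();
    scandinavianFilter.incrementToken();
    return scandinavianFilter.getAttribute(CharTermAttribute.class).toString();
}

// Counts words as (number of spaces + 1); an empty string has 0 words.
@groovy.transform.CompileStatic
public static int normWordCount(String s) {
    if (s.isEmpty()) {
        return 0;
    }
    int count = 0;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) == (char) ' ') {
            count++;
        }
    }
    return count + 1;
}
```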
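To check whether a Painless rewrite, or a script-free replacement, ranks documents the same way, it may help to have the script's arithmetic in plain Java, which is close to Painless syntax. This is a sketch: the class name `ScriptScorePort` is hypothetical, and the Scandinavian normalization is left out because it relies on Lucene analysis classes that Painless's class whitelist likely does not expose. The input is assumed to be normalized beforehand.

```java
/**
 * Hypothetical plain-Java port of the scoring arithmetic in the Groovy script.
 * Input normalization (lowercasing, Scandinavian folding) is assumed to be done elsewhere.
 */
class ScriptScorePort {

    /** Counts words as (number of spaces + 1); mirrors normWordCount. */
    static int normWordCount(String s) {
        if (s.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count + 1;
    }

    /** Same arithmetic as the original score(), including its handling of index -1. */
    static double score(String fieldValue, String input) {
        String value = fieldValue == null ? "" : fieldValue;
        String query = input == null ? "" : input;
        int index = value.indexOf(query);
        // As in the original: a miss (-1) and a hit at position 0 both give "".
        String textBefore = index <= 0 ? "" : value.substring(0, index - 1);
        double byWordsBefore = normWordCount(textBefore) / 100.0;
        double byWordsTotal = normWordCount(value) / 1000.0;
        double byLengthTotal = value.length() / 100000.0;
        return 1.0 - byWordsBefore - byWordsTotal - byLengthTotal;
    }

    public static void main(String[] args) {
        System.out.println(score("acme holding as", "acme"));
        System.out.println(score("nordic acme as", "acme"));
        System.out.println(score("nordic trading as", "acme"));
    }
}
```

Each word before the match costs 0.01, each word in the value 0.001 and each character 0.00001, so the number of leading words dominates the ranking. This suggests a few weighted tiers (match at the first word or not) may capture most of the effect. The example also shows that "nordic trading as", which does not contain the input at all, scores about 0.9968 and beats "nordic acme as" at about 0.9869, because a miss is treated like a match at position 0.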
