Questions about caching and AbstractSearchScript - elasticsearch 2.4.6


(roger) #1

Hi everyone,
I'm trying to implement a plugin that exposes a native script in Elasticsearch 2.4.6, but the behaviour I'm seeing is confusing.

My environment:
Elasticsearch 2.4.6
Single-shard index
Doc count: 508123

The script is used as a filter and, like any normal filter, it discards documents that do not pass a specific condition. Pretty vanilla usage.
The script works well. However, after the 3rd or 4th time I run the same query, the script receives a very large number of documents (380372) and does not actually return any document (the response looks as if nothing matched).
On subsequent requests I get the normal response again, and much faster.

I suppose this is Elasticsearch caching the request; however, I tried running the same request with ?request-cache=false, with no effect.
I also tried changing the index settings with:
{
  "settings": {
    "indices.queries.cache.size": "0%"
  }
}

with no effect as well.

I would like to understand this behaviour, and possibly work around it or disable it. Since I'm using a single index, this operation takes a huge amount of time.

I reduced the code to the minimum possible to demonstrate this. The plugin below filters the results and returns only half of them.
After running it a couple of times, the "cache behaviour" kicks in (very visible in the ES log).

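import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.slf4j.Slf4jESLoggerFactory;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.script.AbstractSearchScript;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;
import org.elasticsearch.script.ScriptModule;

public class MyNativeScriptPlugin extends Plugin {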
  @Override
  public String name() {
    return "my-native-script";
  }
  @Override
  public String description() {
    return "my native script that does something great";
  }
  public void onModule(ScriptModule scriptModule) {
    scriptModule.registerScript("my_script", MyNativeScriptFactory.class);
  }
  public static class MyNativeScriptFactory implements NativeScriptFactory {
    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
      return new MyNativeScript();
    }
    @Override
    public boolean needsScores() {
      return false;
    }
  }

  public static class MyNativeScript extends AbstractSearchScript {
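    // Number of documents this script instance has evaluated so far.
    public int count = 0;
    @Override
    public Object run() {
      count++;
      // Accept every second document this instance is asked about.
      boolean accept = (count % 2) == 0;
      LOGGER.info("running {}, adding? {}", count, accept);
      return accept;
    }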
  }
  private final static ESLogger LOGGER =
      Slf4jESLoggerFactory.getLogger(MyNativeScriptPlugin.class.getName());
}
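
For reference, the counter above lives in the script instance, so the verdict for a document depends on how many documents that instance has already seen rather than on the document itself. A minimal plain-Java sketch of that effect, with no Elasticsearch classes involved; `CounterFilterSketch`, `accept`, `filter`, and the doc ids are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class CounterFilterSketch {
    // Mirrors the "count" field of the script: state kept between calls.
    private int count = 0;

    // Same rule as the script: accept every second call.
    boolean accept() {
        count++;
        return count % 2 == 0;
    }

    // Runs a fresh counter over the given doc ids, in order,
    // and returns the ids that were accepted.
    static List<Integer> filter(int[] docIds) {
        CounterFilterSketch script = new CounterFilterSketch();
        List<Integer> kept = new ArrayList<>();
        for (int id : docIds) {
            if (script.accept()) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Doc 11 is kept when it is the second document evaluated...
        System.out.println(filter(new int[] {10, 11, 12, 13})); // [11, 13]
        // ...and dropped when it happens to be the first.
        System.out.println(filter(new int[] {11, 12, 13}));     // [12]
    }
}
```

So if the set or order of documents handed to the script changes between runs (which is an assumption about what happens here, not something the log proves), the accepted set changes too.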

Thank you in advance,
BR


(roger) #2

I suppose this is the behaviour I'm seeing:

Elasticsearch caches queries automatically based on usage frequency. If a non-scoring query has been used a few times (dependent on the query type) in the last 256 queries, the query is a candidate for caching. However, not all segments are guaranteed to cache the bitset. Only segments that hold more than 10,000 documents (or 3% of the total documents, whichever is larger) will cache the bitset

source
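
The thresholds quoted above can be checked against the numbers in this thread. This is a minimal sketch, assuming the 380372 documents the script received correspond to a single large segment (the thread does not confirm this). The class and method names are hypothetical, and nothing here calls an Elasticsearch API; it only applies the "more than 10,000 documents or 3% of the total, whichever is larger" rule from the quote:

```java
public class CacheThresholdSketch {
    // Figures taken from the quoted documentation text.
    static final long MIN_DOCS = 10_000L;
    static final double MIN_RATIO = 0.03;

    // Smallest size a segment must exceed before its bitset may be cached.
    static double threshold(long totalDocs) {
        return Math.max((double) MIN_DOCS, MIN_RATIO * totalDocs);
    }

    static boolean segmentIsCacheCandidate(long segmentDocs, long totalDocs) {
        return segmentDocs > threshold(totalDocs);
    }

    public static void main(String[] args) {
        long totalDocs = 508_123L;    // doc count of the index in this thread
        long bigSegment = 380_372L;   // ASSUMPTION: size of one large segment

        System.out.println("threshold: " + threshold(totalDocs));
        System.out.println("large segment is a candidate: "
            + segmentIsCacheCandidate(bigSegment, totalDocs));
    }
}
```

With 508123 documents the 3% figure (about 15,244) is the larger of the two limits, so a segment of that size would be well above it.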

edit:

So I guess I was just doing it wrong. I added the config to the node in the proper place (elasticsearch.yml) and restarted the service, and caching on this node disappeared.
Config added:

indices.queries.cache.size: 0%

So the question now is: how can I do this programmatically, only for my script filter?

