Understanding Elasticsearch internals

I am new to ES and am trying to learn its internals, specifically the ones related to scoring and pagination of results. I have ES 2.3.3 running in the IntelliJ IDE and I want to set breakpoints so I can see the execution flow from the REST command being parsed to the results being returned. Any ideas about how I should go about this, or about any specific classes or modules I should concentrate on?
Also, is this the best way to gain an understanding of the ES codebase?

The 2.x and master branches are quite different because one uses Maven and the other Gradle. If you are interested in contributing, master is the right place to start. If you are trying to debug something, use whichever branch contains the behavior you want to debug. CONTRIBUTING.md should have something about how to load the project into Eclipse.

In master, the best thing to do is run gradle run --debug-jvm and connect to the JVM using a debugger on port 8000, which should be the default port. Eclipse calls that a "Remote Java Application"; IntelliJ ought to have an equivalent, but I'm not familiar with it.

I've forgotten what the best way to get the same thing is in 2.x.

Elasticsearch hands off execution from node to node as it needs to and uses listeners to handle the responses, so you can't just step through a request from start to finish. If you want to trace around, I suggest looking at 5.0's "TransportGetTaskAction" and "RestGetTaskAction". I wrote those recently, and they have lots of Javadoc explaining when, where, and why each method is called.
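To make the listener point concrete, here is a minimal sketch of the general pattern in plain Java. It is not Elasticsearch code: ResponseListener, sendToOtherNode, and handleRequest are hypothetical names made up for illustration, and a second thread stands in for the other node. The point is only that the call that sends the request returns immediately, and the response arrives later in a callback on a different thread, so stepping over the call in a debugger never takes you into the code that handles the result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ListenerHandoffSketch {

    // Hypothetical callback interface for illustration; not Elasticsearch's own listener type.
    interface ResponseListener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Stands in for "another node": runs the work on a different thread and calls back.
    static void sendToOtherNode(ExecutorService otherNode, String request,
                                ResponseListener<String> listener) {
        otherNode.execute(() -> {
            try {
                listener.onResponse("result for " + request);
            } catch (Exception e) {
                listener.onFailure(e);
            }
        });
        // Returns immediately; stepping over this call never enters the callback.
    }

    static String handleRequest(String request) throws Exception {
        ExecutorService otherNode =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "other-node"));
        CompletableFuture<String> seen = new CompletableFuture<>();
        try {
            sendToOtherNode(otherNode, request, new ResponseListener<String>() {
                @Override
                public void onResponse(String response) {
                    // A breakpoint here is hit on the "other-node" thread, not the caller's.
                    seen.complete(Thread.currentThread().getName() + " -> " + response);
                }

                @Override
                public void onFailure(Exception e) {
                    seen.completeExceptionally(e);
                }
            });
            // Block only so this demo can return the result; wait at most 5 seconds.
            return seen.get(5, TimeUnit.SECONDS);
        } finally {
            otherNode.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints: other-node -> result for GET /_search
        System.out.println(handleRequest("GET /_search"));
    }
}
```

In practice this means putting breakpoints inside the listener methods themselves (the equivalent of onResponse and onFailure above) rather than relying on step-over from the point where the request is sent.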

I gave a talk at Elastic{ON} last year about how to navigate these files but it is no substitute for debugging around the code.

Are there any more talks or documentation similar to this one that discuss the Elasticsearch code and how to contribute?