ES Codebase Walkthrough


I was wondering if there are any resources available to help developers get a grasp on the big concepts within the ES codebase. I'm currently trying to refactor a plugin at work to be compatible with ES 5, and I keep running into concepts that are hard to understand by just looking at the code. The specific classes I'm interested in are AbstractQueryBuilder, XContentBuilder, and QueryParser, but I'm not even sure what question I need to ask. So any guidance on where I can go to learn more, or how to think about these concepts would be amazing. Sorry for the somewhat vague post. I'm still trying to wrap my head around the whole ES architecture. Thanks


If you are not sure what question to ask, you could still consider publishing your plugin refactoring effort so far, so the community can have a look at it.

AbstractQueryBuilder is a base class for building ES queries and translating them into Lucene structures.

XContentBuilder is a utility for creating JSON/YAML/etc. formatted objects (a wrapper around Jackson).

QueryParser is a helper interface for parsing ES queries.

Currently, it's difficult for me to build ES 5, because I am using Gradle 2.14 and the ES team blocked Gradle 2.14 from building / examining the code base.

Thanks for the response! I guess what I'd like to know is, given this query:
"match" : {
  "message" : {
    "query" : "this is a test",
    "type" : "phrase"
  }
}

How does the query get broken down and processed by the various ES components on its journey to becoming a Lucene query? Maybe that's still a rather broad question.

You want to know about a REST query it seems?

  • HTTP receives JSON (a "REST action")
  • REST action is parsed to ES Java API, using org.elasticsearch.index.query.QueryBuilder
  • the coordinating node (the node that received the search request) transports the ES query source to the nodes that hold the shards of the addressed index
  • from there, the TransportSearchAction processes the search, by default with the SearchQueryThenFetchAsyncAction, which divides the search into phases. "query then fetch" has two phases.
  • first phase: the TransportService executes the query and relevance scoring under timeout / circuit breaker control. The result is a set of doc IDs. On each node, a SearchService is running and doing the heavy lifting (parsing the source, query caching, scroll), creating a SearchContext.
  • second phase: the doc source/fields are fetched from the shards and combined with the doc scores
  • the SearchResponse is returned to the coordinating node, which delivers the result to the client
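To make the two phases a bit more tangible, here is a minimal, self-contained sketch of the "query then fetch" idea. It is not Elasticsearch code: QueryThenFetchSketch, Shard, ScoredDoc and the term-count scoring are all invented for illustration, and the real implementation is asynchronous and far more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "query then fetch". None of these classes exist in Elasticsearch.
final class QueryThenFetchSketch {

    /** Result of the query phase: where the doc lives and how it scored, but not its source. */
    static final class ScoredDoc {
        final int shard;
        final int docId;
        final double score;

        ScoredDoc(int shard, int docId, double score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Highest score first; ties broken by shard and doc ID so the order is deterministic. */
    static final Comparator<ScoredDoc> BY_SCORE =
            Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed()
                    .thenComparingInt(d -> d.shard)
                    .thenComparingInt(d -> d.docId);

    /** Hypothetical shard holding doc ID -> source text. */
    static final class Shard {
        final int id;
        final Map<Integer, String> docs = new HashMap<>();

        Shard(int id) {
            this.id = id;
        }

        /** Phase 1: score locally (plain term count here) and return only IDs and scores. */
        List<ScoredDoc> query(String term, int size) {
            List<ScoredDoc> hits = new ArrayList<>();
            for (Map.Entry<Integer, String> doc : docs.entrySet()) {
                int matches = 0;
                for (String word : doc.getValue().split("\\s+")) {
                    if (word.equals(term)) matches++;
                }
                if (matches > 0) hits.add(new ScoredDoc(id, doc.getKey(), matches));
            }
            hits.sort(BY_SCORE);
            return new ArrayList<>(hits.subList(0, Math.min(size, hits.size())));
        }

        /** Phase 2: load the source of a doc that made the global top hits. */
        String fetch(int docId) {
            return docs.get(docId);
        }
    }

    /** Coordinating node: merge per-shard hits, then fetch sources for the winners only. */
    static List<String> search(List<Shard> shards, String term, int size) {
        List<ScoredDoc> merged = new ArrayList<>();
        for (Shard shard : shards) merged.addAll(shard.query(term, size));
        merged.sort(BY_SCORE);

        List<String> sources = new ArrayList<>();
        for (ScoredDoc hit : merged.subList(0, Math.min(size, merged.size()))) {
            // Assumption of this sketch: a shard's id equals its index in the list.
            sources.add(shards.get(hit.shard).fetch(hit.docId));
        }
        return sources;
    }

    public static void main(String[] args) {
        Shard s0 = new Shard(0);
        s0.docs.put(1, "this is a test");
        s0.docs.put(2, "nothing relevant here");
        Shard s1 = new Shard(1);
        s1.docs.put(1, "test test test");
        System.out.println(search(Arrays.asList(s0, s1), "test", 2));
    }
}
```

The point of splitting the work this way is that only small (doc ID, score) pairs travel in the first phase, and document sources are loaded only for the hits that survive the merge.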

Parsing the source from ES syntax to Lucene is performed by the SearchService as follows:

  • the parts to parse in the source are manifold: from, size, indexBoost, query (with rewrite), postfilter, sorts, profiles, timeout, terminateafter, aggregations, suggest, rescores, stored fields, explain, fetchsource, docvaluefields, highlighter, scriptfields, ext, version, stats, searchafter, slice
  • the QueryShardContext parses the ES query in toQuery() into Lucene and puts the result in a ParsedQuery. It's a kind of recursive descent parser, translating the query element by element into Lucene equivalents.
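The element-by-element translation can be pictured roughly like this. The sketch is self-contained and entirely hypothetical: ToQuerySketch, ShardContext, Builder, Term, Phrase and Bool are invented names that only mimic the shape of a toQuery(context) call, and the result is a plain string written in Lucene's classic query syntax rather than a real Lucene Query object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy recursive translation. These types are invented for illustration only.
final class ToQuerySketch {

    /** Stand-in for the shard-level context: here it only offers an analyzer. */
    static final class ShardContext {
        List<String> analyze(String text) {
            // Assumed analysis chain for this sketch: trim, lowercase, split on whitespace.
            return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
        }
    }

    /** Every query element knows how to translate itself. */
    interface Builder {
        String toQuery(ShardContext context);
    }

    static final class Term implements Builder {
        final String field;
        final String value;

        Term(String field, String value) {
            this.field = field;
            this.value = value;
        }

        public String toQuery(ShardContext context) {
            return field + ":" + value;
        }
    }

    static final class Phrase implements Builder {
        final String field;
        final String text;

        Phrase(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public String toQuery(ShardContext context) {
            // The context supplies the analyzer, so the builder itself stays shard-agnostic.
            return field + ":\"" + String.join(" ", context.analyze(text)) + "\"";
        }
    }

    static final class Bool implements Builder {
        final List<Builder> must;

        Bool(Builder... must) {
            this.must = Arrays.asList(must);
        }

        public String toQuery(ShardContext context) {
            List<String> clauses = new ArrayList<>();
            for (Builder clause : must) {
                clauses.add("+" + clause.toQuery(context)); // recursion into child elements
            }
            return "(" + String.join(" ", clauses) + ")";
        }
    }

    public static void main(String[] args) {
        Builder query = new Bool(new Phrase("message", "This is a Test"), new Term("user", "kimchy"));
        System.out.println(query.toQuery(new ShardContext()));
    }
}
```

Compound elements such as Bool never need to know what their children are; they just call toQuery on each one and combine the results.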

Gradle 2.14 breaks the build because that version made internal some previously public classes that we use for progress logging during the build. Until we have a solution to this problem, the best approach is to make clear that Gradle 2.14 is not compatible with the build by failing early, rather than letting the build break for reasons that might appear mysterious to developers who do not work with the codebase every day.

@jasontedor I guess that plugin authors can still use Gradle 2.14 if they wish, right?

Yes, but the comment that I replied to was specifically in regards to the Elasticsearch codebase and Gradle 2.14.


Awesome! I was able to look into the code and somewhat follow along with what you laid out. From what I understood, when you are writing a custom query builder, you want your QB to be capable of both creating an ES query from XContent AND creating a Lucene query from a QueryShardContext, correct?

I've looked at the built-in QBs, and the strategy for turning XContent into a QB is simply to grab each subsequent token from the XContent parser, check what kind of field it represents, and then fill in the appropriate value of the ES query. For example, the MatchQueryBuilder checks whether the current token represents the SLOP_FIELD, and if it does, it sets the slop value to the next int value of the parser.

But the task of creating a Lucene query from a QueryShardContext seems more complicated. From looking at how the MatchQueryBuilder does it, it appears that there's a fairly tight coupling between the MatchQueryBuilder and the MatchQuery. The former calls the latter, but then the latter creates a new instance of the former and uses that to create a query. From what I understand, the QueryShardContext is mainly used to get at the different analyzers and field mappings, correct?
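To check my understanding of the token-by-token strategy, this is roughly how I picture it. It's only a self-contained sketch: Parser, Token and MatchSketch are stand-ins I made up (the real code uses XContentParser and ParseField matching), and the "boolean" default type is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy version of a fromXContent-style loop. All names here are hypothetical.
final class FromXContentSketch {

    enum Token { FIELD_NAME, VALUE, END_OBJECT }

    /** Hypothetical pull parser over one flat object; a stand-in for XContentParser. */
    static final class Parser {
        private final List<Token> tokens = new ArrayList<>();
        private final List<Object> payloads = new ArrayList<>();
        private int position = -1;

        Parser(Map<String, Object> object) {
            for (Map.Entry<String, Object> entry : object.entrySet()) {
                add(Token.FIELD_NAME, entry.getKey());
                add(Token.VALUE, entry.getValue());
            }
            add(Token.END_OBJECT, null);
        }

        private void add(Token token, Object payload) {
            tokens.add(token);
            payloads.add(payload);
        }

        Token nextToken() {
            return tokens.get(++position);
        }

        String currentName() {
            return (String) payloads.get(position);
        }

        String text() {
            return String.valueOf(payloads.get(position));
        }

        int intValue() {
            return ((Number) payloads.get(position)).intValue();
        }
    }

    /** Hypothetical builder holding the options of a match query. */
    static final class MatchSketch {
        String query;
        String type = "boolean"; // assumed default for this sketch
        int slop = 0;

        static MatchSketch fromXContent(Parser parser) {
            MatchSketch builder = new MatchSketch();
            String currentFieldName = null;
            Token token;
            while ((token = parser.nextToken()) != Token.END_OBJECT) {
                if (token == Token.FIELD_NAME) {
                    // Remember the name; the next token carries its value.
                    currentFieldName = parser.currentName();
                } else if ("query".equals(currentFieldName)) {
                    builder.query = parser.text();
                } else if ("type".equals(currentFieldName)) {
                    builder.type = parser.text();
                } else if ("slop".equals(currentFieldName)) {
                    builder.slop = parser.intValue();
                } else {
                    throw new IllegalArgumentException("unknown field [" + currentFieldName + "]");
                }
            }
            return builder;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("query", "this is a test");
        body.put("type", "phrase");
        body.put("slop", 2);
        MatchSketch parsed = MatchSketch.fromXContent(new Parser(body));
        System.out.println(parsed.query + " / " + parsed.type + " / " + parsed.slop);
    }
}
```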

Thanks for the thorough response above. It definitely helped. Now I'm just curious as to how to mentally think about the task of creating ES queries and Lucene queries from xContent and Shard Contexts.

Correct. Queries have builder classes, following the builder design pattern for fluent programming style.
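As a rough illustration of what "fluent" means here, each setter returns the builder itself so calls can be chained. The class below is a hypothetical stand-in, not an Elasticsearch class, and its JSON rendering does no escaping; it is only meant to show the pattern.

```java
// Hypothetical fluent builder; not part of the Elasticsearch API.
final class FluentMatchSketch {
    private final String field;
    private final String text;
    private String type = "boolean"; // assumed default for this sketch
    private int slop = 0;

    FluentMatchSketch(String field, String text) {
        this.field = field;
        this.text = text;
    }

    /** Returning `this` is what makes chained calls possible. */
    FluentMatchSketch type(String type) {
        this.type = type;
        return this;
    }

    FluentMatchSketch slop(int slop) {
        this.slop = slop;
        return this;
    }

    /** Naive rendering without escaping; good enough to show the shape of the query. */
    String toJson() {
        return "{\"match\":{\"" + field + "\":{\"query\":\"" + text
                + "\",\"type\":\"" + type + "\",\"slop\":" + slop + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(new FluentMatchSketch("message", "this is a test").type("phrase").slop(2).toJson());
    }
}
```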

Yes, at shard level, Elasticsearch works like a Lucene index. That is, ES fields are translated into Lucene fields, ES analyzers are translated into Lucene analyzers, and so on. All in all, it's a straightforward process (where ES has some add-ons).

Maybe you should have a look at the ES test sources; they offer a plethora of examples of how to use and understand the internal mechanisms.