Plugin to modify search queries


(Alex Roytman) #1

Could someone point me to samples or an API for injecting a piece of my own code that would inspect a posted query and enrich it with additional criteria based on some rules?

thank you
Alex


(Jörg Prante) #2

Something like filtered aliases maybe? https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html#filtered The alias's filter is applied to every query that is addressed to the given alias.

I'm not sure why a query should be post-processed. Are you rewriting "bad" queries? What are the "additional criteria"? Another query?

Something on JSON level, or with query language semantics?


(Alex Roytman) #3

Hi Jörg,

Thank you for looking into it. It is for access control. We have rather complicated rules (based on people's security groups; the teams, people and their roles on the "Engagement" which the document represents; managerial relationships between people; people's team affiliations; ...).
I have an algorithm that, based on metadata, encodes all the rules as security tokens for each document and then does a similar encoding for the logged-in person. A global filter must then be applied to every incoming query to make sure that access control is enforced.

Our Elasticsearch is exposed to the JavaScript UI directly rather than through our own service layer, because I have a rather flexible analytics UI which can produce very complicated queries, and wrapping it in our own service would just reinvent Elasticsearch's well-structured API.

Thus I need to inject these extra criteria on ACL tokens into incoming queries. We are using embedded Elasticsearch via transport wares (it is a shame it is being killed, along with embedded Elasticsearch), so I have the option of doing it in the servlet before passing the request to the Elasticsearch REST layer, but I thought a plugin could be better. In a plugin I would need access to the request headers, at least, to grab the user tokens.
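
For illustration, here is a minimal sketch of such an injection, assuming the per-document tokens live in a field called `acl_tokens` (a hypothetical name) and that the incoming query clause is available as a JSON string. It wraps the original clause in a `bool` query and adds a `terms` filter on the user's tokens. It is plain string handling, not an Elasticsearch API, and a real implementation would parse the full search body rather than concatenate strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class AclQueryWrapper {
    // Hypothetical field name holding the per-document security tokens.
    static final String ACL_FIELD = "acl_tokens";

    /**
     * Wraps the client's query clause (JSON text) in a bool query whose
     * filter only lets through documents carrying one of the user's tokens.
     */
    static String wrap(String originalQueryJson, List<String> userTokens) {
        String tokens = userTokens.stream()
                .map(AclQueryWrapper::quote)
                .collect(Collectors.joining(","));
        return "{\"bool\":{\"must\":" + originalQueryJson
                + ",\"filter\":{\"terms\":{\"" + ACL_FIELD + "\":[" + tokens + "]}}}}";
    }

    /** Quotes a token as a JSON string, escaping backslashes and double quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // The token values are made up for the example.
        System.out.println(wrap("{\"match_all\":{}}", Arrays.asList("g:sales", "u:42")));
    }
}
```

Because the ACL condition sits in the `filter` clause, it restricts the result set without affecting the scores produced by the original query.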

So I do not think a filtered alias would work; unfortunately it is static. It would have been a lot more interesting if, besides the alias, parameters (for its filter) could be passed as part of the URL (or POST request); then it would work very well for me.


(Jörg Prante) #4

So I understand an HTTP REST plugin would be sufficient?

You can add an extra endpoint for search (or, with a RestFilter, which can be hooked in between the HTTP layer and the ES API, it should also be possible to override the existing _search endpoint).
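
To make the filter idea concrete, here is a minimal sketch of the pattern in plain Java. It deliberately does not use the Elasticsearch classes: `Request`, `AclFilter` and the `X-User-Token` header are hypothetical stand-ins, and the actual RestFilter signature should be checked against the ES version in use. The point is only the shape: the filter sees each request first, passes non-search requests through untouched, rejects search requests without a user token, and hands a rewritten copy to the next handler otherwise.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

final class SearchFilterSketch {

    /** Hypothetical stand-in for an incoming REST request (not an Elasticsearch class). */
    static final class Request {
        final String path;
        final Map<String, String> headers;
        final String body;

        Request(String path, Map<String, String> headers, String body) {
            this.path = path;
            this.headers = headers;
            this.body = body;
        }
    }

    /** Sits between the HTTP layer and the search handler. */
    static final class AclFilter {
        // (userToken, body) -> rewritten body; the rewrite rule is supplied by the caller.
        private final BinaryOperator<String> rewriter;

        AclFilter(BinaryOperator<String> rewriter) {
            this.rewriter = rewriter;
        }

        String process(Request request, Function<Request, String> next) {
            if (!request.path.endsWith("/_search")) {
                return next.apply(request); // not a search: pass through unchanged
            }
            String userToken = request.headers.get("X-User-Token"); // hypothetical header
            if (userToken == null) {
                return "403 Forbidden"; // no identity, no search
            }
            String newBody = rewriter.apply(userToken, request.body);
            return next.apply(new Request(request.path, request.headers, newBody));
        }
    }

    public static void main(String[] args) {
        AclFilter filter = new AclFilter((token, body) -> body + " +acl:" + token);
        Function<Request, String> handler = r -> "searched: " + r.body;
        Request req = new Request("/docs/_search",
                Collections.singletonMap("X-User-Token", "u42"), "q");
        System.out.println(filter.process(req, handler));
    }
}
```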

In my plugin https://github.com/jprante/elasticsearch-arrayformat you can study how to implement a simple _search endpoint alternative called _search_arrayformat which just uses a different JSON response format but re-uses the existing search.

In org.elasticsearch.rest.action.search.RestSearchAction you can study the methods parseSearchRequest() and parseSearchSource() in order to augment queries when they are translated to the Java API.

If HTTP REST is not sufficient, you would have to extend the transport protocol by adding a custom transport action on the Java API layer, which is more advanced. This could be a modified org.elasticsearch.action.search.SearchAction, but the work is more challenging: the search functionality is quite large and distributed over several classes.

I'm not sure if embedded Elasticsearch really will be "killed". In that case I would have to write my own bootstrap code to embed an Elasticsearch node in my platform. The "wares" approach has admittedly been a bit cranky because ES never embraced the JEE platform at all, e.g. the weird classloader stuff and the start/stop lifecycles in JEE containers. But starting an embedded node from Java is in general a very important feature for 3rd party vendors who want to create simple platforms with their specific add-ons to Elasticsearch with close-to-zero effort. I'm more afraid embedded Elasticsearch will become a paid product, being sold as a JEE version, an OSGi version, a cloud version, and the like.


(Jörg Prante) #5

For a custom search action on the Java API layer, see https://github.com/jprante/elasticsearch-simple-action-plugin which I just updated to Elasticsearch 2.3.4.

As a demonstration, it implements a "simple" transport action which re-uses the existing search action code but always executes a match_all query.


(Alex Roytman) #6

Hi Jörg,

Have you made any progress with cleanly embedding 5.x into your platform? I gave it a few tries, but it seems to be hard and getting more and more closed with every release. For now I am staying on 2.4.x, but I suspect I will have to go the plugin route if I want to move to ES5. I wonder if you have any ES5-specific suggestions on how to modify incoming queries (adding extra security-based criteria to them), given that ES now has a Java representation for queries, or whether what you said in your prior posts still describes the best places to inject my logic. I also wonder whether Action Filters are allowed to modify requests or just filter them.

Thank you,
alex


(Jörg Prante) #7

Maybe I do not fully understand how you prefer to add the ACL token management with regard to the Elasticsearch indexing and search API.

From what I understand, there are two general approaches:

  • writing a plugin to guard Elasticsearch at server side, monitoring requests, and rewriting them after they have been sent by a client. This approach should be possible, judging from projects like https://github.com/floragunncom/search-guard/tree/5.1.1
    For instance, Search Guard replaces the existing transport layer with a custom layer to check for access https://github.com/floragunncom/search-guard/blob/5.1.1/src/main/java/com/floragunn/searchguard/SearchGuardPlugin.java#L251-L305
    It is fantastic to be able to study this code on GitHub as open source. Maybe your plans to handle ACL are in the same direction?

  • writing a wrapper around the Elasticsearch client API at client side, maybe embedded in a WAR in a JEE server or listening at a custom HTTP port, assuming a) the Elasticsearch cluster can be locked away in a private network and b) all cluster accesses can be controlled exclusively through such a wrapper, operating like a reverse proxy. Using Java is optional; a reverse proxy can be implemented in whatever language.

I have implemented an "extras" wrapper for the Elasticsearch client. See
https://github.com/xbib/elasticsearch-extras-client/tree/es5
for the ES 5.x branch.

You can understand my project as an extended layer for the Elasticsearch client API (it combines search, index management and bulk indexing into one interface), and it is meant to carry my code into the future without the tedious work of managing API incompatibilities. No matter if a node or transport (or even HTTP) client is under it, my applications like JDBC importer or knapsack (or my professional work) should continue to work without code change. I added a simpler implementation of TransportClient to my project, so do not expect it to behave exactly like the mainline version.

It should be straightforward to fork or use my "extras" library from any Java application. The Maven coordinates are available at

http://search.maven.org/#artifactdetails|org.xbib|elasticsearch-extras-client|5.1.1.0|jar

I agree that embedding is getting harder and more closed with every release, and it's very troubling. That's why I am still on Elasticsearch 2.x.

As a side note: reading https://www.elastic.co/blog/state-of-the-official-elasticsearch-java-clients it looks like it is planned to remove Netty transport and action plugin capabilities from the Java client completely. If so, it would be impossible to develop custom nodes, custom actions, and custom client plugins. Most users seem to be satisfied with reusing existing binary code by downloading, installing and running it, but that is not enough for me as a user of an open source product.

So while the future of the existing extension points in ES is unclear, the move to 5.x is risky and very challenging for me.


(Alex Roytman) #8

Thank you Jörg,

I would like to expose the Elasticsearch REST endpoint to client applications, including browsers, directly, and for that I need two things: 1) read-only (search-only) access and 2) the ability to inspect and augment queries with extra criteria to implement document-level security filtering. Currently I run ES 2.x embedded with the servlet transport, which I modified to work with 2.4. I also managed to do the same for 5.1.1, but only after registering my own HTTP transport type which does nothing, because I do not want HTTP access to my node, only servlet-based access. That was necessary because ES 5 would not register REST actions unless the HTTP transport is enabled.
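
As an illustration of requirement 1), here is a minimal sketch of a search-only gate that a servlet or proxy could run before forwarding a request. The class and method names are hypothetical, and the whitelist is deliberately conservative: it only admits GET or POST requests whose last path segment is `_search`, so related read endpoints would have to be added explicitly if they are needed.

```java
final class ReadOnlyGate {

    /** Returns true only for GET/POST requests whose last path segment is "_search". */
    static boolean isAllowed(String method, String path) {
        boolean readMethod = "GET".equals(method) || "POST".equals(method);
        if (!readMethod) {
            return false; // PUT, DELETE, HEAD etc. are rejected outright
        }
        // Drop the query string, if any.
        int q = path.indexOf('?');
        String p = q >= 0 ? path.substring(0, q) : path;
        // Drop trailing slashes so "/docs/_search/" is treated like "/docs/_search".
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        String lastSegment = p.substring(p.lastIndexOf('/') + 1);
        return "_search".equals(lastSegment);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("POST", "/docs/_search"));  // true
        System.out.println(isAllowed("POST", "/docs/_bulk"));    // false
        System.out.println(isAllowed("DELETE", "/docs"));        // false
    }
}
```

A whitelist is used rather than a blacklist so that any endpoint not explicitly considered stays closed by default.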

Thank you for pointing me to search-guard; I will take a look. I have looked at https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin which is probably similar in purpose and approach, but if the ES team removes transport and action plugins it might be of no use. I would hate to dive into it only to find out that it is on its way out. While I understand that Elastic needs to productize and commercialize ES, it is very upsetting that they are killing embedding and in general making it a rather closed product. Unlike people who crunch huge volumes of fairly simple data like logs, I deal with fairly small but complex, highly nested and interconnected business data. It is usually application-specific and has fancy access control rules, so embedding makes perfect sense from an integration and deployment perspective.

I was going to research and learn action plugins/filtering and/or transport plugins in order to move my access control logic into Elasticsearch so it could run standalone. Do you think ES may remove the action/transport plugin capabilities, making it impossible to perform access control and augment incoming queries? If so, I guess I'd better stick with 2.4 for now until it is clear where ES is going with it.


(Baruch Brutman) #9

Hi Jörg,

We are looking for an elegant way to score results with visibility of the whole result set. I was looking at the elasticsearch-simple-action-plugin as an option for working on the response hits etc.

What would be your advice on the best way to do that?

Thanks a lot! Baruch.

