Streaming API for Elasticsearch


(Aravindh S) #1

Is there a streaming API for ES to which I can redirect my stream of events for indexing?


(Jörg Prante) #2

What does your stream of events look like?


(Aravindh S) #3

An event is a JSON object with, say, 4 or 5 fields. Example:
{"name":"aravindh","action":"login","time":12455643,"price":1000}


(Jörg Prante) #4

ES has bulk indexing, so you can push nearly any number of JSON key/value objects to it. It seems you are free to implement the push API yourself. If not, there are myriad solutions on the web to pick from, last but not least the Elastic Stack.
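
For illustration, here is a minimal sketch of how a stream of events like the one above could be cut into batches and turned into request bodies for the bulk endpoint (`POST /_bulk`, newline-delimited JSON). The index name `events`, the batch size, and the helper names `batches` and `build_bulk_body` are assumptions made for this example, not anything from the thread. Older Elasticsearch versions also expect a `_type` in the action line. The sketch only builds the bodies; sending them is left to whatever HTTP client you use.

```python
import json
from itertools import islice


def batches(stream, size):
    """Yield lists of up to `size` items from any iterable, including an endless one."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_bulk_body(events, index="events"):
    """Build a bulk request body: one action line plus one source line per event.

    `index` is a hypothetical index name chosen for this example.
    """
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # document source
    # The bulk format requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    stream = [
        {"name": "aravindh", "action": "login", "time": 12455643, "price": 1000},
    ]
    for batch in batches(stream, 500):
        body = build_bulk_body(batch)
        # Send `body` to /_bulk with Content-Type: application/x-ndjson.
        print(body, end="")
```

Batching this way keeps each request bounded in size, while the sender still sees the input as one continuous stream.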


(Nik Everett) #5

We've talked about having a stream-style insert using HTTP/1.1's chunked encoding but have not implemented it. The nice thing about bulk is that when the bulk request returns, you know the translog on each active shard has been fsynced. The nice thing about chunked encoding would be that Elasticsearch could "pull" from the sender when ready.
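
To make the chunked-encoding idea concrete, here is a small sketch of how HTTP/1.1 frames a chunked body on the wire: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk ends the body. This only shows the framing. Elasticsearch has no stream-style insert built on it, and the helper names `encode_chunk` and `encode_stream` are made up for the example.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: hex length, CRLF, payload, CRLF."""
    return f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"


# A zero-length chunk followed by an empty trailer section terminates the body.
LAST_CHUNK = b"0\r\n\r\n"


def encode_stream(parts):
    """Yield wire-format chunks for an iterable of byte strings."""
    for part in parts:
        if part:  # skip empty parts; a zero-length chunk would end the body early
            yield encode_chunk(part)
    yield LAST_CHUNK


if __name__ == "__main__":
    event = b'{"name":"aravindh","action":"login","time":12455643,"price":1000}\n'
    for frame in encode_stream([event]):
        print(frame)
```

Because the sender does not announce a total length up front, it can keep writing chunks for as long as events arrive, which is what makes the encoding attractive for a stream.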


(Jörg Prante) #6

Chunked transfer encoding is an ancient style of HTTP (with the first idea dating back to 1994) and is gone in HTTP/2. See https://github.com/http2/http2-spec/issues/586

The current method in HTTP/2 is data frames, with bidirectional communication, similar to what WebSocket (HTML5) data frames look like. So Netty, or the Java 9 HTTP/2 client, is worth a look for implementing bidirectional streaming for bulk indexing in ES. The idea would be to submit a number of JSON objects in a single data frame, and the server answers asynchronously, on the same persistent socket connection, whether or not the bulk items reached the translog.


(Nik Everett) #7

Sorry, I'm pretty rusty on this stuff. I was thinking of "Transfer-Encoding: chunked", which is pretty widely in use (like on this page), but you can be sure that if we ever do actually get back to the stream style, we'll reevaluate based on what modern stuff can do. It's not like Elasticsearch would have to support IE6 if it built streams. Curl certainly, but not IE6.


(Jörg Prante) #8

Sure, HTTP chunked transfer is widely in use. No doubt, since 99.99% of all HTTP traffic is still 1.0/1.1, that is the reason :)

Curl has full HTTP/2 support; just use the --http2 option: https://curl.haxx.se/docs/http2.html

