How to stream results on a websocket with percolator?


(Sébastien Lorber) #1

Hello

What I would like to do is:

  • User1 and User2 are friends
  • Both have an open websocket
  • User1 posts a new document
  • Because of the friendship, User2 receives the document over the websocket

I've looked at the percolator.

It seems I can use it this way:

  • User2 registers a percolator query to get the documents of all its
    friends
  • User1 indexes the document and gets the percolator query matches (I
    think I read that we can get percolator results at index time)
  • User1's indexing process sends notifications to its friends' open
    websockets so that they receive the information
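
The three steps above can be sketched in memory as follows. This is only an
illustration under loud assumptions: FriendPercolator, register() and
index() are hypothetical names, the "query" is a plain Java predicate on an
"author" field, and nothing here calls the real Elasticsearch percolator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for the percolator; not the Elasticsearch API.
final class FriendPercolator {
    // subscriber id -> registered "query" (here: a predicate over the document fields)
    private final Map<String, Predicate<Map<String, String>>> queries = new LinkedHashMap<>();

    // Step 1: a user registers a query matching documents authored by any friend.
    void register(String subscriber, Set<String> friends) {
        Set<String> snapshot = new HashSet<>(friends);
        queries.put(subscriber, doc -> snapshot.contains(doc.get("author")));
    }

    // Step 2: "index" a document and return the subscribers whose queries match it.
    List<String> index(Map<String, String> doc) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, String>>> e : queries.entrySet()) {
            if (e.getValue().test(doc)) {
                matches.add(e.getKey());
            }
        }
        // Step 3 is left to the caller: notify the websockets of these subscribers.
        return matches;
    }
}
```

The list returned by index() is what the indexing side would then have to
deliver to the matching users' websocket handlers.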

The problem is that it can be quite expensive to send the notification from
User1's indexing process to User2's websocket handler.
In a distributed webapp, it would require messaging.
I think a tool that handles distributed event processing, like Redis,
would fit that need.

To avoid using Redis or something else, would it be possible for the
Elasticsearch percolator to send the notification directly to User2's
webapp server?
Is it possible for User2's registered percolator query to keep some kind of
connection to Elasticsearch and receive results as documents get indexed?

Thanks


(Jörg Prante) #2

Hi,

the percolator feature is fascinating, yes. But finding matches needs a
separate query action. The _percolator and _percolate actions are
decoupled; that is, it is not a publish/subscribe message-passing model.

See also the discussion for "change notification" at
https://github.com/elasticsearch/elasticsearch/issues/1242

From what I have seen of the Netty HTTP and REST code in Elasticsearch, a
publish/subscribe mechanism is really worth experimenting with. I'm very
interested in implementing such a beast. And, yes, I agree that websockets
are a good idea, because they are bidirectional and save resources.
Imagine 10,000 clients waiting on a node for notification events.

The scenario I'd like to explore is a curl websocket client "A" that
registers a 'tag' value on a specific index/type (using the percolator
behind the scenes) and then hangs waiting for events, while another curl
client "B" is indexing. The change notifications are filtered and pushed to
client "A" if they match, until "A" closes the connection.
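
The lifecycle of this scenario (register a tag, wait, receive matches, close)
can be sketched without any networking. Everything below is an assumption
for illustration: TagChannel and its methods are hypothetical, a
BlockingQueue stands in for the open websocket, and an exact tag comparison
stands in for the percolator match.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one queue per open connection, filtered by tag.
final class TagChannel {
    private static final class Subscription {
        final String tag;
        final BlockingQueue<String> events = new LinkedBlockingQueue<>();

        Subscription(String tag) {
            this.tag = tag;
        }
    }

    // client id -> subscription; an entry exists only while the connection is open
    private final Map<String, Subscription> open = new ConcurrentHashMap<>();

    // Client "A" registers a tag and gets the queue it will wait on.
    BlockingQueue<String> subscribe(String clientId, String tag) {
        Subscription s = new Subscription(tag);
        open.put(clientId, s);
        return s.events;
    }

    // Client "A" closes the connection: no further events are pushed to it.
    void close(String clientId) {
        open.remove(clientId);
    }

    // Client "B" indexes a document; it is pushed to every open subscription
    // whose tag matches. Returns the number of subscriptions notified.
    int index(String tag, String doc) {
        int pushed = 0;
        for (Subscription s : open.values()) {
            if (s.tag.equals(tag)) {
                s.events.add(doc);
                pushed++;
            }
        }
        return pushed;
    }
}
```

A waiting client would block on take() of the returned queue; poll() is the
non-blocking variant.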

Best regards,

Jörg


(Sébastien Lorber) #3

Actually, that's the idea, but I'll use a Java backend instead of curl.

I guess it's not so easy to implement, since ES nodes would have to notify
each other about what is being indexed. Since only one ES node's Netty
server should hold the websocket for a given client, on large clusters it
also means that if we have 100 nodes and only 1 connected client that
should receive a notification, the notification should be sent only to the
appropriate node and not to the whole cluster.
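
The routing concern can be illustrated with a small registry that remembers
which node holds each client's websocket. This is a sketch under
assumptions, not Elasticsearch code: SubscriberRouter, the node ids and the
method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical registry: subscriber id -> id of the node holding its websocket.
final class SubscriberRouter {
    private final Map<String, String> nodeOfSubscriber = new HashMap<>();

    void connected(String subscriber, String nodeId) {
        nodeOfSubscriber.put(subscriber, nodeId);
    }

    void disconnected(String subscriber) {
        nodeOfSubscriber.remove(subscriber);
    }

    // Groups the matched subscribers by node, so that only nodes with at least
    // one connected, matching subscriber are contacted. Subscribers that are
    // not connected anywhere are skipped.
    Map<String, List<String>> route(Collection<String> matchedSubscribers) {
        Map<String, List<String>> perNode = new TreeMap<>();
        for (String subscriber : matchedSubscribers) {
            String node = nodeOfSubscriber.get(subscriber);
            if (node != null) {
                perNode.computeIfAbsent(node, n -> new ArrayList<>()).add(subscriber);
            }
        }
        return perNode;
    }
}
```

With 100 nodes and a single connected client, route() returns a map with one
entry, so one message is sent instead of a cluster-wide broadcast.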

If you want to implement this, I'm happy to help. I don't know much about
websockets yet, so it would be an occasion to learn...


(Jörg Prante) #4

Sure, a pubsub architecture distributed across many ES nodes is not
straightforward. It needs at least an internal pubsub index where the
subscribers are identified and registered, together with the node where
each subscriber lives. Event classes must be designed.
Classes like org.elasticsearch.action.TransportActionNodeProxy show me that
actions can potentially be executed against a specific node, which is very
useful for passing messages from the indexing node directly to the node
where a subscriber waits for notifications.
An event buffer, also in the internal pubsub index, would be useful too,
for situations where subscribers do not consume notifications as fast as
they are produced.
Websockets are also new to me, but I'm encouraged by the Netty examples...
well, I will need some time to move forward...
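
As an illustration of the event buffer idea above, here is a minimal
per-subscriber buffer. The class name, the fixed capacity and the
drop-oldest policy are assumptions made for the sketch; the post does not
say how overflow should be handled, and a buffer stored in an index would
look different.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded buffer for one slow subscriber.
final class EventBuffer {
    private final Deque<String> events = new ArrayDeque<>();
    private final int capacity;
    private long dropped;

    EventBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Producer side: called whenever a notification matches the subscriber.
    // Assumed policy: when the buffer is full, the oldest event is discarded.
    synchronized void offer(String event) {
        if (events.size() == capacity) {
            events.removeFirst();
            dropped++;
        }
        events.addLast(event);
    }

    // Consumer side: returns all buffered events in arrival order and clears them.
    synchronized List<String> drain() {
        List<String> out = new ArrayList<>(events);
        events.clear();
        return out;
    }

    // Number of events discarded so far because the subscriber was too slow.
    synchronized long dropped() {
        return dropped;
    }
}
```

The dropped() counter lets a subscriber detect that it missed notifications
and fall back to a normal search.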

Jörg


(Jörg Prante) #5

Note to self: It would be nice to have an API like the Hazelcast pubsub API
(http://www.hazelcast.com/documentation.jsp#Topic), but with Elasticsearch
actions under the hood, with REST _publish / _subscribe endpoints, and
indexing extended with setTopic() or a _topic field or something similar
for automatic publishing while indexing.
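
One possible shape for the "automatic publishing while indexing" part of
this note, as a sketch only. TopicBus and its methods are hypothetical, the
_topic field is taken from the note, and this is neither the Hazelcast API
nor an existing Elasticsearch feature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical topic bus: indexing a document that carries a _topic field
// publishes it to that topic's listeners.
final class TopicBus {
    private final Map<String, List<Consumer<Map<String, Object>>>> listeners = new HashMap<>();

    // Counterpart of the imagined _subscribe endpoint.
    void subscribe(String topic, Consumer<Map<String, Object>> listener) {
        listeners.computeIfAbsent(topic, t -> new ArrayList<>()).add(listener);
    }

    // Counterpart of the imagined _publish endpoint.
    void publish(String topic, Map<String, Object> message) {
        List<Consumer<Map<String, Object>>> targets = listeners.get(topic);
        if (targets != null) {
            for (Consumer<Map<String, Object>> listener : new ArrayList<>(targets)) {
                listener.accept(message);
            }
        }
    }

    // "Indexing": returns true if the document had a _topic and was published.
    boolean index(Map<String, Object> doc) {
        Object topic = doc.get("_topic");
        if (topic == null) {
            return false; // plain indexing, nothing to publish
        }
        publish(topic.toString(), doc);
        return true;
    }
}
```

The sketch is single-threaded; a real implementation would need concurrent
collections and the cross-node delivery discussed earlier in the thread.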

Jörg

