Percolate Performance

Hey Guys,

Another investigative question here -- trying to figure out if
Elasticsearch is right for a project my team is working on.

Does anyone have hard metrics on how the percolate feature holds up
against different-sized datasets, and how it fares with real-time query
indexing and percolating? I've seen relatively small
setups discussed anecdotally, but am wondering if anyone has specifics.

In our current architecture we have ~500,000 queries stored and percolate
(we call it profiling) around 10 documents a minute through them, expecting
responses in around half a second or less. This doesn't seem like it
would be too hard a problem for a properly configured Elasticsearch setup,
but we also add/remove about 50 queries a minute from our
index while profiling documents. Would Elasticsearch block
in such cases?
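
As a rough sense of scale, the numbers above can be worked through as follows. This is only a back-of-the-envelope sketch: it assumes the worst case where every stored query is evaluated against every document, which may not match how the percolator works internally.

```python
# Workload figures taken from the post above.
stored_queries = 500_000
docs_per_minute = 10
query_changes_per_minute = 50
latency_budget_s = 0.5

# Assumption: worst case, every stored query is evaluated for every document.
evaluations_per_minute = stored_queries * docs_per_minute
budget_per_query_us = latency_budget_s / stored_queries * 1_000_000
changes_per_document = query_changes_per_minute / docs_per_minute
busy_fraction = docs_per_minute * latency_budget_s / 60

print(f"query evaluations per minute: {evaluations_per_minute:,}")
print(f"time budget per query: {budget_per_query_us:.1f} microseconds")
print(f"query adds/removes between documents: {changes_per_document:.0f}")
print(f"share of each minute spent percolating at the budget: {busy_fraction:.1%}")
```

Under that assumption the budget is about one microsecond per stored query, with roughly five query changes arriving between consecutive documents.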

Wondering...

-Adam

--

Hi,

Depends on the hardware and query complexity... :)

Otis

In reply to Adam Georgiou's post of Wednesday, October 17, 2012, 3:59:44 PM UTC-4.

--

Hi, I think everybody would be interested in a properly executed and
repeatable benchmark of that...
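
A repeatable benchmark could start from something as small as the sketch below. It is only an illustration: `measure_latencies`, `summarize`, and `fake_percolate` are hypothetical names, the 0.5 second budget comes from the original question, and the stand-in function would have to be replaced by a real percolate request against a cluster.

```python
import statistics
import time


def measure_latencies(percolate, documents):
    """Call percolate(doc) for each document and record wall-clock seconds."""
    latencies = []
    for doc in documents:
        start = time.perf_counter()
        percolate(doc)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, budget_s=0.5):
    """Report median, worst case, and how many calls exceeded the budget."""
    return {
        "count": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
        "over_budget": sum(1 for t in latencies if t > budget_s),
    }


if __name__ == "__main__":
    # Stand-in for a real percolate call; replace with a client request.
    def fake_percolate(doc):
        time.sleep(0.001)
        return []

    docs = [{"body": f"document {i}"} for i in range(20)]
    print(summarize(measure_latencies(fake_percolate, docs)))
```

To cover the concurrent add/remove case from the question, the same measurement would need to run while a second client registers and deletes queries at the stated rate.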

For what it's worth, in one of my past contracts we had about 2,500
registered queries and tens of millions of docs, and used the percolator for
"alert" use cases (every document being indexed ran through the
percolator, though the indexing rate was not too high), as well as a "document
clustering" use case (when exporting, run every document against multiple
percolator queries to tell which "topics" it is about).

It performed very well -- we never noticed any performance issues, and it
didn't slow down throughput.

> Would Elasticsearch block in such cases?

Certainly not -- after all, percolator queries are just "special" documents
in a "special" index.
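
To make the "special documents in a special index" point concrete, here is a minimal sketch of the request shapes involved. It assumes the pre-1.0 percolator API, where queries live in the `_percolator` index under a type named after the target index (later versions changed this). The index, type, and query names are hypothetical, and the code only builds the requests rather than sending them.

```python
import json


def register_query(index, name, query):
    """Registering a query is an ordinary document write into _percolator."""
    return ("PUT", f"/_percolator/{index}/{name}", json.dumps({"query": query}))


def unregister_query(index, name):
    """Removing a query is an ordinary document delete."""
    return ("DELETE", f"/_percolator/{index}/{name}", None)


def percolate(index, doc_type, doc):
    """Ask which registered queries match a document, without indexing it."""
    return ("GET", f"/{index}/{doc_type}/_percolate", json.dumps({"doc": doc}))


# Hypothetical names, for illustration only.
requests_to_send = [
    register_query("articles", "alert-1", {"term": {"body": "percolate"}}),
    percolate("articles", "article", {"body": "percolate performance"}),
    unregister_query("articles", "alert-1"),
]
for method, path, body in requests_to_send:
    print(method, path, body or "")
```

Since adding and removing queries are plain document writes and deletes in this model, they go through the same path as any other indexing operation.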

Karel

--