Please share success stories about ES usage in production


(vsiryi) #1

I want to convince my customer to use ES. To do this I need success
stories about ES usage in production. I tried to find such information
on the official ElasticSearch site and on Google, but found nothing.

I'd be very happy if you could tell me how many documents you are
indexing and approximately how many requests you serve daily. Links to
production applications would be very helpful.

Thanks!

Best regards, Vitalii Siryi


(Alexander Reelsen) #2

Hi

On Nov 30, 11:33 am, Vitalii Siryi vitalii.si...@gmail.com wrote:

I'd be very happy if you could tell me how many documents you are
indexing and approximately how many requests you serve daily. Links to
production applications would be very helpful.

We are using it as a product search engine; you can see it live at
http://www.lusini.de

The document count is quite low, more than 200k documents, currently
on one node. We have never had performance issues. We run lots of
faceted queries with about ten facets and, of course, some filters;
queries get a bit slower but are still fast enough for us.

To get the product data into elasticsearch we implemented a river,
which polls for updates every n seconds and streams the JSON data, so
we do not have to wait until all the data has arrived when downloading
large amounts of it. We have several thousand updates a day (more
likely ten thousand).
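
A minimal sketch of the streaming idea, not the actual river code. It assumes the update feed is newline-delimited JSON, which the post does not state; `iter_documents` and the chunk source are hypothetical names. Each document is handed on as soon as its line is complete, so nothing waits for the whole download:

```python
import json
from typing import Iterable, Iterator


def iter_documents(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed document per complete line as chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep the unfinished tail in the buffer.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    # A final document may arrive without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


# Chunks split at arbitrary points, as they might come off a socket.
sample_chunks = ['{"id": 1, "na', 'me": "chair"}\n{"id"', ': 2, "name": "table"}\n']
for doc in iter_documents(sample_chunks):
    print(doc["id"], doc["name"])
```

In a real river the chunks would come from the HTTP response body and each document would be queued for indexing instead of printed.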

Before elasticsearch we had a lightning-fast but unmaintainable
self-written solution based on bobo and zoie, which we replaced in
order to have a simpler solution that all developers understand well.

Of course we do not expose elasticsearch directly to browsers; we have
another component in between, which can also do things like redirecting
certain search terms to landing pages.

To be honest, I cannot tell how many requests come in per day, but I
guess it is somewhat below 100k.

Hope this helps. In case of questions, feel free to ask.

--Alexander


(James Cook-3) #3

http://www.penpalkidsclub.com/

Written using ES as the only form of persistence. Went live 7/2011.


(Michael Sick) #4

James,

Was curious what settings/features you consider most important to
configure/use when using ES without a secondary persistence mechanism.
Perhaps this should be a separate thread - but I'm very curious what your
experiences are here.

Thanks,
--Mike


(Andy-2) #5

Alexander,

What made bobo and zoie unmaintainable? How is elasticsearch more
maintainable?

You said the bobo/zoie solution was "lightning fast." Was it
significantly faster than elasticsearch?

Thanks.


(Alexander Reelsen) #6

Hi

On Dec 1, 2:24 am, Andy selforgani...@gmail.com wrote:

What made bobo and zoie unmaintainable? How is elasticsearch more
maintainable?

The bobo/zoie implementation was developed by a JEE-keen developer,
which meant it had tons of layers and a poor HTTP interface: one
servlet and one API call, where everything from faceted queries to
suggestions was done by appending gazillions of parameters.

I do not want to flame or disparage zoie or bobo here; they are good
tools. It was really our implementation that made us switch to
ES. :)

The good part for us is that we do not have to care that much about
the product; we only hacked a river on top of it. That is much less
code for us to look after, which makes it more maintainable overall,
and every developer on the team understands our search solution
without digging into lucene/bobo/zoie internals.

You said the bobo/zoie solution was "lightning fast." Was it
significantly faster than elasticsearch?

Faceting is really fast with lots of data when using bobo. However, as
we do not have that much data in one index, we don't care. We are more
than happy with ES speed.

--Alexander


(Andy-2) #7

I see.

Did you look at Sensei? It's a search engine built using bobo and
zoie. Just wondered if Sensei is easier to use.


(Alexander Reelsen) #8

Hey

On Dec 1, 9:51 am, Andy selforgani...@gmail.com wrote:

I see.

Did you look at Sensei? It's a search engine built using bobo and
zoie. Just wondered if Sensei is easier to use.

I didn't know about it when we started investigating elasticsearch.

Also, it looks more complex, as you have to do more manual tasks to get
it up and running (e.g. ZooKeeper). Worst of all, I read the term
"schema" several times in the documentation and did not like that :)

--Alexander


(James Cook-3) #9

Hi Michael,

My biggest worries are:

  • Backup/Restore
  • Split Brain (really, this is my number one concern. Very destructive
    and almost no way to recover.)

Take a look at these threads:

-- jim


(Michael Sick) #10

James,

I'm with you on Backup/Restore. Shay has indicated that it's a high
priority. We need it to satisfy enterprise-type customers that always want
a stable offsite backup. It would also be a great way to manage
pushing/pulling time-based indexes to and from a cluster. We're likely to
have one index per day and would like to roll them off the back end after
N days. When a user wants to see data past the N-day threshold, it would be
nice to simply request the daily file from the tape backup system and
import it back into the system. We can accomplish the same thing with
exports of the _source field or even of the original document (XML in our
case, and we will likely back up both), but having indexes at the ready
would be very slick.

Not sure I understand the split-brain issue, but I'll do some reading up.
--Mike


(Drew Raines) #11

James Cook wrote:

  • Split Brain (really, this is my number one concern. Very destructive
    and almost no way to recover.)

I suggest you try ZooKeeper discovery. It should make split-brain
difficult to encounter.

-Drew


(Karussell) #12

On 2 Dec., 17:08, Michael Sick michael.s...@serenesoftware.com
wrote:

James,

I'm with you on the Backup/Restore. Shay has indicated that it's a high
priority. We need it to satisfy enterprisy type customers that always want
a stable offsite backup. It would also be a great way to manage
pushing/pulling time based indexes from a cluster. We're likely to have an
index/day and would like to roll them off the back-end after N days.

In your case, does rolling off mean deleting them from disk or just no
longer searching them?

Here is some code to do rolling indices:
https://github.com/elasticsearch/elasticsearch/issues/1500

After flushing, it should even be safe to rsync them to another
location and copy them back later.

Regards,
Peter.


(Michael Sick) #13

Peter,

Yes, rolling off means that the index for a given day has become older than
our current online window and is eligible for archiving to tape or another
remote location not available to the cluster. So if we keep daily indexes
for 100 days, then on an index's day 101 it can be backed up and sent to
tape.
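
A minimal sketch of that retention rule, assuming one index per calendar day. The `docs-` prefix, the date-based index name, and the function names are hypothetical; nothing here talks to a cluster, it only decides which days are past the window:

```python
from datetime import date, timedelta


def daily_index_name(day: date, prefix: str = "docs-") -> str:
    """Build a per-day index name such as docs-2011.12.02 (naming is an assumption)."""
    return f"{prefix}{day:%Y.%m.%d}"


def split_by_retention(index_days, today: date, keep_days: int):
    """Split index days into (online, expired).

    An index is on day 1 on the date it was created, so it is eligible
    for archiving once it is keep_days or more days old (day keep_days + 1).
    """
    online, expired = [], []
    for day in index_days:
        age_in_days = (today - day).days
        (expired if age_in_days >= keep_days else online).append(day)
    return online, expired


today = date(2011, 12, 2)
days = [today - timedelta(days=n) for n in (0, 99, 100, 150)]
online, expired = split_by_retention(days, today, keep_days=100)
print("archive:", [daily_index_name(d) for d in expired])
```

The expired names would then be handed to whatever does the actual flush, copy, and delete.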

Thanks for the pointer, a few questions:

  1. Are you using Index Templates with this method?
  2. After an index is flushed (and even closed), where do we reliably
    copy it from, and how do we make sure we got all of the needed
    parts/shards?
  3. Are you using this in a production system? Just curious how it's working
    out.

Thanks for the response! --Mike

On Fri, Dec 2, 2011 at 3:52 PM, Karussell tableyourtime@googlemail.com wrote:

On 2 Dec., 17:08, Michael Sick michael.s...@serenesoftware.com
wrote:

James,

I'm with you on the Backup/Restore. Shay has indicated that it's a high
priority. We need it to satisfy enterprisy type customers that always want
a stable offsite backup. It would also be a great way to manage
pushing/pulling time based indexes from a cluster. We're likely to have an
index/day and would like to roll them off the back-end after N days.

Rolling off means in your case delete from disc or avoid searching on
them?

Here is some code to do rolling indices:
https://github.com/elasticsearch/elasticsearch/issues/1500

Then after flushing it even should be safe to rsync them into another
location + get them back.

Regards,
Peter.


(Ævar Arnfjörð Bjarmason) #14

On Fri, Dec 2, 2011 at 17:08, Michael Sick
michael.sick@serenesoftware.com wrote:

I'm with you on the Backup/Restore. Shay has indicated that it's a high
priority. We need it to satisfy enterprisy type customers that always want a
stable offsite backup.

I set up a production deployment of ES that does tens of millions of
queries per day. I solve this by not having ES be the primary
datastore for anything; it's just treated as a specialized index.

I.e. the primary datastore is data scattered through various RDBMSs,
and I have a cronjob that does daily aggregations of all that data
into a flat, daily rotating table that will become the ElasticSearch
index.

Then, to populate the index, I effectively do a SELECT * from that
table and inject the rows into a new daily ES index via the bulk API.

This means that:

  • In an organization that's used to managing production data via
    RDBMSs there's no new store of production data, just a specialized
    index.

  • The ES index can be nuked at any time and we can resume search
    operations in the time it would take to run that SELECT * > ES
    cronjob. Currently that's around 10 minutes.

  • We don't have to set up anything new to back up / manage the
    data. E.g. we have regular snapshots of production data that are
    moved to dev environments. The snapshot just copies the RDBMSs, and
    then a cronjob in the dev environment populates the dev
    ElasticSearch index (which will by definition be equivalent to
    production).

Now, in my case the ElasticSearch dataset isn't that large (it
comfortably fits in RAM on one machine), and I only generate new
indexes daily, but I don't see any inherent reason why this strategy
couldn't be adapted for larger data or data that's changing all the
time.

Setting it up like this did a lot to alleviate concerns about
introducing new technology in my organization.
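
A minimal sketch of the table-to-bulk step described in this post, not the poster's actual cronjob. It only builds the request body that would be sent to the `_bulk` endpoint: one action line plus one source line per row, newline-terminated. The in-memory SQLite table, the `search_flat` table name, the index name, and `build_bulk_body` are stand-ins chosen for illustration. Older ES releases also expect a `_type` entry in the action metadata, which is left out here:

```python
import json
import sqlite3


def build_bulk_body(rows, index_name, id_field="id"):
    """Return a bulk request body: an action line and a source line per row."""
    lines = []
    for row in rows:
        # Action metadata: index this document into index_name under its id.
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    if not lines:
        return ""
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Stand-in for the flat daily table; a real setup would query the RDBMS.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE search_flat (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO search_flat (id, title) VALUES (?, ?)",
    [(1, "red chair"), (2, "blue table")],
)
rows = [dict(r) for r in conn.execute("SELECT * FROM search_flat ORDER BY id")]

body = build_bulk_body(rows, "search-2011.12.05")
print(body, end="")
```

Because each day gets a fresh index built from the table, a failed or corrupted load can simply be thrown away and rebuilt.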


(James Cook-3) #15

I've read the pull request, but I have no experience with ZooKeeper.

"ZooKeeper uses a fixed list of ZooKeeper nodes, so it's quite easy for it
to decide if quorum is present or not."

Does this comment mean I have to have a few nodes dedicated to just running
ZooKeeper, or does it mean my application nodes are fixed? Because I have
no fixed nodes. Amazon manages my instances for me, and its services will
create new nodes when demand is high and destroy nodes when demand
lessens. I don't know the IPs of these nodes, nor do they have hard disks
(EBS on AWS).


(Shay Banon) #16

Note that with ZooKeeper you still have split brains; you just get to a
state of no availability when it happens (as far as I know). You can get
similar behavior with the minimum_master_nodes setting in elasticsearch
discovery (that's not to say that a ZooKeeper discovery module is not cool).
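
A minimal sketch of why a majority threshold gives that behavior. It assumes minimum_master_nodes is set to a strict majority of the master-eligible nodes; that is a common recommendation rather than something stated in this thread, and both function names are hypothetical. With a majority threshold, at most one side of a partition can elect a master, and the other side becomes unavailable instead of forming a second cluster:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(visible_nodes: int, required: int) -> bool:
    """A partition may elect a master only if it sees enough eligible nodes."""
    return visible_nodes >= required


total = 5
required = minimum_master_nodes(total)
# Try every way a network partition could split the five nodes in two.
for side_a in range(total + 1):
    side_b = total - side_a
    both = can_elect_master(side_a, required) and can_elect_master(side_b, required)
    assert not both  # two masters at once would be a split brain
print("minimum_master_nodes for", total, "nodes:", required)
```

The threshold is a fixed number, so a cluster whose size changes would need it recomputed from the number of master-eligible nodes it is expected to have.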


(James Cook-3) #17

Can you get to "no availability" using minimum_master_nodes when you have a
totally dynamic collection of nodes? (I don't know how many will be
created/destroyed by an external manager to handle load.)


(ppearcy) #18

A little late on this thread, but figured I'd share my experience. We
were able to replace two enterprise search systems. One was a legacy
system, for which we wrote an emulation layer so that elasticsearch
could act as a drop-in replacement. The other search system was
costing way too much money, the level of support for issues I ran into
was very poor even when I harassed people on a daily basis, and the
performance wasn't that good even after I jumped through some hoops on
my side to optimize it.

We compared elasticsearch to Solr back in fall of 2010, and at that
time elasticsearch had many compelling features that differentiated it
from Solr. Without tuning anything, elasticsearch was 10x faster. I
actually assumed my tests were broken. Now, I could probably have
gotten Solr to the same performance level, but why go through the
effort?

In summary:

  • elasticsearch saved my company probably 50K / year
  • It improved performance over the systems I replaced by 10x
  • It enabled lots of new features we didn't previously have
  • Shay and others on the discussion groups provide a great level of
    support.
  • Scales horizontally... just throw new servers into the cluster to
    add capacity

We've had a couple of hiccups around network partitions. Early
versions could nuke some data. 0.16 fixed most of these issues, but we
still had a few indices corrupted on this release after a major
network event.

Best Regards,
Paul

