Please share success stories about ES usage in production

vsiryi · November 30, 2011, 10:33am

I want to convince my customer to use ES. To do this I need
success stories about ES usage in production. I tried to find such
information on the official ElasticSearch site and on Google, but found
nothing.

I'd be very happy if you could share how many documents you are indexing
and approximately how many requests you serve daily. Links
to production applications would also be very helpful.

Thanks!

Best regards, Vitalii Siryi

Alexander_Reelsen · November 30, 2011, 11:33am

Hi

On Nov 30, 11:33 am, Vitalii Siryi vitalii.si...@gmail.com wrote:

> I'd be very happy if you could share how many documents you are indexing
> and approximately how many requests you serve daily. Links
> to production applications would also be very helpful.

We are using it as a product search engine; you can see it live at
http://www.lusini.de

Document count is quite low, more than 200k documents, currently
on one node. We have never had performance issues. We run lots of
faceting queries with about ten facets and of course some filters;
those queries get a bit slower but are still fast enough for us.

To get the product data into elasticsearch we have
implemented a river, which polls for updates every n seconds and
streams the JSON data, so we do not have to wait for the whole
download to finish when fetching large amounts of data. We have
several thousand updates a day (more likely ten thousand).

Before elasticsearch we had a lightning-fast but unmaintainable
self-written solution based on bobo and zoie, which we replaced in order to
have a simpler solution that all developers understand well.

Of course we do not expose elasticsearch directly to browsers; we have
another component in between, which can also do things like redirecting
certain search terms to landing pages.
To be honest, I cannot tell how many requests are coming in per day,
but I guess it is somewhat below 100k.

Hope this helps. In case of questions, feel free to ask.

--Alexander

James_Cook_3 · November 30, 2011, 3:58pm

http://www.penpalkidsclub.com/

Written using ES as the only form of persistence. Went live 7/2011.

Michael_Sick · November 30, 2011, 4:29pm

James,

Was curious what settings/features you consider most important to
configure/use when using ES without a secondary persistence mechanism.
Perhaps this should be a separate thread - but I'm very curious what your
experiences are here.

Thanks,
--Mike


Andy_2 · December 1, 2011, 1:24am

Alexander,

What made bobo and zoie unmaintainable? How is elasticsearch more
maintainable?

You said the bobo/zoie solution was "lightning fast." Was it significantly
faster than elasticsearch?

Thanks.


Alexander_Reelsen · December 1, 2011, 7:58am

Hi

On Dec 1, 2:24 am, Andy selforgani...@gmail.com wrote:

> What made bobo and zoie unmaintainable? How is elasticsearch more
> maintainable?

The bobo/zoie implementation was developed by a JEE-keen dev, which
meant it had tons of layers and a bad implementation in terms of
accessibility via HTTP: one servlet, one API call, where everything
from faceted queries to suggestions was done by appending gazillions
of parameters.

I do not want to flame or disparage zoie or bobo here; they are good
tools. It was really our implementation that made us switch to
ES.

The good part for us is that we do not have to care that much about
the product: we only hacked a river on top of it. That means much less
code for us to care about, which makes it more maintainable, and every
developer on the team understands our search solution without digging
into lucene/bobo/zoie internals.

> You said the bobo/zoie solution was "lightning fast." Was it significantly
> faster than elasticsearch?

Faceting is really fast with lots of data when using bobo. However, as
we do not have that much data in one index, we don't care. We are more
than happy with ES speed.

--Alexander

Andy_2 · December 1, 2011, 8:51am

I see.

Did you look at Sensei? It's a search engine built using bobo and
zoie. Just wondered if Sensei is easier to use.


Alexander_Reelsen · December 1, 2011, 9:02am

Hey

On Dec 1, 9:51 am, Andy selforgani...@gmail.com wrote:

> I see.
>
> Did you look at Sensei? It's a search engine built using bobo and
> zoie. Just wondered if Sensei is easier to use.

I didn't know about it when we started investigating elasticsearch.

Also, it looks more complex, as you have to do more manual tasks to get
it up and running (e.g. ZooKeeper). Worst of all, I read the term
"schema" several times in the documentation and did not like that.

--Alexander

James_Cook_3 · December 2, 2011, 5:43am

Hi Michael,

My biggest worries are:

- Backup/Restore
- Split Brain (really, this is my number one concern. Very destructive
  and almost no way to recover.)

Take a look at these threads:

http://goo.gl/xIXkj

-- jim

Michael_Sick · December 2, 2011, 4:08pm

James,

I'm with you on Backup/Restore. Shay has indicated that it's a high
priority. We need it to satisfy enterprise-type customers who always want
a stable offsite backup. It would also be a great way to manage
pushing/pulling time-based indexes to and from a cluster. We're likely to have
one index per day and would like to roll them off the back end after N days. When a
user wants to see data past the N-day threshold, it would be nice to simply
request the daily file from the tape backup system and import it back into
the system. We can accomplish the same thing with exports of the _source
field or even of the original document (XML in our case, and we will likely
back up both), but having indexes at the ready would be very slick.

I'm not sure I understand the split-brain issue, but I'll do some reading up.
--Mike

On Fri, Dec 2, 2011 at 12:43 AM, James Cook jcook@pykl.com wrote:

> Hi Michael,
>
> My biggest worries are:
>
> - Backup/Restore
> - Split Brain (really, this is my number one concern. Very destructive
>   and almost no way to recover.)
>
> Take a look at these threads:
>
> http://goo.gl/xIXkj
>
> -- jim

drewr · December 2, 2011, 6:42pm

James Cook wrote:

> Split Brain (really, this is my number one concern. Very destructive
> and almost no way to recover.)

I suggest you try ZooKeeper discovery. It should make split-brain
difficult to encounter.

github.com/elastic/elasticsearch

Add initial implementaion of zookeeper-based discovery service.

elastic:master ← imotov:zookeeper-discovery

opened 12:50AM - 23 Jun 11 UTC

imotov

+2748 -0

This is an initial implementation of a ZooKeeper-based discovery plugin. Usage:

- Download ZooKeeper 3.3.3 from http://zookeeper.apache.org/releases.html
- Unzip the ZooKeeper archive into a directory, rename conf/zoo_sample.cfg to zoo.cfg and modify the dataDir= line to point to a directory on your machine. Start ZooKeeper by running bin/zkServer.sh start.
- Install the ZooKeeper plugin into ES
- Assuming that you are running ZooKeeper on port 2181 (the default), add the following lines to the config/elasticsearch.yml file

```
zookeeper:
    enabled: true
    host: localhost:2181
discovery:
    type: zoo_keeper
```

- Start ES

-Drew

Karussell1 · December 2, 2011, 8:52pm

On 2 Dec., 17:08, Michael Sick michael.s...@serenesoftware.com
wrote:

> James,
>
> I'm with you on Backup/Restore. Shay has indicated that it's a high
> priority. We need it to satisfy enterprise-type customers who always want
> a stable offsite backup. It would also be a great way to manage
> pushing/pulling time-based indexes to and from a cluster. We're likely to have
> one index per day and would like to roll them off the back end after N days.

In your case, does rolling off mean deleting them from disk, or just no
longer searching them?

Here is some code to do rolling indices:

github.com/elastic/elasticsearch

Convenient rolling index method

opened 09:24PM - 25 Nov 11 UTC

closed 03:51PM - 26 May 14 UTC

karussell

Here is some code where a rolling index pattern is implemented. Imagine you have a logical index named 'tweets', and you want to create a new index every day to keep the indices small (it's a more scalable kind of 'sharding', but only if you have time-dependent data). With the proposed code you have to call rollIndex(maximumIndices) once a day. The new indices are all 'tagged' with a 'tweets_roll' alias (for later retrieval -> improvable?), and there is a group of indices for searching (tweets_search) and one for feeding (tweets_feed). By default it creates a search alias on all indices and a feed alias only for the very latest. It separates the search and the roll alias because one might want to keep some older indices without searching on them.

What do you think? It's rather simple but it works - a simple test is below the code.

```
private static final String simpleDateString = "yyyy-MM-dd-HH-mm-ss";

public String rollIndex(int maxRollIndices) {
    return rollIndex(getIndexName(), maxRollIndices, maxRollIndices);
}

public String rollIndex(String indexName, int maxRollIndices, int maxSearchIndices) {
    String rollAlias = indexName + "_roll";
    SimpleDateFormat formatter = new SimpleDateFormat(simpleDateString);
    if (maxRollIndices < 1 || maxSearchIndices < 1)
        throw new RuntimeException("remaining indices, search indices and feeding indices must be at least 1");

    // get old aliases
    Map<String, AliasMetaData> allRollingAliases = getAliases(rollAlias);

    // always create new index and append aliases
    String searchAlias = getSearchIndexName();
    String feedAlias = getFeedIndexName();
    String newIndexName = indexName + "_" + formatter.format(new Date());

    createIndex(newIndexName);
    addAlias(newIndexName, searchAlias);
    addAlias(newIndexName, rollAlias);

    String oldFeedIndexName = null;
    if (allRollingAliases.isEmpty()) {
        // do nothing for now
    } else {
        TreeMap<Long, String> sortedIndices = new TreeMap<Long, String>(reverseSorter);
        String[] concreteIndices = getConcreteIndices(allRollingAliases.keySet());
        //logger.info("aliases:" + allRollingAliases + ", indices:" + Arrays.toString(concreteIndices));
        for (String index : concreteIndices) {
            int pos = index.indexOf("_");
            if (pos < 0)
                throw new IllegalStateException("index " + index + " is not in the format " + simpleDateString);

            String indexDateStr = index.substring(pos + 1);
            Long timeLong;
            try {
                timeLong = formatter.parse(indexDateStr).getTime();
            } catch (Exception ex) {
                throw new IllegalStateException("index " + index + " is not in the format "
                        + simpleDateString + " error:" + ex.getMessage());
            }
            String old = sortedIndices.put(timeLong, index);
            if (old != null)
                throw new IllegalStateException("indices with the identical date are not supported "
                        + old + " vs. " + index);
        }

        int counter = 1;
        Iterator<String> indexIter = sortedIndices.values().iterator();
        while (indexIter.hasNext()) {
            String currentIndexName = indexIter.next();
            if (counter >= maxRollIndices) {
                deleteIndex(currentIndexName);
                // delete all the older indices
                continue;
            }

            if (counter == 1)
                oldFeedIndexName = currentIndexName;

            if (counter >= maxSearchIndices)
                removeAlias(currentIndexName, searchAlias);

            counter++;
        }
    }

    if (oldFeedIndexName != null)
        moveAlias(oldFeedIndexName, newIndexName, feedAlias);
    else
        addAlias(newIndexName, feedAlias);

    return newIndexName;
}

public String getSearchIndexName() {
    return getIndexName() + "_search";
}

public String getFeedIndexName() {
    return getIndexName() + "_feed";
}

public void createIndex(String indexName) {
    client.admin().indices().create(new CreateIndexRequest(indexName).settings(createIndexSettings())).actionGet(timeout);
}

public XContentBuilder createIndexSettings() {
    if (createIndexSettings == null) {
        try {
            createIndexSettings = JsonXContent.contentBuilder().startObject().
                    field("index.number_of_shards", createIndexShards).
                    field("index.number_of_replicas", createIndexReplicas).
                    field("index.refresh_interval", "10s").
                    field("index.merge.policy.merge_factor", 10).endObject();
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }
    return createIndexSettings;
}

public void deleteIndex(String indexName) {
    client.admin().indices().delete(new DeleteIndexRequest(indexName)).actionGet();
}

public void addAlias(String indexName, String alias) {
    client.admin().indices().aliases(new IndicesAliasesRequest().addAlias(indexName, alias)).actionGet();
}

public void removeAlias(String indexName, String alias) {
    client.admin().indices().aliases(new IndicesAliasesRequest().removeAlias(indexName, alias)).actionGet();
}

public void moveAlias(String oldIndexName, String newIndexName, String alias) {
    client.admin().indices().aliases(new IndicesAliasesRequest().addAlias(newIndexName, alias).
            removeAlias(oldIndexName, alias)).actionGet();
}

public Map<String, AliasMetaData> getAliases(String index) {
    Map<String, AliasMetaData> md = client.admin().cluster().state(new ClusterStateRequest()).
            actionGet().getState().getMetaData().aliases().get(index);
    if (md == null)
        return Collections.emptyMap();

    return md;
}

private static Comparator<Long> reverseSorter = new Comparator<Long>() {
    @Override
    public int compare(Long o1, Long o2) {
        return -o1.compareTo(o2);
    }
};

public String[] getConcreteIndices(Set<String> set) {
    return client.admin().cluster().state(new ClusterStateRequest()).actionGet().getState().
            getMetaData().concreteIndices(set.toArray(new String[set.size()]));
}
```

TEST

```
@Test
public void rollingIndex() throws Exception {
    search.setClient(createTestClient());
    search.setIndexName("tweets");
    String rollIndexTag = search.getIndexName() + "_roll";
    String searchIndex = search.getIndexName() + "_search";
    String feedIndex = search.getIndexName() + "_feed";

    search.rollIndex(4);
    assertEquals(1, search.getAliases(rollIndexTag).size());
    assertEquals(1, search.getAliases(searchIndex).size());
    assertEquals(1, search.getAliases(feedIndex).size());

    Thread.sleep(1000);
    search.rollIndex(4);
    assertEquals(2, search.getAliases(rollIndexTag).size());
    assertEquals(2, search.getAliases(searchIndex).size());
    assertEquals(1, search.getAliases(feedIndex).size());

    Thread.sleep(1000);
    search.rollIndex(4);
    assertEquals(3, search.getAliases(rollIndexTag).size());
    assertEquals(3, search.getAliases(searchIndex).size());
    assertEquals(1, search.getAliases(feedIndex).size());

    Thread.sleep(1000);
    search.rollIndex(4);
    assertEquals(4, search.getAliases(rollIndexTag).size());
    assertEquals(4, search.getAliases(searchIndex).size());
    assertEquals(1, search.getAliases(feedIndex).size());

    Thread.sleep(1000);
    search.rollIndex(4);
    assertEquals(4, search.getAliases(rollIndexTag).size());
    assertEquals(4, search.getAliases(searchIndex).size());
    assertEquals(1, search.getAliases(feedIndex).size());

    Thread.sleep(1000);
    search.rollIndex(search.getIndexName(), 4, 3);
    assertEquals(4, search.getAliases(rollIndexTag).size());
    assertEquals(3, search.getAliases(searchIndex).size());
    assertEquals(1, search.getAliases(feedIndex).size());
}
```

After flushing, it should even be safe to rsync them to another
location and copy them back later.

Regards,
Peter.

Michael_Sick · December 2, 2011, 11:36pm

Peter,

Yes, rolling off means that the index for a given day has become older than
our current online window and is eligible for archiving on tape or another
remote location not available to the cluster. So say we're keeping daily
indexes for 100 days, on day 101 for an index it can be backed up and sent
to tape.
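
A minimal sketch of that window check in Java, assuming the daily indexes are named with a hypothetical `logs-` prefix followed by an ISO date (the thread does not say how the indexes are named). It only decides which names have left the online window; the backup and removal steps themselves are out of scope.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Sketch only: class name, method name and index naming scheme are hypothetical.
final class RetentionSketch {

    // Date suffix format assumed for the daily index names, e.g. logs-2011-12-02.
    static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE;

    /**
     * Returns the index names that have left the online window.
     * An index stays online for onlineDays days, counting its own day as day 1,
     * so with onlineDays = 100 it becomes eligible on its day 101.
     */
    static List<String> eligibleForArchive(List<String> indexNames, String prefix,
                                           LocalDate today, int onlineDays) {
        // Oldest date that is still inside the online window.
        LocalDate oldestOnline = today.minusDays(onlineDays - 1);
        List<String> eligible = new ArrayList<>();
        for (String name : indexNames) {
            if (!name.startsWith(prefix)) {
                continue; // not one of the daily indexes
            }
            LocalDate day = LocalDate.parse(name.substring(prefix.length()), DAY);
            if (day.isBefore(oldestOnline)) {
                eligible.add(name);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<String> names = List.of("logs-2011-08-24", "logs-2011-08-25", "logs-2011-12-02");
        System.out.println(eligibleForArchive(names, "logs-", LocalDate.of(2011, 12, 2), 100));
    }
}
```

The window is computed from the date in the index name rather than from file timestamps, so an index that is restored from tape later keeps its original position in the window.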

Thanks for the pointer, a few questions:

- Are you using Index Templates with this method?
- After an index is flushed (and even closed), from where do we reliably
  copy it and make sure we got all of the needed parts/shards?
- Are you using this in a production system? Just curious how it's working
  out.

Thanks for the response! --Mike

On Fri, Dec 2, 2011 at 3:52 PM, Karussell tableyourtime@googlemail.com wrote:

> On 2 Dec., 17:08, Michael Sick michael.s...@serenesoftware.com
> wrote:
>
> > James,
> >
> > I'm with you on Backup/Restore. Shay has indicated that it's a high
> > priority. We need it to satisfy enterprise-type customers who always want
> > a stable offsite backup. It would also be a great way to manage
> > pushing/pulling time-based indexes to and from a cluster. We're likely to have
> > one index per day and would like to roll them off the back end after N days.
>
> In your case, does rolling off mean deleting them from disk, or just no
> longer searching them?
>
> Here is some code to do rolling indices:
> Convenient rolling index method · Issue #1500 · elastic/elasticsearch · GitHub
>
> After flushing, it should even be safe to rsync them to another
> location and copy them back later.
>
> Regards,
> Peter.

AEvar_Arnfjord_Bjarm · December 3, 2011, 2:12pm

On Fri, Dec 2, 2011 at 17:08, Michael Sick
michael.sick@serenesoftware.com wrote:

> I'm with you on Backup/Restore. Shay has indicated that it's a high
> priority. We need it to satisfy enterprise-type customers who always want a
> stable offsite backup.

I set up a production deployment of ES that serves tens of millions of queries
per day. I solve this by not having ES be the primary datastore
for anything; it's just treated as a specialized index.

I.e. the primary datastore is data scattered across various RDBMSs,
and a cronjob does daily aggregations of all that data
into a flat, daily rotating table that will become the Elasticsearch
index.

Then, to populate the index, I effectively do a SELECT * from that table
and inject the rows into a new daily ES index via the bulk API.

This means that:

- In an organization that's used to managing production data via
  RDBMSs there's no new store of production data, just a specialized
  index.
- The ES index can be nuked at any time and we can resume search
  operations in the time it takes to run that SELECT * > ES
  cronjob. Currently that's around 10 minutes.
- We don't have to set up anything new to back up / manage the
  data. E.g. we have regular snapshots of production data that are
  moved to dev environments. The snapshot just copies the RDBMSs, and
  then a cronjob in the dev environment populates the dev
  Elasticsearch index (which will by definition be equivalent to
  production).

Now in my case the Elasticsearch dataset isn't that large (it
comfortably fits in RAM on one machine), and I only generate new
indexes daily, but I don't see any inherent reason for why this
strategy couldn't be adapted for larger data / data that's changing
all the time.

Setting it up like this did a lot to alleviate concerns about
introducing new technology in my organization.
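
For illustration, the bulk step could look roughly like the Java sketch below. It only builds the newline-delimited request body that the _bulk endpoint expects (one action line plus one source line per row, each terminated by a newline). The index name, ids and fields are hypothetical, and reading the rows from the RDBMS and sending the HTTP request are left out. Releases from around the time of this thread also expect a `_type`, either in the action line or in the request path, which this sketch omits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: class and method names are hypothetical, not part of any ES client.
final class BulkBodySketch {

    /** Wraps a value in quotes and escapes characters that may not appear raw in a JSON string. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Turns each row (keyed by document id) into an action line followed by a source line. */
    static String bulkBody(String index, Map<String, Map<String, String>> rowsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, Map<String, String>> row : rowsById.entrySet()) {
            // Action line: index this document under the given id.
            body.append("{\"index\":{\"_index\":").append(jsonString(index))
                .append(",\"_id\":").append(jsonString(row.getKey())).append("}}\n");
            // Source line: the row's columns as a flat JSON object of strings.
            body.append('{');
            boolean first = true;
            for (Map.Entry<String, String> field : row.getValue().entrySet()) {
                if (!first) {
                    body.append(',');
                }
                body.append(jsonString(field.getKey())).append(':').append(jsonString(field.getValue()));
                first = false;
            }
            body.append("}\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("title", "Example product");
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("1", row);
        System.out.print(bulkBody("products-2011-12-03", rows));
    }
}
```

Because every daily run writes into a fresh index, a failed run can simply be repeated without cleaning up partially indexed data first.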

James_Cook_3 · December 5, 2011, 4:05am

I've read the pull request, but I have no experience with ZooKeeper.

> ZooKeeper uses a fixed list of ZooKeeper nodes, so it’s quite easy for it
> to decide if quorum is present or not.

Does this comment mean I have to have a few nodes dedicated to just running
ZooKeeper, or does it mean my application nodes are fixed? Because I have
no fixed nodes. Amazon manages my instances for me and its services will
create new nodes when demand is high, and destroy nodes when demand
lessens. I don't know the IPs of these nodes, nor do they have hard disks (EBS
on AWS).

kimchy · December 5, 2011, 7:28pm

Note that with zookeeper you can still have split brains; you just end up in a
state of no availability when it happens (as far as I know). You can get
similar behavior with the minimum_master_nodes setting in elasticsearch
discovery (that's not to say that a zookeeper discovery module is not cool).
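
The arithmetic behind that setting can be sketched as follows, assuming the commonly recommended majority rule of (master-eligible nodes / 2) + 1, which this thread does not spell out. The class and method names are hypothetical and not part of elasticsearch.

```java
// Sketch only: illustrates the majority rule, not elasticsearch's own election code.
final class QuorumSketch {

    /** Majority of the master-eligible nodes: floor(n / 2) + 1. */
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    /** A partition may elect a master only if it can see at least the configured minimum. */
    static boolean mayElectMaster(int visibleMasterEligibleNodes, int minimumMasterNodes) {
        return visibleMasterEligibleNodes >= minimumMasterNodes;
    }

    public static void main(String[] args) {
        int minimum = minimumMasterNodes(3);
        System.out.println("minimum_master_nodes for 3 nodes: " + minimum);
        System.out.println("isolated single node may elect: " + mayElectMaster(1, minimum));
    }
}
```

With three master-eligible nodes the rule gives 2, so a single isolated node cannot elect itself master; it stays unavailable until it can see a second node again. The value depends on the number of master-eligible nodes, so it has to be kept in step with the cluster size.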


James_Cook_3 · December 6, 2011, 7:45pm

Can you get to "no availability" using minimum_master_nodes when you have a
totally dynamic collection of nodes? (I don't know how many will be
created/destroyed by the external manager to handle load.)

ppearcy · December 9, 2011, 5:58pm

A little late on this thread, but figured I'd share my experience. We
were able to replace two enterprise search systems. One was a legacy
system, for which we wrote an emulation layer to act as a drop-in
replacement. The other search system was costing way too much money,
the level of support for the issues I ran into was very poor, even
when harassing people on a daily basis, and the performance wasn't
that good, even after I jumped through some hoops on my side to optimize it.

We compared elasticsearch to Solr back in the fall of 2010, and at that
time elasticsearch had many compelling features that differentiated it
from Solr. Without tuning anything, elasticsearch was 10x faster. I
actually assumed my tests were broken. Now, I could probably have
gotten Solr to the same performance level, but why go through the
effort?

In summary:

- elasticsearch saved my company probably 50K / year
- It improved performance over the systems I replaced by 10x
- It enabled lots of new features we didn't previously have
- Shay and others on the discussion groups provide a great level of
  support
- It scales horizontally: just throw new servers into the cluster to
  add capacity

We've had a couple of hiccups around network partitions. Early
versions could nuke some data. 0.16 fixed most of these issues, but we
still had a few indices corrupted on that release after a major
network event.

Best Regards,
Paul

