Usage in production


(David Jensen-2) #1

I was just wondering who out there is using Elastic Search in
production.

Thanks,
David


(Clinton Gormley) #2

On Mon, 2010-06-07 at 14:48 -0700, David Jensen wrote:

I was just wondering who out there is using Elastic Search in
production.

We are - we're the biggest provider of family announcements hosting to
newspapers in Europe:

These are some of our UK sites

Clint

--
Web Announcements Limited is a company registered in England and Wales,
with company number 05608868, with registered address at 10 Arvon Road,
London, N5 1PR.


(David Jensen-2) #3

Roughly how much data do you manage in your search indexes?

If you're allowed to share that information, of course.



(Clinton Gormley) #4

On Mon, 2010-06-07 at 18:11 -0700, David Jensen wrote:

Roughly how much data do you manage in your search indexes?

We've got about 6 million docs indexed, average size about 0.5kB -
anywhere from 8 to 20 fields per doc

clint



(nfo) #5

Nice, do you use "niofs" or "memory" ? How many nodes do you need for
that amount of docs ?



(Clinton Gormley) #6

On Wed, 2010-06-09 at 01:07 -0700, nfo wrote:

Nice, do you use "niofs" or "memory" ? How many nodes do you need for
that amount of docs ?

We're using niofs - the gateway and niofs work directories are both
about 10GB in size.

I'm running just two nodes at the moment, more for availability than
performance. A single node handles our load easily.

Our main node has 4 x quad cores and 12 GB of memory. On this node, I'm
giving the JVM 6GB of memory, and the rest is used by Linux for caching.

The backup node has one dual core and 10GB of memory. I've assigned 3GB
of memory to the JVM, as I'm also running a slave DB server on that
node.
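
As a rough illustration of the memory split described above, the sketch below estimates how much of a roughly 10GB index could sit in the OS file cache once the JVM heap is subtracted. The function name `os_cache_share` is hypothetical, and the calculation is an assumption-laden upper bound: it ignores memory used by the kernel and by other processes (such as the slave DB on the backup node).

```python
def os_cache_share(ram_gb, jvm_heap_gb, index_gb):
    """Upper bound on the fraction of the index files that fit in the
    RAM left over after the JVM heap (hypothetical helper)."""
    free_gb = max(ram_gb - jvm_heap_gb, 0)
    return min(free_gb / index_gb, 1.0)


# Figures taken from this thread: ~10GB niofs work directory.
main = os_cache_share(ram_gb=12, jvm_heap_gb=6, index_gb=10)
backup = os_cache_share(ram_gb=10, jvm_heap_gb=3, index_gb=10)

print(f"main node:   at most {main:.0%} of the index cached")
print(f"backup node: at most {backup:.0%} (less in practice, the DB shares the box)")
```

On these numbers neither node can hold the whole index in the file cache, which matches the observation that a single node still handles the load easily.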

The gateway is shared over NFSv4 (the main node is also the NFS server)
and the NFS shares are mounted with the default settings.

The only other changes I made were to set the ulimit on the number of
open files to 20,000 for the elasticsearch user, and to add these lines
to /etc/sysctl.conf:

net.core.wmem_max = 655360
net.core.rmem_max = 26214400

These settings were recommended by jgroups - although we're now using
Zen, I thought they might still be relevant, so I left them in.
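
For anyone wanting to check the same settings on their own box, here is a minimal sketch in Python. It parses sysctl.conf-style text and compares it with the two values quoted above, and it reads the soft open-files limit of the current process. The helper names (`parse_sysctl`, `missing_sysctl`, `open_files_ok`) are hypothetical and not part of elasticsearch; the open-files check only reflects the user running the script, so it would need to run as the elasticsearch user to be meaningful.

```python
import resource

# Values quoted in the post above; adjust for your own setup.
WANTED_SYSCTL = {
    "net.core.wmem_max": 655360,
    "net.core.rmem_max": 26214400,
}
WANTED_OPEN_FILES = 20000


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_sysctl(text, wanted=WANTED_SYSCTL):
    """Return the wanted keys that are absent or set to another value."""
    found = parse_sysctl(text)
    return {k: v for k, v in wanted.items() if found.get(k) != str(v)}


def open_files_ok(wanted=WANTED_OPEN_FILES):
    """Check the soft open-files limit of the current process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= wanted
```

Passing the contents of /etc/sysctl.conf to `missing_sysctl` returns an empty dict when both lines are present with the values above.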

Note: it is much better for your elasticsearch nodes to be on dedicated
boxes, rather than shared with other services (like my backup DB). When
the DB suddenly requires a lot of resources, it can slow ES right down,
and cause timeouts.

We're using the Perl client (ElasticSearch.pm) and with the above setup
and data, we can index 400 - 500 docs per second.

This is considerably slower than the Java client, but I'm working on
improving this.

A lot of the fault is in the standard HTTP module used in Perl, which is
very accurate, but doesn't perform terribly well. Switching to a
lighter HTTP module increased performance by about 30%, and the
memcached interface almost doubled performance (with the downside that
you don't get nearly as much visibility of what is happening in the
server).

Of course, this latency matters less with searches than with indexing,
as it forms a smaller percentage of total time taken.
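
To put the indexing rate in context, this small sketch works out how long a full reindex of the roughly 6 million docs mentioned earlier in the thread would take at 400 to 500 docs per second. It assumes a steady rate for the whole run, which is a simplification; `reindex_hours` is a hypothetical helper name.

```python
DOCS = 6_000_000      # "about 6 million docs", from earlier in the thread
RATES = (400, 500)    # docs per second, from the post above


def reindex_hours(docs, docs_per_second):
    """Hours needed to index `docs` documents at a steady rate."""
    return docs / docs_per_second / 3600


for rate in RATES:
    print(f"{rate} docs/s -> {reindex_hours(DOCS, rate):.1f} hours")
# 400 docs/s -> 4.2 hours
# 500 docs/s -> 3.3 hours
```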

hth

clint


(Erwan Arzur-2) #7

2010/6/9 Clinton Gormley clinton@iannounce.co.uk:

On Wed, 2010-06-09 at 01:07 -0700, nfo wrote:

Nice, do you use "niofs" or "memory" ? How many nodes do you need for
that amount of docs ?

We're using niofs - the gateway and niofs work directories are both
about 10GB in size.

Very helpful! Thanks a lot for sharing your findings. We are in the
process of evaluating elasticsearch for a database (~500GB) that will
contain a total of about ten times the number of documents (60M), and
multiple indexes containing between 10k and 500k documents.

Quite a big workload. Currently my main concern is to find what share
of the index needs to reside in memory, in order to find out how many
nodes we'll need to deploy.

Thanks again,

Erwan

