Splunk vs. Elastic search performance?


(Frank Flynn) #1

We have a large Splunk instance. We load about 1.25 TB of logs a day. We
have about 1,300 loaders (servers that collect and load logs; they may do
other things too).

As I look at Elasticsearch / Logstash / Kibana, does anyone know of a
performance comparison guide? Should I expect to run on very similar
hardware? More, or less?

Sure, it depends on exactly what we're doing - the exact queries and how
often we'd run them - but I'm trying to get some kind of idea before we
start.

Are there any white papers or other documents about switching? It seems an
obvious choice, but I can find very few performance comparisons. (I did see
that Elasticsearch just hired "the former VP of Products at Splunk, Gaurav
Gupta," but there were few numbers in that article either.)

Thanks,
Frank

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ea1a338b-5b44-485d-84b2-3558a812e8a0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

That's a lot of data! I don't know of any installations that big but
someone else might.

What sort of infrastructure are you running splunk on now, what's your
current and expected retention?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Greg Murnane) #3

I'm running Elasticsearch much smaller than this, but with a PowerEdge R900
with 2 X7350 CPUs and 64 GB of RAM (24 GB heap for Elasticsearch) I'm able
to sustain something like 80 GB per day (1/16 of your volume). Some of the
latest Intel CPUs are about 4 times as powerful as the X7350, so
extrapolating from my results, with very new hardware you can probably do
1.25 TB per day on around 5 nodes with 2 CPUs, 256 GB RAM, and 8 disks each.
I haven't had an opportunity to test this yet, and even if it is possible,
you should probably have more nodes than that: hardware failure, growth, or
a sudden increase in logging volume from a problem can take down a cluster
that's running at full capacity all the time.
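As back-of-envelope arithmetic (every figure here is a rough estimate from
above, not a benchmark):

```python
# Rough capacity extrapolation; every constant is an estimate, not a measurement.
observed_gb_per_day = 80    # sustained on one R900 (2x X7350, 24 GB ES heap)
cpu_speedup = 4             # newer Intel CPUs vs. the X7350, roughly
target_gb_per_day = 1250    # ~1.25 TB/day

per_node = observed_gb_per_day * cpu_speedup       # ~320 GB/day per modern node
nodes_needed = -(-target_gb_per_day // per_node)   # ceiling division
```

That lands at 4 nodes running flat out, so round up to 5, and as noted,
add more on top of that for failure headroom.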

I'd encourage you to put Elasticsearch on some of your systems to generate
some benchmarks. I've never tried clustering Elasticsearch with more than 5
hosts. At 1,300 systems, each would be doing around 15 KB/s, which is
essentially trivial. You might try taking Splunk off two dozen systems or
so and committing them to Elasticsearch, then see how well they keep up
with the load you're generating. Data from your particular setup will
almost always be the best sort to have.



(Jaguar) #4

We have a cluster with 10 nodes and a 48 GB heap for each ES process. The
total indexing rate is about 25,000 docs per second, with about 20 indices
actively receiving new data. I'm really curious to compare and evaluate
indexing performance numbers.

Thanks!



(Clinton Gormley) #5

Goldman Sachs gave a talk about how they're using Elasticsearch to index
5 TB of log data per day. I can't find the video of the talk, but from a
blog post about it:

Next was Indy Tharmakumar from our hosts Goldman Sachs
(http://www.goldmansachs.com/), showing how his team have built powerful
support systems using Elasticsearch to index log data. Using 32 single-core
CPU instances, the system they have built can store 1.2 billion log lines
with a throughput of up to 40,000 messages a second (the systems monitored
produce 5 TB of log data every day). Log data is queued up in Redis
(http://redis.io/), distributed to many Logstash (http://logstash.net/)
processes, and indexed by Elasticsearch with a Kibana
(http://rashidkpc.github.io/Kibana/) front end. They learned that Logstash
can be particularly CPU intensive but Elasticsearch itself scales extremely
well. Future plans include considering Apache Kafka
(http://kafka.apache.org/) as a data backbone.



(Frank Flynn) #6

Thanks for the tips so far. I should have been a bit more specific. It's
Saturday today and I'm doing this off the top of my head, so I might be off
by a bit, but as I recall, in Splunk right now we have the equivalent of 11
indexes - the biggest one runs 4 Gb a day; all together they run 1.2 TB a
day. We retain the data for 90 days. We have 12 machines indexing the data
in EC2 (m2.4xlarge), and although it works fine, it is too slow (users
complain about report speed).

If ES works, the money I can save from not renewing my Splunk license could
easily double the number of servers, upgrade them to the i class (SSD
storage with big RAM), and send the team to Europe for a couple of weeks
(although the trip to Europe is not my decision).

I will look for the Goldman Sachs talk. My plan, after reading the ES
website, is to leave Splunk alone, fork the data for one index to a new ES
cluster as well as Splunk, and then make the comparisons. My only issue is
that if I go with the i instances (with SSDs) it's not a fair comparison
for benchmarking. That may not be a big deal for me, but I'd love to see
apples-to-apples numbers.

Frank



(Sabareesh SS) #7

What are the different ways I can make good use of Elasticsearch?



(Thomas Paulsen) #8

We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12
indexers and 2 search heads. Each indexer had 1,000 IOPS guaranteed. The
system is slow but OK to use.

We tried Elasticsearch and were able to get the same performance with the
same number of machines. Unfortunately, with Elasticsearch you need almost
double the amount of storage, plus a LOT of patience to make it run. It
took us six months to set it up properly, and even now the system is quite
buggy and unstable, and from time to time we lose data with Elasticsearch.

I don't recommend ELK for a critical production system; for just dev work
it is OK, if you don't mind the hassle of setting up and operating it. The
costs you save by not buying a Splunk license you have to invest into
consultants to get it up and running. Our dev teams hate Elasticsearch and
prefer Splunk.



(Mark Walkom) #9

I'd be interested in knowing what problems you had with ELK, if you don't
mind sharing.

I understand the ease of Splunk, but ELK isn't that difficult if you have
some in-house Linux skills.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Jörg Prante) #10

You are correct in noting that Elasticsearch ships with developer settings -
that is exactly what a packaged ES is meant for.

If you run into issues when configuring and setting up ES for critical use,
it would be nice to post them so others can find help too, and maybe share
their solutions, because there are ES installations that run successfully
in critical environments.

From a bare mention of dev teams' "hate", it is rather impossible for me to
learn why this is so. Facts are more important than emotions when fixing
software issues. The power of open source is that such issues can be fixed
with the help of a public discussion in the community. With closed software
products, you cannot rely on issues being discussed publicly to find the
best ways to fix them.

Jörg



(Brian Yoder) #11

Thomas,

Thanks for your insights and experiences. As I am someone who has explored
and used ES for over a year but is relatively new to the ELK stack, your
data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the _all
field. The entire text of the log events generated by logstash ends up in
the message field (not @message, as many people incorrectly post), so the
_all field is just redundant overhead with no added value. The result is a
dramatic drop in database file sizes and a dramatic increase in load
performance. Of course, you then need to configure ES to use the message
field as the default for a Lucene Kibana query.
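A minimal sketch of both changes in one index template (the template name
is made up, and index.query.default_field is the ES 1.x setting I use for
the default query field; verify against your version):

```json
{
  "template": "logstash-*",
  "settings": {
    "index.query.default_field": "message"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}
```

PUT it to _template/<some name>; it applies only to newly created
logstash-* indices, so existing indices keep their old mapping.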

During the year that I've used ES and watched this group, I have had a
front-row view of a brand new product with a smart and dedicated
development team working steadily to improve it. Six months ago, the ELK
stack eluded me and reports weren't encouraging (with the sole exception of
the Kibana web site's marketing pitch). But ES has come a long way since
then, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and
prevent external (to the Splunk db itself, not to our company) users from
causing harm to data. But Kibana seems to be meant for a small cadre of
trusted users. What if I write a dashboard with the same name as someone
else's? Kibana doesn't even begin to discuss user isolation. But I am
confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND
instead of OR? Google is not my friend here: I keep getting references to
the Ruby versions of Kibana, and that's ancient history by now. Kibana is
cool and promising, but it has a long way to go before deployment to all of
the folks in our company who currently have access to Splunk.
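At the ES level, at least, the operator can be forced per request with the
standard query_string syntax (this is the ES API, not a Kibana setting;
whether Kibana passes it through is exactly my open question):

```json
{
  "query": {
    "query_string": {
      "query": "error timeout",
      "default_field": "message",
      "default_operator": "AND"
    }
  }
}
```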

Logstash has a nice book that's been very helpful, and logstash itself has
been an excellent tool for prototyping. The book has been invaluable in
helping me extract dates from log events and handle all of our different
multiline events. But it still doesn't explain why the date filter needs a
different array of matching strings to get the date that the grok filter
has already matched and isolated. And recommendations to avoid the
elasticsearch_http output in favor of the elasticsearch output (via the
Node client) run up against the fact that logstash's bundled 1.1.1 version
of the ES client library is not compatible with the most recent 1.2.1
version of ES.

And logstash is also a resource hog, so we eventually plan to replace it
with Perl and Apache Flume (already in use) and pipe the data into my Java
bulk load tool (which is always kept up to date with the versions of ES we
deploy!). Because we send the data via Flume to our data warehouse, any
losses in ES will be annoying but won't be catastrophic. The front-end
following of rotated log files will be done with the GNU tail command and
its uppercase -F option, which follows rotated log files perfectly. I doubt
that logstash can do the same on its own, and we currently see that neither
can Splunk (so we sporadically lose log events in Splunk too). GNU tail -F
piped into logstash with the stdin input works perfectly in my evaluation
setup and will likely form the first stage of any log forwarder we end up
deploying.
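For the record, the sort of first stage I mean (the file path and config
name are placeholders from my evaluation setup):

```
# -F (uppercase) follows the file by name, so it survives log rotation
tail -F /var/log/myapp/app.log | logstash -f stdin-to-es.conf

# stdin-to-es.conf - read lines from stdin, ship to ES over HTTP
input  { stdin { } }
output { elasticsearch_http { host => "localhost" } }
```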

Brian



(Patrick Proniewski) #12


The "message" field can be edited during logstash filtering, but assuming it's sufficient, I would love to remove the "_all" field and point Kibana at "message". Oddly, I can't find the "_all" field, either in Sense or in Kibana. I know it's enabled:

GET _template/logstash

{
  "logstash": {
    "order": 0,
    "template": "logstash-*",
    "settings": {
      "index.refresh_interval": "5s"
    },
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "string_fields": {
              "mapping": {
                "index": "analyzed",
                "omit_norms": true,
                "type": "string",
                "fields": {
                  "raw": {
                    "index": "not_analyzed",
                    "ignore_above": 256,
                    "type": "string"
                  }
                }
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ],
        "properties": {
          "geoip": {
            "dynamic": true,
            "path": "full",
            "properties": {
              "location": {
                "type": "geo_point"
              }
            },
            "type": "object"
          },
          "@version": {
            "index": "not_analyzed",
            "type": "string"
          }
        },
        "_all": {
          "enabled": true    <------
        }
      }
    },
    "aliases": {}
  }
}

But it looks like I can't retrieve/display its content. Any idea?



(Brian Yoder) #13

Patrick,

Here's my template, along with where the _all field is disabled. You may
wish to add this setting to your own template, and then also add the index
setting to ignore malformed data (if someone's log entry occasionally slips
in "null" or "no-data" instead of the usual numeric value):

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}
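For reference, a template like this is installed with a PUT to the index-template API. Here is a minimal Python sketch (ES 1.x assumed; the template name `automap`, the host, and the commented-out HTTP call are illustrative, not taken from the thread):

```python
import json

# Hypothetical sketch of installing the template above via PUT /_template/automap.
# The body is the template itself, without the outer "automap" key.
template_body = {
    "template": "logstash-*",              # matches each day's new index
    "settings": {"index.mapping.ignore_malformed": True},
    "mappings": {
        "_default_": {
            "numeric_detection": True,
            "_all": {"enabled": False},    # drop the redundant _all field
            "properties": {
                "message": {"type": "string"},
                "host": {"type": "string"},
                "UUID": {"type": "string", "index": "not_analyzed"},
                "logdate": {"type": "string", "index": "no"},
            },
        }
    },
}

body = json.dumps(template_body)
# import urllib.request
# req = urllib.request.Request("http://localhost:9200/_template/automap",
#                              data=body.encode(), method="PUT")
# urllib.request.urlopen(req)
```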

Brian



(Mark Walkom) #14

I wasn't aware that the elasticsearch_http output wasn't recommended. When I
spoke to a few of the ELK devs a few months ago, they indicated that there
was minimal performance difference, with the added benefit of not being
locked to specific LS+ES version pairings.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 21 June 2014 02:43, Brian brian.from.fl@gmail.com wrote:

Thomas,

Thanks for your insights and experiences. As I am someone who has explored
and used ES for over a year but is relatively new to the ELK stack, your
data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the _all
field. The entire text of the log events generated by logstash ends up in
the message field (not @message, as many people incorrectly post), so the
_all field is just redundant overhead with no added value. The result is a
dramatic drop in database file sizes and a dramatic increase in load
performance. Of course, you then need to configure ES to use the message
field as the default for a Lucene query in Kibana.
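As a sketch of that last configuration step, assuming ES 1.x (whether it lives in elasticsearch.yml or in per-index settings is an assumption here):

```yaml
# elasticsearch.yml: make Lucene query_string queries default to the
# "message" field rather than the (disabled) _all field
index.query.default_field: message
```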

During the year that I've used ES and watched this group, I have been on the
front line of a brand-new product with a smart and dedicated development
team working steadily to improve it. Six months ago, the ELK stack eluded me
and reports weren't encouraging (with the sole exception of the Kibana web
site's marketing pitch). But ES has come a long way since then, and the ELK
stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and to
prevent external users (external to the Splunk db itself, not to our
company) from causing harm to data. But Kibana seems to be meant for a small
cadre of trusted users. What if I write a dashboard with the same name as
someone else's? Kibana doesn't even begin to address user isolation. But I
am confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND
instead of OR? Google is not my friend: I keep getting references to the
Ruby versions of Kibana, and that's ancient history by now. Kibana is cool
and promising, but it has a long way to go before it can be deployed to all
of the folks in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and logstash itself has
been an excellent tool for prototyping. The book has been invaluable in
helping me extract dates from log events and handling all of our different
multiline events. But it still doesn't explain why the date filter needs a
different array of matching strings to get the date that the grok filter
has already matched and isolated. And recommendations to avoid the
elasticsearch_http output and use elasticsearch (via the Node client)
directly contradict the fact that logstash's 1.1.1 version of the ES client
library is not compatible with the most recent 1.2.1 version of ES.

And logstash is also a resource hog, so we eventually plan to replace it
with Perl and Apache Flume (already in use) and pipe the data into my Java
bulk-load tool (which is always kept up to date with the versions of ES we
deploy!). Because we send the data via Flume to our data warehouse, any
losses in ES will be annoying but not catastrophic. The front-end following
of rotated log files will be done with the GNU tail command and its
uppercase -F option, which follows rotated log files perfectly. I doubt that
logstash can do the same, and we currently see that neither can Splunk (so
we sporadically lose log events in Splunk too). GNU tail -F piped into
logstash with the stdin input works perfectly in my evaluation setup and
will likely form the first stage of any log forwarder we end up deploying.
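That pipeline stage might look like the following logstash config, run as `tail -F /var/log/app.log | logstash -f tail-stdin.conf` (the log path, host, and plugin options are illustrative; logstash 1.4-era plugins assumed):

```
input {
  stdin { }
}
output {
  elasticsearch_http {
    host => "localhost"
  }
}
```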

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12
indexers and 2 search heads. Each indexer had 1,000 IOPS guaranteed
assigned. The system is slow but OK to use.

We tried Elasticsearch and were able to get the same performance with the
same number of machines. Unfortunately, with Elasticsearch you need almost
double the storage, plus a LOT of patience to make it run. It took us six
months to set it up properly, and even now the system is quite buggy and
unstable, and from time to time we lose data with Elasticsearch.

I don't recommend ELK for a critical production system; for dev work it is
OK, if you don't mind the hassle of setting it up and operating it. The
costs you save by not buying a Splunk license you have to invest in
consultants to get it up and running. Our dev teams hate Elasticsearch and
prefer Splunk.




(Brian Yoder) #15

Mark,

I've read one post (I can't remember where) saying that the Node client was
preferred, but I have also read that the HTTP interface adds minimal
overhead. So yes, I am currently using logstash with the HTTP interface, and
it works fine.

I also performed some experiments with clustering (not much, due to resource
and time constraints) and used unicast discovery. Then I read someone who
strongly recommended multicast discovery, and I started to feel I'd gone
down the wrong path. Then I watched the ELK webinar and heard that unicast
discovery was preferred. I think it's not a big deal either way; it's
whatever works best for your particular networking infrastructure.

In addition, I was recently given this link:
http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded
me at all, but it is a thought-provoking read. I am a little confused by
some things, though. In all of my high-performance banging on ES, even with
my time-to-live test feature enabled, I never lost any documents at all. But
I wasn't using auto-generated IDs; I was specifying my own unique ID. And
when run in my 3-node cluster (slow, due to being hosted by 3 VMs running on
a dual-core machine), I still didn't lose any data. So I cannot confirm the
high data-loss scenarios he describes in his missive; I have seen no
evidence of any data loss due to false insert positives.

Brian



(Itamar Syn-Hershko) #16

The data-loss scenarios in Aphyr's post are easy to generate because his
tools stress-test the database systems he examines to their limits; he is
practically provoking the DBs he tests into failing (though they shouldn't,
really).

In normal operation you should not see failures, but what Aphyr showed is
that when failure conditions do occur, the chances that you will are pretty
high. Thanks to the fallacies of distributed computing, that basically means
such failures are bound to happen every now and then. Whether, and how much,
data you lose will vary with volumes, setups, and so on.

HTH

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/



(Ivan Brusic) #17

I agree. I thought elasticsearch_http was actually the recommended route.
Also, I have seen no reported issues with different client/server versions
since 1.0. My current logstash setup (which is not production level, simply
a dev logging tool) uses Elasticsearch 1.2.1 with Logstash 1.4.1 via the
non-HTTP interface.
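For what it's worth, the elasticsearch_http output sends documents with the _bulk API, whose request body is newline-delimited JSON. A minimal sketch of that wire format (the index and type names here are illustrative):

```python
import json

# Sketch of a _bulk request body: one action line per document followed by
# the document source, with a required trailing newline (ES 1.x assumed).
def bulk_body(index, doc_type, docs):
    """Build a _bulk request body for a list of source documents."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires the final newline

body = bulk_body(
    "logstash-2014.06.21", "logs",
    [{"message": "service started", "host": "app01"},
     {"message": "service stopped", "host": "app01"}],
)
```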

--
Ivan



(Patrick Proniewski) #18

Brian,

Thank you for the reply, even if it does not answer my question.

By the way, how am I supposed to change a mapping setting? Do I have to push back the entire mapping with one line modified, or can I just push something like:

{
  "logstash": {
    "mappings": {
      "_default_": {
        "_all": {
          "enabled": false
        }
      }
    }
  }
}



(Brian Yoder) #19

Patrick,

Well, I did answer your question, though probably not from the direction you
expected.

When I create and manage specific indices, I lock down Elasticsearch. When
I update the mappings, I understand that ES will not allow the mapping for
an existing field to be modified in an incompatible way. So I only update
to add new fields, and never to change or remove an existing field.

For the time-based indices used by the ELK stack, it makes the most sense to
me to create an on-disk mapping template. So I always disable the _all field
and pre-map a subset of string fields, as shown in my previous post. I do
this because when the next day arrives and logstash causes a new index to be
created, that new index will also get my default mapping from the template.

I don't disable the _all field in an existing index that currently has it
enabled. I don't know whether that would succeed, but I would not expect it
to.

Instead, based on my previous experience with ES, I disable the _all field
and have disabled it from the very first test deployment of the ELK stack
in our group. And then I configured my ES startup script to set message as
the default field for a Lucene query. This was already set up and working
when I let others have access to it for the very first time. So I don't
know the answer to your specific question.

But I do know that a lot of experimentation went into my ELK configurations
before I let anyone else look at them for the first time. So don't be afraid
to change your mappings, leave the old ones behind, and re-add data as
needed to get everything just the way you want it.

Brian



(Patrick Proniewski) #20

Brian,

On 30 June 2014, at 22:59, Brian wrote:

Well, I did answer your question. But probably not from the direction you expected.

Hmm, no, you didn't. My question was: "it looks like I can't retrieve/display [the _all field's] content. Any idea?" and you replied with your logstash template where _all is disabled.
I'm interested in disabling _all, but that was not my question at this point.

Your answer to my second message, below, is informative and interesting but fails to answer my second question too. I simply asked whether I need to feed the complete modified mapping of my template or if I can just push the modified part (ie. the _all:{enabled: false} part).

When I create and manage specific indices, I lock down Elasticsearch. When I update the mappings, I understand that ES will not allow the mapping for an existing field to be modified in an incompatible way. So I only update to add new fields, and never to change or remove an existing field.

For time-based indices as used by the ELK stack, it makes the most sense to me to create an on-disk mapping template. So I always disable the all field and pre-map a subset of string fields as shown in my previous post. I do this because when the next day arrives and logstash causes a new index to be created, that new index will also set my default mapping from the template.

I don't disable the _all field in an existing index that currently has it enabled. I don't know whether that would succeed or fail, but I would not expect it to succeed.

Instead, based on my previous experience with ES, I disable the _all field and have disabled it from the very first test deployment of the ELK stack in our group. And then I configured my ES startup script to set message as the default field for a Lucene query. This was already set up and working when I let others have access to it for the very first time. So I don't know the answer to your specific question.
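For anyone wanting to reproduce that setup: the startup setting Brian appears to be describing is presumably the index-level default query field. Assuming ES 1.x, it would go in elasticsearch.yml along these lines (a sketch, not a quote of his actual configuration):

```yaml
# Make bare Lucene query strings search the "message" field
# instead of the (disabled) _all field.
index.query.default_field: message
```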

But I do know that a lot of experimentation went into my ELK configurations before I let anyone else look at it for the very first time. So don't be afraid to change your mappings and leave the old ones behind, and re-add data as needed to get everything just the way you want it.

Brian

On Monday, June 30, 2014 1:22:34 AM UTC-4, Patrick Proniewski wrote:
Brian,

Thank you for the reply, even if it does not answer my question.

By the way, how am I supposed to change a mapping setting? Do I have to push back the entire mapping with one line modified, or can I just push something like:

{
  "logstash": {
    "mappings": {
      "default": {
        "_all": {
          "enabled": false
        }
      }
    }
  }
}
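To make the two options concrete: if the API were to merge a partial document like the one above into the existing mapping, the effect would be as sketched below. This is plain Python over made-up dicts (deep_merge is a hypothetical helper, not an Elasticsearch API), not a claim about how ES actually handles _all in this case:

```python
# Sketch: what a merge-style mapping update would produce.
# "existing" stands in for what the index already has;
# "partial" is the fragment from the message above.

def deep_merge(base, patch):
    """Recursively merge patch into base, returning a new dict."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

existing = {
    "default": {
        "_all": {"enabled": True},
        "properties": {"message": {"type": "string"}},
    }
}
partial = {"default": {"_all": {"enabled": False}}}

updated = deep_merge(existing, partial)
print(updated["default"]["_all"])         # {'enabled': False}
print(updated["default"]["properties"])   # left untouched by the merge
```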

On 20 June 2014, at 23:04, Brian wrote:

Patrick,

Here's my template, showing where the _all field is disabled. You may wish to add this setting to your own template, and then also add the index setting to ignore malformed data (useful when someone's log entry occasionally slips in "null" or "no-data" instead of the usual numeric value):

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "default" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}
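As a quick sanity check before registering a template like this (in ES 1.x that would be done with a PUT to /_template/automap), the JSON can be parsed and its key settings verified locally. A sketch, using a trimmed copy of the template above:

```python
import json

# Sanity-check a template like the one above before registering it,
# e.g. via: curl -XPUT localhost:9200/_template/automap -d @automap.json
template_json = """
{
  "automap" : {
    "template" : "logstash-*",
    "settings" : { "index.mapping.ignore_malformed" : true },
    "mappings" : {
      "default" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" }
        }
      }
    }
  }
}
"""

template = json.loads(template_json)          # fails loudly on malformed JSON
mapping = template["automap"]["mappings"]["default"]
assert mapping["_all"]["enabled"] is False    # _all really is disabled
assert template["automap"]["template"].startswith("logstash-")
```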

Brian
