Thomas,
Thanks for your insights and experiences. As I am someone who has
explored and used ES for over a year but is relatively new to the ELK
stack, your data points are extremely valuable. Let me offer some of my own
views.
Re: double the storage. I strongly recommend that ELK users disable the
_all field. The entire text of each log event generated by logstash ends up
in the message field (and not in @message, as many people incorrectly
post), so the _all field is just redundant overhead that adds no value.
Disabling it results in a dramatic drop in database file sizes and a
dramatic increase in load performance. Of course, you then need to configure
ES to use the message field as the default field for Lucene queries entered
in Kibana.
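For concreteness, here is a minimal sketch of both changes expressed as an ES 1.x index template, generated with Python so the body can be checked before it is PUT to the _template endpoint. The logstash-* pattern and the order value are assumptions to adapt to your own setup. A template only affects indices created after it is installed, and if Logstash is also managing its own template, ES merges the two by order, so it's worth confirming which settings actually win.

```python
import json

# Hypothetical values; adjust to your own index naming.
INDEX_PATTERN = "logstash-*"
TEMPLATE_ORDER = 1  # assumed to be higher than any default Logstash template


def build_template(index_pattern=INDEX_PATTERN, default_field="message"):
    """Return an ES 1.x index template body that disables _all and makes
    `default_field` the field searched by unqualified Lucene queries."""
    return {
        "template": index_pattern,  # ES 1.x key for the index-name pattern
        "order": TEMPLATE_ORDER,    # higher order is applied later when templates merge
        "settings": {
            # Used by query_string queries when no field is named in the query.
            "index.query.default_field": default_field,
        },
        "mappings": {
            # _default_ applies to every document type in matching indices.
            "_default_": {
                "_all": {"enabled": False},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Saved to a file, the output could be installed with something like curl -XPUT 'localhost:9200/_template/logstash_no_all' -d @template.json (the template name there is also just an example).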
During the year that I've used ES and watched this group, I have been on
the front line of a brand new product with a smart and dedicated
development team working steadily to improve the product. Six months ago,
the ELK stack eluded me and reports weren't encouraging (with the sole
exception of the Kibana web site's marketing pitch). But ES has come a long
way since six months ago, and the ELK stack is much more closely integrated.
The Splunk UI is carefully crafted to isolate users from each other and to
prevent users who are external to the Splunk database itself (not external
to our company) from causing harm to the data. But Kibana seems to be meant
for a small cadre of trusted users. What if I write a dashboard with the
same name as someone else's? Kibana doesn't even begin to address user
isolation. But I am confident that it will.
How can I tell Kibana to set the default Lucene query operator to AND
instead of OR? Google is not my friend here: I keep getting references to
the Ruby versions of Kibana, which are ancient history by now. Kibana is
cool and promising, but it has a long way to go before it can be deployed
to all of the folks in our company who currently have access to Splunk.
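For reference, at the ES level the relevant knob is the default_operator parameter of the query_string query, which defaults to OR. Below is a minimal sketch of such a request body, assuming that the text typed into Kibana's search box ends up in a query_string query. I haven't confirmed which Kibana setting, if any, exposes this parameter. In the meantime, typing AND explicitly between terms works in Lucene syntax regardless. The helper name and_query is hypothetical.

```python
import json


def and_query(user_query, field="message"):
    """Build an ES query_string request body whose bare terms are ANDed."""
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_field": field,
                # Without this, query_string combines bare terms with OR.
                "default_operator": "AND",
            }
        }
    }


print(json.dumps(and_query("error timeout"), indent=2))
```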
Logstash has a nice book that's been very helpful, and logstash itself
has been an excellent tool for prototyping. The book has been invaluable in
helping me extract dates from log events and handle all of our different
multiline events. But it still doesn't explain why the date filter needs a
different array of matching strings to get the date that the grok filter
has already matched and isolated. And the recommendations to avoid the
elasticsearch_http output and use the elasticsearch output (via the Node
client) instead are hard to square with the fact that the 1.1.1 version of
the ES client library bundled with logstash is not compatible with the most
recent ES release, 1.2.1.
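One way to see why two patterns are needed is a Python analogy rather than Logstash config. The regex (grok's role) only locates the timestamp text, and what it yields is still a string. Turning that string into a point in time needs a separate description of field order, month names, and the zone offset (the date filter's role, e.g. match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"] for an Apache-style %{HTTPDATE:timestamp} capture). The sample line and the names below are made up for illustration.

```python
import re
from datetime import datetime

# A made-up Apache-style access log line.
LINE = '127.0.0.1 - - [19/Jun/2014:08:48:34 -0400] "GET / HTTP/1.1" 200 512'

# Step 1 (analogous to grok): a pattern that *locates* the timestamp text.
TIMESTAMP_RE = re.compile(r"\[(?P<timestamp>[^\]]+)\]")

# Step 2 (analogous to the date filter): a format that *interprets* that text.
TIMESTAMP_FMT = "%d/%b/%Y:%H:%M:%S %z"


def extract_timestamp(line):
    m = TIMESTAMP_RE.search(line)
    if m is None:
        return None
    raw = m.group("timestamp")                    # still just a string, like a grok capture
    return datetime.strptime(raw, TIMESTAMP_FMT)  # now a timezone-aware datetime


print(extract_timestamp(LINE).isoformat())  # 2014-06-19T08:48:34-04:00
```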
And logstash is also a resource hog, so we eventually plan to replace it
with Perl and Apache Flume (already in use) and pipe the data into my Java
bulk load tool (which is always kept up-to-date with the versions of ES we
deploy!!). Because we send the data via Flume to our data warehouse, any
losses in ES will be annoying but won't be catastrophic. And the front-end
following of rotated log files will be done using the GNU tail command with
its uppercase -F option, which follows rotated log files perfectly. I doubt
that logstash can do the same, and we currently see that Splunk can't
either (so we sporadically lose log events in Splunk too). So GNU tail -F
piped into logstash's stdin input works perfectly in my evaluation setup
and will likely form the first stage of any log forwarder we end up
deploying.
Brian
On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:
We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12
indexers and 2 search heads. Each indexer had 1,000 IOPS guaranteed.
The system is slow but OK to use.
We tried Elasticsearch and were able to get the same performance
with the same number of machines. Unfortunately, with Elasticsearch you
need almost double the amount of storage, plus a LOT of patience to make
it run. It took us six months to set it up properly, and even now the
system is quite buggy and unstable, and from time to time we lose data
with Elasticsearch.
I don't recommend ELK for a critical production system. For dev work
only, it is OK, if you don't mind the hassle of setting it up and
operating it. The money you save by not buying a Splunk license you have
to invest in consultants to get it up and running. Our dev teams hate
Elasticsearch and prefer Splunk.
On Saturday, April 19, 2014 00:07:44 UTC+2, Mark Walkom wrote:
That's a lot of data! I don't know of any installations that big but
someone else might.
What sort of infrastructure are you running Splunk on now, and what's your
current and expected retention?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 19 April 2014 07:33, Frank Flynn faultle...@gmail.com wrote:
We have a large Splunk instance. We load about 1.25 TB of logs a
day. We have about 1,300 loaders (servers that collect and load logs;
they may do other things too).
As I look at Elasticsearch / Logstash / Kibana, does anyone know of a
performance comparison guide? Should I expect to run on very similar
hardware? More? Less?
Sure, it depends on exactly what we're doing, the exact queries, and
the frequency we'd run them, but I'm trying to get any kind of idea before
we start.
Are there any white papers or other documents about switching? It
seems an obvious choice, but I can find very few performance
comparisons (I did see that Elasticsearch just hired "the former VP of
Products at Splunk, Gaurav Gupta", but there weren't many numbers in that
article either).
Thanks,
Frank
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ea1a338b-5b44-485d-84b2-3558a812e8a0%40googlegroups.com
For more options, visit https://groups.google.com/d/optout.