Constantly updating documents - a bad idea?


(Péter Láng) #1

Hello,

I'm not very familiar with how Elasticsearch and Lucene work, but I
know that there isn't really such a thing as updating a document: "update"
requests are carried out as a delete followed by an insert. So I assume it is
"not as efficient" as, say, an SQL update. I've been wondering whether it's a
bad idea to update every single document that gets indexed, exactly once. Now,
how did I even come up with that idea, and why would I do that?

Well, we use the ELK stack for processing, storing and presenting logs,
nothing special here, the most typical use-case. But now, we'd like to do
some post-processing/analyzing on the logs. We'd like to "highlight" the
logs that are actually important. That means: fetch not-yet-analyzed
documents, do a lot of regexp matching, add an "alert" tag if
necessary, and then push them back to ES. I think I can get Logstash to
do this (with some modifications). But I don't know how hard it would
be on the "cluster" (only one beefy node so far, but a lot of logs), how
it would affect performance, whether it is error-prone, and how it would scale.
Let's assume the use of the most efficient methods (scan+scroll API and
bulk API for inserts [or are there better APIs for this?]).
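To make the fetch/tag/write-back loop more concrete, here is a minimal Python sketch of the tagging step only. It assumes the documents arrive as scroll hits (dicts with `_index`, `_id` and `_source`), that the log text lives in a `message` field, and that a hypothetical boolean `analyzed` field marks documents that have already been processed. The patterns and the `analyzed` field are placeholders for illustration, not part of our actual setup.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Hypothetical alert patterns; in practice these would be loaded from
# wherever the frequently changing analyzer rules are kept.
ALERT_PATTERNS: List[re.Pattern] = [
    re.compile(r"\bOutOfMemoryError\b"),
    re.compile(r"\bconnection refused\b", re.IGNORECASE),
]


def is_alert(message: str) -> bool:
    """Return True if any alert pattern matches the log message."""
    return any(p.search(message) for p in ALERT_PATTERNS)


def build_update_actions(hits: Iterable[Dict]) -> Iterator[Dict]:
    """Turn scrolled hits into partial-update actions for a bulk request.

    Every hit is marked with the hypothetical "analyzed" flag so it is not
    fetched again; matching hits also get an "alert" tag appended.
    """
    for hit in hits:
        source = hit["_source"]
        doc = {"analyzed": True}
        if is_alert(source.get("message", "")):
            # Copy existing tags so the original hit is not mutated.
            tags = list(source.get("tags", []))
            if "alert" not in tags:
                tags.append("alert")
            doc["tags"] = tags
        yield {
            "_op_type": "update",
            "_index": hit["_index"],
            "_id": hit["_id"],
            "doc": doc,  # partial document merged into the stored one
        }
```

With the official elasticsearch-py client, this generator could be passed to `elasticsearch.helpers.bulk`, with the hits supplied by `elasticsearch.helpers.scan` and a query that skips documents that already have `analyzed` set (the exact query syntax depends on the Elasticsearch version). Sending partial `doc` updates keeps the bulk requests small, but Elasticsearch still reindexes the whole document internally for each one, which is exactly the cost I'm asking about.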

I'm well aware that technically I could do this before the first indexing
of the documents, but in our case I think that's a worse architectural
design: mixing log processing (~splitting) and analyzing is not a great
practice. The current processing mechanism/logic is rarely modified; it
works, it's solid. The analyzer patterns, on the other hand, would be changed
and updated a lot, meaning a lot of Logstash restarts. If something gets
messed up, logs may not get processed properly, or may get lost, etc. Some of
these issues could be eliminated by chaining two Logstash instances in front
of ES, but not all.

So, long story short, is it a bad idea to update every document once and
should I stick with pre-processing or is it feasible?

I'm also open to completely different approaches.

Thanks for your input in advance!

P

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4172b00d-ea52-4fee-b9c2-6d1ae7af645f%40googlegroups.com.
