Building an ERP with Elasticsearch. Am I crazy?

Hi,

First, I would like to thank all of you for Elastic. I am thinking of using
it in an ERP that I am building. What do you think about this? Am I crazy?

Has anyone faced this? I really don't think I am comfortable enough to do
this: trading the problems that I already know for new problems that I
really don't know how to deal with.

I believe that NoSQL will prevail over traditional SQL, but I don't know if
I am ready for this task.

So how do you think I should integrate (or not) PostgreSQL with
Elasticsearch?

Thanks again,

rsw1981

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/dfe531a6-675c-4fd6-b6c7-881ff6c00a97%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

On Tuesday, August 26, 2014 6:46:12 AM UTC+8, Raphael Waldmann wrote:

So how do you think I should integrate (or not) PostgreSQL with
Elasticsearch?

Do you plan to use ES to index the data in PostgreSQL?

I have a similar idea: I want to use ES instead of a data warehouse.

Some problems I can see:

  1. Data in an RDBMS is stored in tables connected by relationships. You
    can use very complex SQL to query a complex result; how do you do that
    in ES?
  2. If you want to run some analysis algorithms on existing data, how do
    you run them in ES?
  3. If your data is big enough, will searching for one keyword in the
    '_all' field be slow in ES?

Thanks.
-Terrs

In general, use Elasticsearch only as a secondary index. Keep a copy of the
data somewhere else that is more reliable. Elasticsearch often runs into
index corruption issues that are hard to resolve.

On Tuesday, August 26, 2014 12:55:10 PM UTC+8, Mo wrote:

In general use elasticsearch only as a secondary index. Have a copy of
data somewhere else which is more reliable. Elasticsearch often runs into
index corruption issues which are hard to resolve.

Our client has had enough of customized UIs for displaying data in the DWH.
They want a laconic, unified UI like Google that can search anything in the
database: not only strings but also numbers and some analysis results.

In a word, they don't want to invest in SQL and the related UI.

So I want to find a new framework that throws away SQL or SQL-like query
languages, and a new way to describe table relationships in ES. Is this a
dead end?

This is the generally accepted dogma and it has some merit. However, having
two storage systems is more than a bit annoying. If you are aware of the
limitations and caveats, Elasticsearch is actually a perfectly good
document store that happens to have a deeply integrated querying engine.
That is useful, since most setups built around a separate primary store end
up with a much less capable querying engine there, plus additional latency
and architectural complexity from pumping data around to Elasticsearch.

Elasticsearch CRUD operations are atomic per document, and a get by ID is
real-time, so you can read your own writes across the cluster (a search
only sees a write after the next refresh). If you use the version attribute
during updates, you can detect version conflicts and prevent overwriting
updates with stale data as well. This is similar to the model you would
find in e.g. CouchDB and similar document stores. There are not that many
sharded and replicated, horizontally scalable document stores out there,
and even fewer with decent querying ability.
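
A minimal sketch of that versioning model, written as a toy in-memory store
rather than against the Elasticsearch client itself. `ToyDocStore`,
`VersionConflict` and `expected_version` are hypothetical names used only
for illustration; the point is that a write carrying a stale version gets
rejected instead of silently overwriting newer data.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is no longer current."""


class ToyDocStore:
    """In-memory stand-in for a versioned document store (not a real ES client)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        """Return (version, copy of source) for a stored document."""
        version, source = self._docs[doc_id]
        return version, dict(source)

    def index(self, doc_id, source, expected_version=None):
        """Store a document; reject the write if expected_version is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = ToyDocStore()
store.index("invoice-1", {"total": 100})  # creates version 1

# Two writers read the same version before either of them updates.
version_a, doc_a = store.get("invoice-1")
version_b, doc_b = store.get("invoice-1")

doc_a["total"] = 120
store.index("invoice-1", doc_a, expected_version=version_a)  # now version 2

doc_b["total"] = 90
try:
    store.index("invoice-1", doc_b, expected_version=version_b)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True  # writer B has to re-read and retry
```

Against a real cluster the same check is done by passing the version you
read along with the update and handling the conflict response.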

The caveat is that Elasticsearch is not as battle-tested as other solutions
in this space, and various people have shown that there are ways to make an
Elasticsearch cluster lose or corrupt data. So you need to be prepared to
recover from such situations. That means you need backups (e.g. use the
snapshots feature) and a plan for when things go bad.
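
As a rough sketch of what using the snapshots feature involves: register a
repository once, then take snapshots into it. The helper below only builds
the two REST requests and sends nothing. The repository name, the path and
the snapshot name are made-up values, and the path has to be a shared
filesystem location that every node can reach.

```python
import json


def snapshot_requests(repo, location, snapshot):
    """Build (method, path, body) tuples for a filesystem snapshot; nothing is sent."""
    register_repo = (
        "PUT",
        f"/_snapshot/{repo}",
        json.dumps({"type": "fs", "settings": {"location": location}}),
    )
    take_snapshot = (
        "PUT",
        f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        None,  # no body needed: snapshots all indices by default
    )
    return [register_repo, take_snapshot]


# Hypothetical repository name, path and snapshot name.
for method, path, body in snapshot_requests(
    "erp_backup", "/mnt/es_backups", "snapshot_1"
):
    print(method, path, body or "")
```

Running this on a schedule only covers half of the plan; restoring from a
snapshot should be rehearsed as well.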

The flip side is that other solutions have issues as well. PostgreSQL
clustering is brand new and probably has issues, and if you use it in
non-clustered mode, the failure scenarios get even more interesting. I use
MariaDB Galera Cluster and it sucks big time; it needs a lot of
handholding during upgrades. CouchDB doesn't shard and shares server
failure scenarios with Elasticsearch. MongoDB and Cassandra have each had
their share of data corruption and data loss issues in the recent past,
and both have recently fixed major issues related to that. So there are
lots of solutions out there and none of them is perfect.

Elasticsearch has several major areas where it needs improvement (and which
are indeed being worked on in recent versions):

  1. It has many ways to run out of memory. If you skim through the
    release notes of recent versions, you'll see a lot of fixes related to
    that, including the use of e.g. circuit breakers. The problem with OOMs
    is that they can cause a cascading cluster failure where one node
    becomes slow, eventually drops out of the cluster, and then other nodes
    start having the same issues. I've personally seen Kibana kill our
    cluster on two occasions. In both cases the logs of all nodes were full
    of OOMs and the cluster died while I was simply clicking through
    different dashboards in Kibana. This has not happened with the current
    1.3.x version (yet), but that doesn't mean it is impossible.
  2. Split-brain situations, where a quorum is lost but the loss is not
    detected, are fairly easy to trigger. Every time I do a rolling update,
    the cluster takes several seconds to catch up with the fact that I'm
    shutting down nodes. I have a three-node cluster. One node going down
    means my cluster should be yellow. Two nodes down means red, and it
    should no longer accept writes. The problem is that during those few
    seconds, the cluster status may not reflect reality and nodes may in
    fact be accepting writes when they shouldn't.
  3. A full cluster restart needs a lot of handholding. The reason for this
    is that most of the failure scenarios relate to there not being a quorum
    and detecting that. For example, if you simply restart the nodes one by
    one quickly, you will easily get your cluster into a red state where it
    should no longer be accepting writes. The problem, as described above,
    is that detecting this relies on timeouts, and some nodes may continue
    to write for a few seconds after they should have stopped. By the time
    your cluster goes red, it's too late and you are going to have to
    manually decide which shards you want to lose. That's why you need to
    keep an eye on cluster status during rolling updates. Imagine somebody
    power cycling your Elasticsearch cluster or, worse, rebooting the switch
    that connects your nodes.
  4. Elasticsearch under load may throw 503s occasionally. I've seen this
    happen on our test infrastructure a couple of times and it worries me. This
    is not something you want to see when you are writing customer data.
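
Keeping an eye on cluster status between node restarts (point 3) can be
scripted. This is a minimal sketch, assuming a caller-supplied
`fetch_health` callable that returns something shaped like the parsed JSON
of `GET /_cluster/health`, i.e. a dict with a "status" of red, yellow or
green. The function name and its parameters are hypothetical, and the demo
uses canned responses instead of a live cluster.

```python
import time

STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def wait_for_status(fetch_health, wanted="green", timeout_s=300.0, poll_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_health() until the cluster reports at least `wanted`.

    Returns True once the status is reached, False if the timeout expires
    first. `sleep` and `clock` are injectable so the loop can be tested.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_health()["status"]
        if STATUS_RANK[status] >= STATUS_RANK[wanted]:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Demo with canned responses: red, then yellow, then green.
responses = iter([{"status": "red"}, {"status": "yellow"}, {"status": "green"}])
recovered = wait_for_status(lambda: next(responses), sleep=lambda seconds: None)
print(recovered)
```

In a rolling restart you would call this after each node comes back and
only move on to the next node once it returns True. The health API also
accepts a `wait_for_status` parameter that blocks on the server side.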

Mitigation for these issues typically involves using specialized nodes for
read and write traffic and cluster management. Additionally, you need to
heavily tweak things to make certain failure scenarios less likely. Out of
the box, there is a lot of stuff that can go wrong.
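
One of the tweaks usually applied against the split-brain scenario in
point 2 is requiring a majority of master-eligible nodes before a master
can be elected. In the 1.x line that is the
`discovery.zen.minimum_master_nodes` setting. The helper below is just the
arithmetic for picking its value; the function name is made up for this
sketch.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "master-eligible ->", minimum_master_nodes(nodes))
```

For the three-node cluster described above this gives 2, so a single
isolated node cannot elect itself master.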

We're actually deprecating our MariaDB architecture and switching to an
Elasticsearch-only architecture. I'm well aware that I'm taking a risk here
and I have a backup plan for most of those risks. This includes changing
plans and switching to CouchDB or a similar document store if Elasticsearch
proves not to be up to the task. However, so far so good.

Mohit Anchlia,

How do you sync ES with your main DB?

That's what I'm thinking of doing for my project, because I don't have much
experience with ES.

Thanks

I am reading a lot and studying what the best approach for this is.

My main question can be summed up in two points:

If I choose ES to index my PostgreSQL data, what's the best way to do that?

Do I need a cluster? Most of the problems I read about were related to
that. If this is true and I can run on one node, should I do that?

Thanks for sharing your experience.

Have a nice day

Raphael Waldmann

Jilles,

How goes the migration to ES? We are actually running into a situation
where the circuit breakers are not stopping our OOM problem, and I was
wondering if you had experimented with them enough to help us investigate.
Can you help? The little background I can give you without posting the log
file is that it seems like a large query comes in and one node gets an OOM
while the other nodes trigger the circuit breakers. It would be great if
the OOM node came back up without bringing down our cluster, but it does
bring it down. I will post the log file if you need it. These seem to be
per-query (not per-API-request) circuit breakers, which doesn't help much.
Is there anything I can do on each node to ensure that we avoid OOMs?

We have 3 masters and 18 data nodes.

Thanks,
Will

On Tuesday, August 26, 2014 at 4:15:20 AM UTC-6, Jilles van Gurp wrote:

Elasticsearch has several major areas where it needs improvement (and
which are indeed being worked on in recent versions):

  1. It has many ways to run out of memory. If you skim through the
    release notes of recent versions, you'll see a lot of fixes related to
    that, including the use of e.g. circuit breakers. The problem with OOMs
    is that they can cause a cascading cluster failure where one node
    becomes slow, eventually drops out of the cluster, and then other nodes
    start having the same issues. I've personally seen Kibana kill our
    cluster on two occasions. In both cases the logs of all nodes were full
    of OOMs and the cluster died while I was simply clicking through
    different dashboards in Kibana. This has not happened with the current
    1.3.x version (yet), but that doesn't mean it is impossible.