Is Elasticsearch as a primary database better than Oracle?

Hi,
I am a student and my project is about Elastic Stack.
the company I work for is currently using oracle for managing its data.
they want assess Elasticsearch as a replacement.
according to my project success they will consider to transform from Oracle to Elasticsearch.

  1. Does Elasticsearch can replace oracle?
  2. Elasticsearch search performance is faster?
  3. Is it reliable as a primary database?
  4. What are the main advantages of Elasticsearch over oracle? how can I demonstrate them in my code?

please I would love to know the answers

You need to read this page: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

Have also a look at this "live coding" recording.

1 Like

thanks for answering:)

So in that point, I realize that Oracle can't be replaced by Elasticsearch as a primary database. It is not reliable enough as Oracle.
and the company I work for can use Elasticsearch in addition to Oracle db but not instead of it.
am I right?

I couldn't understand from what I read if Elastic is better than Oracle in search performance.
In Oracle I can define field as an Index to maximize search capabilities and in addition, Oracle makes parallel search on multiple servers as Elastic does. my data is numeric and Elastic is a textfull search engine. can Elasticsearch be better in search performance than Oracle? and why?

please..

What is your definition of better search performance? What is the use case?

I would expect the answer to depend a lot on the type of data and searches you have as well as the total data volume you need to search. The number of expected concurrent searches together with latency requirements will also be important factors to consider.

In a real world use case with large data volumes and high load, cost will probably also be a major factor to consider as Oracle and Elasticsearch scale very differently.

better search performance means faster search.

I work for a company and they have all the resources that I need.
my source for data is an Oracle database.
volume of data is about billions records.

my use case:

lets say I have father with thousands of children.
In Oracle I have one table for all the fathers and another table for all the children.
the keys let me relate the father with its children.
with five keys in the father record I can find all father's children in the table.
In Oracle each key is defined as index.
any query is about one father's children each time. it is a very important detail.
I have just two numeric fields. the rest of data's type is numeric.

In Elasticsearch I decided that there will be one index for each father's children.
and because of the drawbacks of join I decided that it's better for me that there will be only one type of record(document) that includes one child's data and its father's data.
so I have duplicate data of the father, each father's data shown with a multiplier of the number of its children but I don't need to use join. I prefer to add servers but waiting more for getting the search results.
so each father's children will be in its own index and each children will be routed to its shard according to its teacher id routing value.
so I have 40 fields in each document.
there is no inner objects.

I understood that Elasticsearch make best performance with flat data structure

In kibana I visualize all indices together.

I need to do benchmarking and compare between Oracle and Elastic search capabilities
my desire is that Elasticsearch search results will be much better then Oracle's
but because I am a student, I have no time to try all the scenarios.

The max number of expected concurrent searches together is about 20
latency just need to be much smaller than the latency of Oracle

all my queries are simple filter queries, I don't need to score the results.
so a query in Oracle\Elastic can be for example:
give me all the children of father x that their Hemoglobin level is between y and z.
give me all the children of father x that their Hemoglobin level is y.
volume of data I need to search is about dozens of thousands records each time.

please help me to understand if I organized my data correctly so it can win Oracle in the benchmarking
I have to know if I need to change something to make Elastic do the best before I present my project to my directors.

It sounds like you have a reasonably small data set that easily fits on a single node and the query concurrency is also not very high. It sounds like you have done the right thing by flattening and storing the parents data with each child, but I do not understand why you need more than a single index. The ideal number of shards will depend on your hardware as well as the size of the documents.

As you are not using any of the more advanced search features that sets Elasticsearch apart from Oracle I would expect Oracle to be able to handle this type of load quite well.

because there will be billions of records in my db
when I search for one father's children data I don't want Elastic will waste its time when searching among all the data that most of it is clearly not relevant.

generally, each query interested only in part of father's children's data or all of it.
if so, I can lead Elastic to search only in one index that has the relevant data (dozen of thousands of record instead of billions of records).
after what I said above, do u think I need one index?

for now, with one father in my elastic data, on one server I can see that I get elastic search results (520000 records when each record have 40 fields) in one minute or more.
it is not fast enough:(

so you think that in my case Oracle will be better when will do benchmarking?

A single index can have more than one primary shard, and the ideal number will depend on your data. If you are always searching related to a single parent you can also benefit from using routing.

I need to understand why creating 1 index for whole my data will be better than make the partition for multiple indices.

please look at that:


It was taken from the next link:


(part 1: https://www.elastic.co/blog/how-kenna-security-speeds-up-elasticsearch-indexing-at-scale-part-1)

I read about it in that blog in part 2
why is this ok for them and not for me
if I create one index and use routing then each father's children will be in the same shard and because of that in the same node, there will be no parallel search and I will get the same search results of 1 minute or worse..

please tell me why I'm wrong

if I have multiple indices, each index for one family, I could search on one index but in less amount of data and in parallel because Index will have more than one shard..

In Elasticsearch querying one index with N shards each is equivalent to querying N indices each with one shard. In order to search in parallel it is useful to have multiple shards. In your case you stated you expected to have 20 concurrent queries that will also run in parallel. In order to make your querying as efficient as possible you want each query to target as as little data as possible.

If you only had a single concurrent query, having same number of shards as you have cores would be a good starting point unless this leads to too big shards. If your 20 concurrent queries is more than the number of cores you have, you want each query to be as efficient as possible and this is where routing comes in (assuming that you often or always filter on some fixed criteria that allows you to rout data based on this). If you e.g. have 10 primary shards, each query only need one shard with 10% of the data. The size of each shard will determine the latency, but you will be able to efficiently handle concurrent queries.

Thanks a lot😊

Next year I think that there will be a lot more users (2000)and the number of concurrent queries will grow too. the records in the database is really important for the company. What will happen then? I don’t think that one index can handle so much queries and I want to avoid of reindexing my data

In that case creating index for each father’s children is better idea than create one index for all my data?
I only ask if the number of concurrent queries is the main reason of your recommendation to create one index for all the data

And if I think about the future and choose to create multiple indices for it do you think that it can affect badly on the search speed ?
how much shards do you think that I need for each one of the indices (Amount: dozen of thousands records)

I’m sorry I have a lots of questions
Please , this is very important for me :pray:t2:

With your approach, how many indices would you be creating? How many indices will each search query need to access? Have you determined how much space your data will take up on disk? Are your searches always going to target documents related to one specific parent?

Hi ,

Sorry for the delay
I’m currently during my university test period

My system will require the following requirements:

  1. The data we need to store in the system will be in magnitude of dozens of Terra Bytes
  2. There will be no need for updating data at all
  3. The number of fathers we need to store will be in magnitude of dozens of thousand and so the indices
  4. User can ask for multiple father’s children
    So each search query should access to each index separately and to all the indices together if he wants to
    So my searches are not going to target documents for one specific father all the time
    Just part of them
  5. There will be at most 30 users that make some searches at the same time

Given this data , what do u think about my use case?
That all the data need to be together in one index?

Thanks a lot :pray:t2::slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.