Elasticsearch as a primary storage

We have a lot of users (30-200 million). Each user has N (30-100) attributes. An attribute can be of type integer, text, timestamp, or counter. The schema is not defined.

We are looking for a database in which we can index users and retrieve them (GET API) as efficiently as possible (low response time). We also need to search for users, but that is outside the scope of this topic. The data must also be strongly consistent and highly available.

I already know that the GET API provides consistent data, but I'm not sure whether Elasticsearch is the right choice for this case. We tried Cassandra, but read performance was poor (high disk IOPS).
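For reference, the access pattern in question (index a user under its ID, then fetch it back by ID) could be sketched roughly as below. This only builds the HTTP requests and does not send them. The base URL, the `users` index name, and the sample attributes are assumptions for illustration, not details from our setup.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local test node
INDEX = "users"                     # hypothetical index name


def build_index_request(user_id, attributes):
    """Build (but do not send) a request that stores one user under its ID."""
    body = json.dumps(attributes).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{INDEX}/_doc/{user_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_request(user_id):
    """Build (but do not send) a GET-by-ID request for one user."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{INDEX}/_doc/{user_id}",
        method="GET",
    )
```

Against a running cluster, either request could be sent with `urllib.request.urlopen(...)`; timing the GET call under realistic load is the number we care about.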

Please read this

You will get an idea!

Thanks DineshNaik. I have already read this answer. My question is more about performance.

Hi @xtapodi, welcome to the community, and thanks for considering Elasticsearch.

Elasticsearch, when configured correctly (hardware, software, and index strategy), is highly performant. We would need to learn a little more about what you're trying to do.

Even 200 million records with 100 attributes each is still a relatively small data set for Elasticsearch.

I know you said there is no schema, but there will be a schema whether you set it or not. With proper queries, you should be able to query across that entire data set with low latency.
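As a rough illustration of setting the schema yourself, here is a minimal sketch of an index-creation body that maps the four attribute kinds you mentioned. The field names are hypothetical, and mapping a counter as `long` is just one reasonable assumption. With `dynamic` left at `true`, attributes that are not listed would still be mapped automatically when first seen.

```python
import json

# Hypothetical attribute names; only the four attribute kinds come from the question.
USER_MAPPING = {
    "mappings": {
        "dynamic": True,  # unlisted attributes are still mapped on first sight
        "properties": {
            "age": {"type": "integer"},        # integer attribute
            "bio": {"type": "text"},           # text attribute
            "last_login": {"type": "date"},    # timestamp attribute
            "login_count": {"type": "long"},   # counter attribute (assumed long)
        },
    }
}


def create_index_body(mapping=USER_MAPPING):
    """Serialize the mapping as the JSON body of a create-index request."""
    return json.dumps(mapping, indent=2)
```

The resulting JSON could be sent as the body of a create-index request (for example `PUT /users`, with `users` being a placeholder name) before loading any data.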

I would suggest setting up a node or a cluster, or better yet just try Elastic Cloud, and do some testing.

That's nearly 5 years old. I'd encourage you to read up on resilience changes that have been made since then.

Ha! Right. I wrote this answer a looong time ago, and all the great work on sequence numbers and resilience has been done since then.

IMO the source of truth (which is always up to date) is Elasticsearch Resiliency Status | Elastic
