I want to migrate the data from SQL to Elasticsearch, but looking for some help.
I am having 4 tables for each client and I am having 500+ such clients. How Can I achieve such design in Elasticsearch?
I want to migrate the data from SQL to Elasticsearch, but looking for some help.
I am having 4 tables for each client and I am having 500+ such clients. How Can I achieve such design in Elasticsearch?
Short answer Yes.
You'll need logstash. Look up JDBC SQL.
It's not a light read and will require trial and error. But what your after can be done.
Now when you mean clients is it just devices/users or separate customers? Makes a world of difference on the answer.
I think this link show the Elasticsearch SQL usage. But Actually What I am looking is architecture design of Elasticsearch. Let me re-brief my question. I want to design the structure in SQL, I would just create the database for each customer and that database will have tables. But In Elasticsearch, concept like table is deprecated, so now If I want to achieve such design, how would I?
Index = Table essentially. Someone from Elastic will chime in that scale is well beyond what I've dealt with.
But my thinking would be something like:
Index per customer separated in kibana by spaces comes to mind. Not sure if 500+ would scale well would need someone else to chime in that has a use case like that. Either way to keep each customer data separate you will want an index per which is essentially a table per. You would need to factor in data retention time + growth.
You could potentially do it by the spaces based on geo location "West/East". 500+ sounds like your in a management company. If that's the case you could do the spaces broken out into those parent levels then use access control to a single index for the users. Just spit balling in that case.
It would really come down to the basic who, what, when, where which would be the same question as you would setting up a shared SQL cluster for a large company.
Who: Is it a group at each location that needs it? Will there be cross site access?
What: What type of data are you storing and for what length of time? This is the critical one to know.
When: Will you have data feeding 24x7 or will it be task/sp jobs? Its it manually entry API isn't friendly for front line users.
Where: Not really relevant but on-prim vs hosted.
Can elastic do it. Yes. it's going to access control mostly. Knowing the data will be key to say yes/no.
Welcome to the community.
There are a lot of design considerations for multi-tenancy
Here is a recent related discussion that might help.
Are the tables of different customers similar in structure or completely different?
As Elasticsearch is not a relational data store and instead document-based, you generally need to restructure your data when moving from a relational database. One way is to completely or at least partially denormalize into documents. Here are a related thread which may provide some insight: Convert relational schema to elasticsearch mapping
Yes, Christian, Tables of different customers are having the same structure. As well as data in those tables are denormalized too.
Thanks, Stephen for the warm welcome.
@stephenb and @PublicName,
Yes, I was considering having an index per customer and merging both tables data in that index and use some tag field to identify the respective table but the problem with that is, For one table I am having around 30+ fields/attributes while for other only 10. So there would be lots of null value when I have details only for another table.
Hope I am clear enough, otherwise, let me know.
Curious about how many denormalized rows / customer documents do you estimate.
Unless it is an extreme.number the fact that some customers have 30 fields and others have 10 should not really be an issue as Elasticsearch handles sparse data well on the storage side.
On the query side Elasticsearch just ignores the documents with the missing fields when you are doing a query.. those documents would not be considered and not returned, the query would not error out.
To be clear if you query on common fields that exist in both types of customers all the documents that match with both 30 and 10 fields will be returned
Thank you @stephenb. I am working on the design based on your comments. Will update you on the conclusion.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
© 2020. All Rights Reserved - Elasticsearch
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant logo are trademarks of the Apache Software Foundation in the United States and/or other countries.