Hi all, I need help with my first project. Silly newbie questions, but I feel stuck right now.
First, let me say that I've been programming for more than 30 years, and in particular building websites with PHP since the late '90s, so I hope you'll allow me to ask this question.
Second, let me say that I have fully read this article https://www.elastic.co/blog/index-vs-type and while I understand it, I don't have the fluency to decide what to do.
I already know this is a recurring question. I know many newbies ask about this. Even the article mentions it. So let me present it in another way: I'm not asking about "how to do it well" but about "how can I deal with my fear".
So here I go:
I'm building a new company, a travel agency. I will have many different NoSQL documents. Some of them will be quotations for customers. Some will be the customers themselves. Some will be the actual reservations. Some will be related to the domain events (for example a domain-event pool) and their counterpart, the application events (the HTTP requests that originated those domain events).
I have not used Elasticsearch before.
While I feel I have just collections of documents, I feel that some of them are like "related" to the others.
For example: domain events (like "a user paid X amount for this reservation") and application events (like "user X clicked that button") are all events, after all.
I don't know if I should have something like
or I should have them in different indexes, like
but then I don't know what type should go in the ??? space. Any dummy word?
Or for example... what about quotations and reservations? They all share that they "are for a specific customer"... say "customer documents"...
Should it be
or should they be indexes by themselves?
I'm a bit confused about whether a "business concept" maps to an index or to a type within an index.
I'm even unsure whether I should have my whole company indexed in a single index and then have
all in the same "big container".
I also wonder if I can "nest" types and I wonder if it makes any sense... like having namespaces...
The first being the quotations I receive from my providers, the second being the quotations I make for my customers.
I ALREADY know this doesn't have an easy answer... According to the article I mentioned, it probably all "depends"... so I want to frame the question more toward calming my fear than toward getting a good generic solution:
a) We are a startup and RIGHT NOW we are NOT going to have thousands of documents per second, but I currently have thousands of visits per day (not per hour, not per minute, not per second),
b) We are already operating and selling, so I'm in production now,
c) I am currently using MySQL to store the JSON documents in a TEXT field. I'm unable to analyze the data yet, but I'm collecting all of it now,
d) A MySQL dump shows me that the size of "dumped plain text data" I'm managing at this moment is about 0.5 GB of new data every month (so still manageable),
e) The search on domain documents (payments, trips, quotations, and so on) covers under a thousand documents, so at this point in time efficiency is not an issue,
f) The search on application documents (for example website visits, sources of traffic, etc.) covers under a million documents, and the most I want to know today is "where is my traffic coming from", so the most I do is search for "all documents containing a certain tracking token (cookie or so)".
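To make point f) concrete, here is a minimal sketch of that kind of lookup, written in Python over JSON strings as they might come out of the TEXT column. The sample rows and the field names `tracking_token` and `referrer` are made up for illustration; the real documents will look different.

```python
import json
from collections import Counter

# Hypothetical rows, as they might be read from the MySQL TEXT column.
rows = [
    '{"kind": "visit", "tracking_token": "abc123", "referrer": "google.com"}',
    '{"kind": "visit", "tracking_token": "zzz999", "referrer": "twitter.com"}',
    '{"kind": "click", "tracking_token": "abc123", "referrer": null}',
]


def documents_with_token(raw_rows, token, field="tracking_token"):
    """Decode each JSON string and keep the documents whose field equals token."""
    matches = []
    for raw in raw_rows:
        doc = json.loads(raw)
        if doc.get(field) == token:
            matches.append(doc)
    return matches


found = documents_with_token(rows, "abc123")

# "Where is my traffic coming from": count the non-empty referrers.
sources = Counter(doc["referrer"] for doc in found if doc.get("referrer"))

print(len(found), dict(sources))
```

This is a full scan of every row, which is tolerable at under a million documents but is exactly the kind of query I would like a search engine to answer instead.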
Given all this, my question:
DOES IT MATTER if I make a "mistake" in deciding what is an index and what is a type? Given the low amount of data, will I be able to re-write everything if needed, by dumping the badly structured Elasticsearch DB and writing a well-structured one, just placing the documents in the proper places?
Or, if I make an error in this kind of decision, will it be rather difficult to reorganize it all at a later point in time?
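To show what I mean by "re-writing", here is a rough sketch in Python of moving dumped documents from one layout to another. It only builds the newline-delimited body that the `_bulk` API expects (an action line followed by a source line for each document); it does not talk to a cluster. The index names, the type names, the `NEW_INDEX_FOR_TYPE` mapping and the placeholder type `doc` are all assumptions for illustration, and the `_type` key only applies to Elasticsearch versions that still have types.

```python
import json

# Hypothetical dump of a "badly structured" layout: one big index, one type per concept.
old_docs = [
    {"_index": "company", "_type": "quotation", "_id": "1",
     "_source": {"customer": "c1", "total": 1200}},
    {"_index": "company", "_type": "domain_event", "_id": "2",
     "_source": {"name": "payment_received"}},
]

# Hypothetical target layout: one index per business concept.
NEW_INDEX_FOR_TYPE = {
    "quotation": "quotations",
    "domain_event": "domain-events",
}


def to_bulk_body(docs, placeholder_type="doc"):
    """Build a _bulk request body that re-indexes each document into its new index."""
    lines = []
    for d in docs:
        action = {
            "index": {
                "_index": NEW_INDEX_FOR_TYPE[d["_type"]],
                "_type": placeholder_type,
                "_id": d["_id"],
            }
        }
        lines.append(json.dumps(action))        # action/metadata line
        lines.append(json.dumps(d["_source"]))  # document source line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_body(old_docs)
print(bulk_body)
```

If a restructuring really is just a remapping like this, then at my data volume it looks like a small script. What I can't judge is whether there are other things (mappings, analyzers, whatever else I don't know about yet) that make it harder in practice.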
So... the thing is... I tend to be very purist when deciding things about code and systems. And because of my lack of knowledge of how Elasticsearch is structured, I am REALLY TERRIFIED of creating something that I will not be able to handle a short time from now.
All your input about how to "avoid fear" will be much appreciated.
As a side question, any suggestions on the structure I should use are also welcome.