Newbie question: Fear about "Index and type structure" mistakes

Hi all, I need help with my first project. These are silly newbie questions, but I feel stuck right now.

First, let me say that I've been programming for more than 30 years, in particular building websites with PHP since the late '90s, so I hope you'll bear with me asking this.

Second, let me say that I have read this article in full, and while I understand it, I don't yet have the fluency to decide what to do.

I already know this is a recurring question. I know many newbies ask about it, and even the article mentions this. So let me present it another way: I'm not asking "how to do it well" but "how can I deal with my fear".

So here I go:

I'm building a new company, a travel agency. I will have many different NoSQL documents. Some of them will be quotations for customers. Some will be the customers themselves. Some will be the actual reservations. Some will be related to domain events (for example, a domain-event pool) and their counterpart, the application events (the HTTP requests that originated those domain events).

I have not used Elasticsearch before.

While I feel I just have collections of documents, I also feel that some of them are somehow "related" to others.

For example, domain events (like "a user paid X amount for this reservation") and application events (like "user X clicked that button") are all events, after all.

I don't know if I should have something like


or I should have them in different indexes, like


but then I don't know what type should go in the ??? space. Any dummy word?

Or, for example... what about quotations and reservations? They all share the fact that they "are for a specific customer"... say, "customer documents"...

Should it be


or should they be indexes by themselves?


I'm a bit confused about whether a "business concept" maps to an index or to a type within an index.

I'm even unsure whether I should index my whole company in a single index and then have


all in the same "big container".

I also wonder whether I can "nest" types, and whether that makes any sense... like having namespaces...


The first would be the quotations I receive from my providers; the second, the quotations I make for my customers.


I ALREADY know this has no easy answer... According to the article mentioned above, it probably all "depends"... so I want to frame the question around calming my fear rather than finding a good generic solution:

Given that
a) We are a startup and RIGHT NOW we are NOT going to have thousands of documents per second; I currently have thousands of visits per day (not per hour, not per minute, not per second),
b) We are already operating and selling, so I'm in production now,
c) I am currently using MySQL to store the JSON documents in a TEXT field. I am just unable to analyze the data, but I am collecting all of it now,
d) A MySQL dump shows that the amount of "dumped plain-text data" I'm managing at the moment grows by about 0.5 GB every month (so still manageable),
e) Searches on domain documents (payments, trips, quotations, and so on) cover fewer than a thousand documents, so at this point efficiency is not a concern,
f) Searches on application documents (for example website visits, sources of traffic, etc.) cover fewer than a million documents, and the most I want to know today is "where is my traffic coming from", so the most I do is search for "all documents containing a certain tracking token (cookie or similar)".

Given all this, my question:

DOES IT MATTER if I make a "mistake" in deciding what is an index and what is a type, given that with this low amount of data I'll be able to rewrite everything if needed, by dumping the badly structured Elasticsearch DB and writing a well-structured one with the documents in the proper places?

Or, if I make an error in this kind of decision, will it be rather difficult to reorganize everything at a later point?

So... the thing is... I tend to be very purist when deciding things about code and systems. And because of my lack of knowledge of the Elasticsearch structure, I am REALLY TERRIFIED of creating something that I will not be able to handle a short time from now.

All your inputs about how to "avoid fear" will be much appreciated.

As a side-question, any suggestion on the structure I should use, is also welcome :wink:

Can you try to limit the size of your post to a few lines and one question for future posts? I will try to answer all questions:

  • Regarding having several types in one index: I would strongly recommend against it, mostly because this will be deprecated in future versions, and also because it can cause performance overhead, especially in older versions of Elasticsearch (if you are on the latest version, see the first reason)... Just go for one type per index.
  • The name of the type has no importance for Elasticsearch.
  • Regarding the importance of modelling mistakes: they will cost you in performance and would require reindexing, and possibly changing your code, depending on what you need to correct later. However, from what you said, and since you have the JSON stored elsewhere, it sounds like reindexing would not be a big issue for you... Don't forget to use aliases rather than index names directly in your application, so you can change the index(es) used without touching your code.
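The reindex-plus-alias workflow from the last point can be sketched as two request bodies, built in Python here so they can be printed and sent with any HTTP client. This is a minimal sketch under stated assumptions: the index and alias names are hypothetical, it assumes the new index has already been created with the corrected mapping, and it only builds the JSON for the `_reindex` and `_aliases` endpoints rather than talking to a cluster.

```python
import json

# Hypothetical names for illustration only; they do not come from the thread.
ALIAS = "domain-events"          # the only name the application ever queries
OLD_INDEX = "domain-events-v1"   # current index, possibly badly structured
NEW_INDEX = "domain-events-v2"   # assumed to exist already with the new mapping


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Request body for POST /_reindex: copy all documents from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Request body for POST /_aliases.

    The actions in one _aliases request are applied atomically, so searches
    against `alias` move from the old index to the new one in a single step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Step 1: copy the data into the index that has the corrected structure.
    print("POST /_reindex")
    print(json.dumps(reindex_body(OLD_INDEX, NEW_INDEX), indent=2))
    # Step 2: repoint the alias; application code keeps using ALIAS unchanged.
    print("POST /_aliases")
    print(json.dumps(alias_swap_body(ALIAS, OLD_INDEX, NEW_INDEX), indent=2))
```

Keeping a version suffix in the index names and the stable name only in the alias is what makes a modelling mistake cheap to fix: create the new index, reindex, swap the alias, and delete the old index once you are satisfied with the result.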

Hope this helps. I would recommend our core developer training course here, which would probably save you a lot of time at this stage:
