Just to help me understand: the term "join" is a SQL / RDBMS construct, typically with a WHERE clause, that joins data from 2 disparate tables. SQL joins are not readily supported in Elasticsearch; data is typically denormalized in Elasticsearch. Is that what you want to do? If so, that is a much more complex question.
In the image I have just uploaded, it can be seen that I have 8 indices. But I would like to use only 2, humhub-2019.04 and humhub-2019.05.
Is there a way to filter that?
By the way, when applying Machine Learning, what is the split ratio (train / test)?
First of all, Machine Learning in Elasticsearch today is specifically Unsupervised Time Series Anomaly Detection. A train / test split belongs to Supervised Machine Learning, which this is not.
Typically, for our Machine Learning to "learn" data that has periodicity, it takes at least 3 times the period (dependent on many things).
So if you have a daily pattern it will take at least 3 days; a weekly pattern, at least 3 weeks; etc.
And more data is generally better; less data generally gives a less accurate model.
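To make the rule of thumb above concrete, here is a minimal sketch of the arithmetic. The function name `minimum_learning_span` and the default multiplier of 3 are illustrative assumptions taken from the "at least 3 times the period" guidance; this is not an Elasticsearch API and the real learning time depends on many things.

```python
from datetime import timedelta


def minimum_learning_span(period: timedelta, multiplier: int = 3) -> timedelta:
    """Return the rough minimum span of data needed to learn a periodic pattern.

    Hypothetical helper: it only encodes the "at least 3x the period"
    rule of thumb and is a lower bound, not a guarantee.
    """
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return period * multiplier


daily = minimum_learning_span(timedelta(days=1))
weekly = minimum_learning_span(timedelta(weeks=1))
print(f"daily pattern: at least {daily.days} days of data")
print(f"weekly pattern: at least {weekly.days} days of data")
```

Treat the result as a floor: feeding the job more history than this is generally better.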
So back to the first question: while setting up the ML job you could just limit the date range of the datafeed. That would probably be the easiest / best option.
Or you could reindex those indices into a single index, or reindex them into 2 indices with different names and create an index pattern for just those 2.
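As a rough illustration of both options, the sketch below builds the JSON bodies you might send: a range query on `@timestamp` to restrict what the datafeed reads, and a `_reindex` request body that copies the two monthly indices into one. The destination name `humhub-2019-q2` and the date bounds are assumptions for illustration (I am assuming each monthly index only holds that month's data); adjust them to your setup and check the docs for your Elasticsearch version.

```python
import json


def timestamp_range_query(start, end, field="@timestamp"):
    """Build a range query that keeps documents with start <= field < end."""
    return {"range": {field: {"gte": start, "lt": end}}}


def reindex_body(source_indices, dest_index):
    """Build a request body for POST _reindex from several source indices."""
    return {
        "source": {"index": list(source_indices)},
        "dest": {"index": dest_index},
    }


# Assumed date bounds: April and May 2019 only.
query = timestamp_range_query("2019-04-01T00:00:00Z", "2019-06-01T00:00:00Z")

# "humhub-2019-q2" is a hypothetical destination index name.
body = reindex_body(["humhub-2019.04", "humhub-2019.05"], "humhub-2019-q2")

print(json.dumps(query, indent=2))
print(json.dumps(body, indent=2))
```

The query can be used as the datafeed query, and the second body can be pasted into Kibana Dev Tools after `POST _reindex`.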
You should really resolve those mapping conflicts going forward. I suspect they are there because you are in POC mode and loaded / iterated on your data. Or, if they are actually different field types, you need to name the fields differently. I think you may need to read up on mappings. In essence, mappings are your schema. Elasticsearch will guess and create a mapping for each index automatically if one has not been created beforehand. It bases the mapping on the first data it sees (which could be wrong). Typically, once you "know your data", you will create an index template with a mapping to make sure the data is consistent.
That conflict you show above could cause problems at some point, including if the conflicting fields are used in ML jobs.
As for the datafeed date range: you set that in the Machine Learning app when you create the job. For now, just select @timestamp as the time field when creating the index pattern.
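To show what a mapping conflict looks like and how a template avoids it, here is a small sketch. The field names `user_id` and `message` and their types are made up for illustration (I don't know your actual fields), and the template body assumes the typeless mapping format of 7.x; on 6.x the `mappings` section also needs a document type name.

```python
from collections import defaultdict


def find_mapping_conflicts(mappings_by_index):
    """Return fields that are mapped to more than one type across indices."""
    types_by_field = defaultdict(set)
    for fields in mappings_by_index.values():
        for field, field_type in fields.items():
            types_by_field[field].add(field_type)
    return {
        field: sorted(types)
        for field, types in types_by_field.items()
        if len(types) > 1
    }


def index_template(pattern, field_types):
    """Build an index template body that pins each field to one type."""
    return {
        "index_patterns": [pattern],
        "mappings": {
            "properties": {name: {"type": t} for name, t in field_types.items()}
        },
    }


# Hypothetical mappings: the same field was guessed differently per index.
example = {
    "humhub-2019.04": {"user_id": "long", "message": "text"},
    "humhub-2019.05": {"user_id": "keyword", "message": "text"},
}
conflicts = find_mapping_conflicts(example)
template = index_template("humhub-*", {"user_id": "keyword", "message": "text"})
print(conflicts)
```

A template only applies to indices created after it is in place, so existing indices with the wrong type would still need to be reindexed.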