How to join 2 indexes and then apply Machine Learning (X-Pack)?

Hi

I have a list of different indexes (see image). I would like to join 2 of them and then apply the Machine Learning algorithms.

I want to join: "humhub-2019.04" with "humhub-2019.05"
I thought of saving the search and then calling each of them and using "AND".

Is there another way to join 2 indexes? Do I have to use "json"?

Thank you

Hi @Rosho

Thanks for trying ML....

If you want to run ML and/or queries across multiple indexes like the ones you are displaying, you simply create an index pattern, see here:

In your case you will create an index pattern like humhub-*, and then when you create an ML job you use that index pattern.
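To illustrate what a pattern like humhub-* selects, here is a minimal sketch that approximates multi-index matching with Python's `fnmatch`. It is not how Elasticsearch or Kibana resolve names internally, and it ignores exclusions and date math. The `resolve` helper and every index name other than humhub-2019.04 and humhub-2019.05 are made up for the example.

```python
from fnmatch import fnmatchcase


def resolve(expression, indices):
    """Return the index names matched by a comma-separated expression.

    Each comma-separated part may be an exact name or contain `*` wildcards.
    """
    patterns = [p.strip() for p in expression.split(",") if p.strip()]
    return [name for name in indices
            if any(fnmatchcase(name, p) for p in patterns)]


# Only the 2019.04 and 2019.05 names come from this thread; the rest are hypothetical.
indices = ["humhub-2019.03", "humhub-2019.04", "humhub-2019.05", "metrics-2019.05"]

print(resolve("humhub-*", indices))                       # every humhub index
print(resolve("humhub-2019.04,humhub-2019.05", indices))  # only the two named ones
```

The second call shows that a comma-separated list of exact names narrows the selection to just the two monthly indexes, while the wildcard picks up every index that shares the prefix.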

Just to help me understand: the term "join" is a SQL / RDBMS construct, typically with a WHERE clause, that combines data from 2 disparate tables. SQL joins are not readily supported in Elasticsearch; data is typically denormalized in Elasticsearch. Is that what you want to do? If so, that is a much more complex question.

Hi @stephenb

In the image I have just uploaded, it can be seen that I have 8 indices. But I would like to use only 2, humhub-2019.04 and humhub-2019.05.
Is there a way to filter that?
By the way, when applying Machine Learning, what is the split rate (train / test)?

Thank you

First of all, Machine Learning in Elasticsearch today is specifically Unsupervised Time Series Anomaly Detection. Training (and with it a train / test split) refers to Supervised Machine Learning, which this is not.

Typically, for our Machine Learning to "learn" data that has periodicity, it takes at least 3 times the period (depending on many things).

So if you have a daily pattern it will take at least 3 days, for a weekly pattern at least 3 weeks, etc.
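As a quick worked example of that rule of thumb, the sketch below multiplies the period by 3 and compares it with the data available in the two monthly indexes. The `minimum_learning_span` helper is hypothetical, and the factor of 3 is only the rough lower bound mentioned above, not a guarantee.

```python
from datetime import timedelta


def minimum_learning_span(period, cycles=3):
    """Rough lower bound on the data needed to learn a periodic pattern."""
    return period * cycles


daily = minimum_learning_span(timedelta(days=1))
weekly = minimum_learning_span(timedelta(weeks=1))

# April (30 days) plus May (31 days) 2019, i.e. the two indexes in question.
available = timedelta(days=30 + 31)

print(daily.days, weekly.days, available >= weekly)
```

With two months of data, both a daily and a weekly pattern clear the 3-period minimum comfortably.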

And more data is generally better... less data will be less better :slight_smile:

This might be a good webinar.
https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch

And here is a good overview
https://www.elastic.co/guide/en/elastic-stack-overview/current/ml-overview.html

So back to the first question: while setting up the ML job you could just limit the date range of the datafeed. That would probably be the easiest / best option...
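If you start the datafeed through the API rather than the UI, the date range is given as `start` and `end` values on the start datafeed request. The sketch below only builds that request body; the `datafeed_window` helper is hypothetical, it sends nothing to a cluster, and the exact endpoint path depends on your Elasticsearch version, so check the docs for yours. It assumes `end` is exclusive, so June 1 is used to cover all of May.

```python
import json
from datetime import datetime, timezone


def datafeed_window(start, end):
    """Build a start/end body covering [start, end) in UTC."""
    if end <= start:
        raise ValueError("end must be later than start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "start": start.astimezone(timezone.utc).strftime(fmt),
        "end": end.astimezone(timezone.utc).strftime(fmt),
    }


# April and May 2019, matching humhub-2019.04 and humhub-2019.05.
body = datafeed_window(
    datetime(2019, 4, 1, tzinfo=timezone.utc),
    datetime(2019, 6, 1, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

This keeps the humhub-* index pattern as it is and lets the time range do the filtering, which avoids touching the indexes themselves.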

Or you can do something different: reindex those indexes into a single index, or reindex them into 2 indexes with different names and create an index pattern for just those 2.
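For the reindex option, a request body for the `_reindex` API could look roughly like the one built below. This is a sketch only: `reindex_body` is a hypothetical helper, the destination name `ml-humhub-2019.04-05` is invented for the example, and nothing is sent to a cluster.

```python
import json


def reindex_body(source_indices, dest_index):
    """Build a _reindex request body copying several indexes into one."""
    if dest_index in source_indices:
        raise ValueError("destination must differ from the source indexes")
    return {
        "source": {"index": list(source_indices)},
        "dest": {"index": dest_index},
    }


body = reindex_body(
    ["humhub-2019.04", "humhub-2019.05"],
    "ml-humhub-2019.04-05",  # hypothetical destination name
)
print(json.dumps(body, indent=2))
```

The destination name deliberately does not start with humhub-, so an existing humhub-* index pattern would not pick up the copied documents a second time.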

Hey @stephenb

I want to limit the Date Range and Data Feed. But now I am having another problem (check image 1).

Is it related to the "Time Filter field name" in Image 2?

I have used the data despite the "mapping conflict", and it is working fine.

IMAGE 1

IMAGE 2

  1. You should really resolve those mapping conflicts going forward. I suspect this happened because you are in POC mode and loaded / iterated on your data; otherwise you need to name the fields differently if they are actually different field types. I think you may need to read / learn about mappings. In essence, mappings are your schema. Elasticsearch will guess and create a mapping for each index automatically if one has not been created beforehand. It bases the mapping on the first data it sees (which could be wrong). Typically, once you "know your data", you will create an index template with a mapping to make sure the data is consistent.
    The conflict you show above could cause problems at some point, including if the conflicting fields are used in the ML jobs.

  2. As to the Data Feed range, that is set in the Machine Learning job when you create it in the Machine Learning app. For now, just select @timestamp as the Time Filter field name when creating the index pattern.
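To make point 1 more concrete, here is a small sketch of what a mapping conflict is and what an index template body that prevents it might look like. Everything here is illustrative: the `user_id` field and its types are invented (the real conflicting field is only visible in the screenshot), `find_conflicts` is a hypothetical helper, and the template layout assumes a version where mappings carry no type name (older versions nest `properties` under a type).

```python
import json


def find_conflicts(mappings_by_index):
    """Return fields that are mapped to more than one type across indexes."""
    types_by_field = {}
    for fields in mappings_by_index.values():
        for field, field_type in fields.items():
            types_by_field.setdefault(field, set()).add(field_type)
    return {field: sorted(types)
            for field, types in types_by_field.items() if len(types) > 1}


# Hypothetical auto-guessed mappings: the same field got two different types.
observed = {
    "humhub-2019.04": {"@timestamp": "date", "user_id": "long"},
    "humhub-2019.05": {"@timestamp": "date", "user_id": "keyword"},
}
conflicts = find_conflicts(observed)
print(conflicts)

# A template body that pins the types for every future humhub-* index.
template = {
    "index_patterns": ["humhub-*"],
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "user_id": {"type": "keyword"},  # assumed choice for the example
        }
    },
}
print(json.dumps(template, indent=2))
```

A template only applies to indexes created after it is in place, so the existing indexes would still need to be reindexed to clear the conflict.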
