Datafeed query for Machine Learning, got more than 10000 partitions

ElasticLiver · October 8, 2020, 8:13pm

Hi, I'm currently using this query in the datafeed of my ML jobs:

{
  "bool": {
    "filter": [
      {
        "bool": {
          "should": [
            {
              "exists": {
                "field": "cpu_per"
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "grupo.keyword": "Datacenter"
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    ]
  }
}

but I still get about 18,000 partitions. I need a way to split the documents so that each job sees fewer than 10,000 partitions, using two jobs with the same index and detector. What would be the best way to do this?

richcollier · October 9, 2020, 12:44pm

Your question needs a lot more context for us to understand. Please describe your use-case, what you want to accomplish, what the data looks like, what your proposed ML job config is, etc.

What field has 18,000 values?
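
Once the partition field is known, one possible way to split the data across two jobs is to give each datafeed the same base filters plus a `terms` filter over its own share of the partition values. The sketch below is only an illustration under assumptions: `host.keyword` is a hypothetical field name (the thread never names the partition field), the helper `split_queries` is made up for this example, and the list of distinct values is assumed to have been collected already. The base filters are the ones from the query above, written without the single-clause `bool`/`should` wrappers, which match the same documents in filter context.

```python
import json

# Filters from the original datafeed query, without the single-clause
# bool/should wrappers (equivalent in filter context).
BASE_FILTERS = [
    {"exists": {"field": "cpu_per"}},
    {"match_phrase": {"grupo.keyword": "Datacenter"}},
]


def split_queries(partition_field, partition_values, n_jobs=2):
    """Build one datafeed query per job, each limited to its own
    share of the partition values (hypothetical helper)."""
    values = sorted(set(partition_values))
    queries = []
    for i in range(n_jobs):
        # Every n_jobs-th value, starting at offset i, goes to job i.
        chunk = values[i::n_jobs]
        queries.append({
            "bool": {
                "filter": BASE_FILTERS + [{"terms": {partition_field: chunk}}]
            }
        })
    return queries


if __name__ == "__main__":
    # Made-up partition values, standing in for the real ones.
    demo_values = [f"host-{i}" for i in range(6)]
    for query in split_queries("host.keyword", demo_values):
        print(json.dumps(query, indent=2))
```

Each generated query would go into the `query` section of one datafeed. Note the limitation: partition values that first appear after the lists were built are not picked up by either job, so this may only suit a fairly stable set of values.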

system · November 6, 2020, 12:44pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
