Failed ML tests when running the Gradle check task against unchanged repo code

Hi,

I'm setting up a development environment to contribute to the Elasticsearch project. Before picking an issue to work on, I'm trying to run the complete test suite to make sure everything is OK, but I'm getting a couple of failures in org.elasticsearch.client.MachineLearningIT:

REPRODUCE WITH: ./gradlew ':client:rest-high-level:asyncIntegTestRunner' --tests "org.elasticsearch.client.MachineLearningIT.testStopDatafeed" -Dtests.seed=6D962F6EB915A9AA -Dtests.security.manager=true -Dtests.locale=sr-ME -Dtests.timezone=Etc/GMT-12 -Dcompiler.java=13

Elasticsearch exception [type=status_exception, reason=Could not open job because no ML nodes with sufficient capacity were found]
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=status_exception, reason=Could not open job because no ML nodes with sufficient capacity were found]
	at __randomizedtesting.SeedInfo.seed([6D962F6EB915A9AA:9969ABDEC8D633C2]:0)
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1877)
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1854)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1611)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1550)
	at org.elasticsearch.client.MachineLearningClient.openJob(MachineLearningClient.java:353)
	at org.elasticsearch.client.MachineLearningIT.openJob(MachineLearningIT.java:2531)
	at org.elasticsearch.client.MachineLearningIT.testStopDatafeed(MachineLearningIT.java:683)
...
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_state_exception, reason=Could not open job because no suitable nodes were found, allocation explanation [Not opening job [test-stop-datafeed3] on node [{asyncIntegTest-0}{ml.machine_memory=8182083584}{ml.max_open_jobs=20}], because this node has insufficient available memory. Available memory for ML [2454625075], memory required by existing jobs [2199912448], estimated memory required for this job [1084227584]]]
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
	at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:169)
	... 44 more
REPRODUCE WITH: ./gradlew ':client:rest-high-level:asyncIntegTestRunner' --tests "org.elasticsearch.client.MachineLearningIT.testStartDataFrameAnalyticsConfig" -Dtests.seed=6D962F6EB915A9AA -Dtests.security.manager=true -Dtests.locale=sr-ME -Dtests.timezone=Etc/GMT-12 -Dcompiler.java=13


Expected: <STOPPED>
     but: was <FAILED>
java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <FAILED>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:887)
	at org.elasticsearch.client.MachineLearningIT.testStartDataFrameAnalyticsConfig(MachineLearningIT.java:1548)
...
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <STARTED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <REINDEXING>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <FAILED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <FAILED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <FAILED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more
	Suppressed: java.lang.AssertionError: 
Expected: <STOPPED>
     but: was <FAILED>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.client.MachineLearningIT.lambda$testStartDataFrameAnalyticsConfig$17(MachineLearningIT.java:1548)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:875)
		... 37 more

Are there any specific requirements for them to run successfully, especially regarding memory?

I'm using Arch Linux on a 6-core i7 with 8GiB RAM and 8GiB swap, and OpenJDK 13.0.2. I've run the tests both from IntelliJ IDEA CE 2020.1 and directly with Gradle on the command line, with the same results. I'm also using --max-workers=1.

Thanks!

Are there any specific requirements for them to run successfully, especially regarding memory?

For the first problem, you are correct that the test is failing because you only have 8GiB of RAM. This is not really intentional; it's just an oversight that we've gotten away with because all Elastic developers have at least 16GiB on their laptops, and all our CI machines have at least that much too.
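The allocation explanation in the first stack trace makes the shortfall concrete. The minimal Java sketch below just redoes that arithmetic with the byte counts copied from the log. The class name `MlCapacityCheck` is hypothetical, and the 30% figure is an assumption: it supposes the node used the default `xpack.ml.max_machine_memory_percent`, which happens to reproduce the logged "Available memory for ML" value exactly.

```java
public class MlCapacityCheck {
    // Byte counts copied from the allocation explanation in the log.
    static final long MACHINE_MEMORY = 8_182_083_584L;   // ml.machine_memory
    static final long AVAILABLE_FOR_ML = 2_454_625_075L; // "Available memory for ML"
    static final long EXISTING_JOBS = 2_199_912_448L;    // "memory required by existing jobs"
    static final long NEW_JOB_ESTIMATE = 1_084_227_584L; // "estimated memory required for this job"

    // Assumption: ML may use `percent` of machine memory (30 by default);
    // integer division truncates, matching the whole-byte value in the log.
    static long availableForMl(long machineMemory, int percent) {
        return machineMemory * percent / 100;
    }

    // A job can be opened only if the memory already in use plus its estimate fits the budget.
    static boolean fits(long available, long existing, long required) {
        return existing + required <= available;
    }

    public static void main(String[] args) {
        System.out.println("available for ML: " + availableForMl(MACHINE_MEMORY, 30));
        System.out.println("needed:           " + (EXISTING_JOBS + NEW_JOB_ESTIMATE));
        System.out.println("fits:             " + fits(AVAILABLE_FOR_ML, EXISTING_JOBS, NEW_JOB_ESTIMATE));
    }
}
```

Running it prints a budget of 2454625075 bytes against a requirement of 3284140032 bytes, and `fits: false`. Under the same 30% assumption, a machine with twice the memory would get roughly twice the budget (about 4.9e9 bytes), enough for these three jobs.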

I think you can fix it by adding this line next to line 2545 of client/rest-high-level/src/test/java/org/elasticsearch/client/MachineLearningIT.java:

        builder.setAnalysisLimits(new AnalysisLimits(512L * 1024 * 1024, 4L));

The default model memory limit is 1GiB, but there's no need for the jobs in that test to use so much, and halving the limit will make the test work on an 8GiB machine.
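To check why halving the limit should be enough, the logged numbers can be decomposed: the per-job estimate (1084227584 bytes) is the 1 GiB limit plus 10 MiB, and the existing-jobs figure (2199912448 bytes) is two such jobs plus a further 30 MiB. Treating those remainders as a fixed 10 MiB per-job and 30 MiB per-node overhead is an assumption inferred only from these log values. The hypothetical `HalvedLimitCheck` class below recomputes the three-job total for a 1 GiB and a 512 MiB model memory limit.

```java
public class HalvedLimitCheck {
    static final long MIB = 1024L * 1024;
    // "Available memory for ML" reported in the log for the 8GiB machine.
    static final long AVAILABLE_FOR_ML = 2_454_625_075L;
    // Assumed overheads, inferred from the log values: 10 MiB per job, 30 MiB once per node.
    static final long PER_JOB_OVERHEAD = 10 * MIB;
    static final long PER_NODE_OVERHEAD = 30 * MIB;

    // Total bytes needed for `jobs` open jobs, each with the given model memory limit in MiB.
    static long required(int jobs, long modelMemoryLimitMib) {
        return jobs * (modelMemoryLimitMib * MIB + PER_JOB_OVERHEAD) + PER_NODE_OVERHEAD;
    }

    public static void main(String[] args) {
        for (long limitMib : new long[] {1024, 512}) {
            long need = required(3, limitMib);
            System.out.println(limitMib + " MiB limit -> " + need + " bytes, fits: "
                    + (need <= AVAILABLE_FOR_ML));
        }
    }
}
```

With the 1 GiB default, the model reproduces the logged total of 3284140032 bytes, which does not fit. With 512 MiB, three jobs need 1673527296 bytes, comfortably under the 2454625075-byte budget.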

For the second failure, something else went wrong: the job failed unexpectedly. To find out why, you'd have to look in the server-side log of the integration test. This will be in client/rest-high-level/build/testclusters/asyncIntegTest-0/logs/asyncIntegTest.log after you run the REPRO command:

./gradlew ':client:rest-high-level:asyncIntegTestRunner' --tests "org.elasticsearch.client.MachineLearningIT.testStartDataFrameAnalyticsConfig" -Dtests.seed=6D962F6EB915A9AA -Dtests.security.manager=true -Dtests.locale=sr-ME -Dtests.timezone=Etc/GMT-12 -Dcompiler.java=13

In that file, look for an exception that could plausibly be the root cause of the job failure.
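A rough way to narrow down that search is to list only the log lines that mention an exception or start a "Caused by:" clause. The sketch below is a hypothetical helper, not part of the Elasticsearch build; it assumes it is run from the repository root, uses the log path given above by default, and only filters lines, so you still have to judge which exception is the real cause.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindLogExceptions {
    // Server-side log of the integration test cluster, relative to the repository root.
    static final Path DEFAULT_LOG =
            Paths.get("client/rest-high-level/build/testclusters/asyncIntegTest-0/logs/asyncIntegTest.log");

    // Keep lines that name an exception or begin a "Caused by:" clause;
    // these usually mark where a stack trace worth reading starts.
    static List<String> exceptionLines(Stream<String> lines) {
        return lines.filter(line -> line.contains("Exception") || line.startsWith("Caused by:"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path log = args.length > 0 ? Paths.get(args[0]) : DEFAULT_LOG;
        // Files.lines reads lazily; try-with-resources closes the underlying file.
        try (Stream<String> lines = Files.lines(log)) {
            exceptionLines(lines).forEach(System.out::println);
        }
    }
}
```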

@droberts195 Thanks for the info! With the memory limit you suggested, the tests now pass. Please note that AnalysisLimits takes the limit in MiB rather than bytes, so I added:

builder.setAnalysisLimits(new AnalysisLimits(512L, 4L));
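Because the first AnalysisLimits argument is read as MiB, the earlier expression 512L * 1024 * 1024 would have requested 536870912 MiB (512 TiB) instead of 512 MiB. The small sketch below, with hypothetical helper names, just makes that unit conversion explicit.

```java
public class ModelMemoryUnits {
    static final long BYTES_PER_MIB = 1024L * 1024;
    static final long MIB_PER_TIB = 1024L * 1024;

    // Convert a byte count into the value a MiB-based setting expects.
    static long bytesToMib(long bytes) {
        return bytes / BYTES_PER_MIB;
    }

    // Express a MiB value in TiB, to show how large a misread value becomes.
    static double mibToTib(long mib) {
        return mib / (double) MIB_PER_TIB;
    }

    public static void main(String[] args) {
        long intendedBytes = 512L * 1024 * 1024; // 512 MiB written in bytes
        System.out.println("value to pass as MiB: " + bytesToMib(intendedBytes));
        System.out.println("bytes misread as MiB: " + mibToTib(intendedBytes) + " TiB");
    }
}
```

It prints 512 for the correct MiB value and 512.0 TiB for the misread one.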

Do you think it's appropriate to submit your change to avoid this issue with 8GiB machines in the future, or would it be better to just submit a change to add a note to CONTRIBUTING.md or TESTING.asciidoc?

Separately, the second issue went away after I rebased onto some newer upstream changes.

I'm currently running the Gradle check task again to verify that everything is OK.

Thanks for pointing out my mistake with bytes versus MiB @CactoT.

I have opened https://github.com/elastic/elasticsearch/pull/55210 to get the change added to the official code to help people running on machines with smaller amounts of RAM in the future.

@droberts195 Great, thanks!
