I am experimenting with many indices being created on a (currently) single node cluster.
I have a simple JUnit test that creates indices (with different names, of course) inside a for-loop:
CreateIndexResponse cr = this.client.admin().indices().create(new CreateIndexRequest(name)).actionGet();
this.logger.info("create index " + name + " acknowledged: " + cr.isAcknowledged());
The client is constructed like this (only once) and is reused for every request:
this.client = new PreBuiltTransportClient(settings).addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("localhost", port)));
My problem is that the whole test seems to hang:
there is no load anymore
nothing is happening in my application log
nothing is happening in the Elasticsearch log (the last INFO is that the previous index has been created)
Am I doing something wrong? The number 309 is surely specific to my case, but far more empty indices should be possible. I do not even get "Too many open files" or "Out of memory" errors.
Sidenote: via curl I am able to easily create 1000 indices with 1 document each.
for i in {0..1000}; do
  echo "PUT "$i
  curl -X PUT -H "Content-Type: application/json" -d '{"name":"Peter"}' http://localhost:9200/index-$i/document/1
done
FWIW, I don't believe that creating 1000 indices within a loop is realistic, though.
Creating 1000 docs within an index is more what I'd expect.
I mean, what problem are you trying to cover with that test?
(The curl loop is creating documents which leads to automatic index creation)
Index creation inside a for-loop is only a test.
What I want to do is provide a different index for each application instance (SaaS application), and I therefore wanted to test how many instances a single server can handle.
(The curl loop is creating documents which leads to automatic index creation)
I know. But you are not testing the same thing. Which is important when you want to compare results.
What I want to do is provide a different index for each application instance (SaaS application), and I therefore wanted to test how many instances a single server can handle.
That's not the way you will find this number. It only makes sense if you put documents into your indices. I mean, there is a per-shard limit that you need to find; then you need to compute the number of shards you need for a single index.
Then you need to test how many loaded shards a single machine can hold.
Maybe this is something you can test using es-rally, BTW? @danielmitterdorfer WDYT? Would that help?
(I'm sorry we moved a bit from the original question).
I know that simply creating empty indices is not realistic; putting documents inside the indices would only decrease the maximum number further. I just simplified the test down to index creation in order to isolate the problem.
There has to be a problem in how I use the Java API.
I do not even reach any memory or file handle limits.
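To rule resource limits in or out from inside the JVM rather than guessing, you can query the file descriptor counters directly. This is a small sketch (the class name `FdCheck` is mine); it relies on the HotSpot-specific `com.sun.management.UnixOperatingSystemMXBean`, so it only works on Unix-like systems with an Oracle/OpenJDK runtime:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {

    // Returns the process-wide file descriptor limit (Unix + HotSpot/OpenJDK only).
    public static long maxFds() {
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        return os.getMaxFileDescriptorCount();
    }

    // Returns the number of file descriptors currently open in this process.
    public static long openFds() {
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        return os.getOpenFileDescriptorCount();
    }

    public static void main(String[] args) {
        System.out.println("open file descriptors: " + openFds() + " / limit: " + maxFds());
    }
}
```

Logging these two numbers once per loop iteration would show immediately whether the test is creeping toward the limit or hanging for an unrelated reason.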
I just ran the following against the latest master of Elasticsearch:
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;

public class TransportTester {

    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "distribution_run").build();
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));
        for (int i = 0; i < 10000; i++) {
            String name = "idx-" + i;
            CreateIndexResponse cr = client.admin().indices().create(new CreateIndexRequest(name)).actionGet();
            System.out.println("Created " + cr.index());
        }
        client.close();
    }
}
(note: I had to add the "cluster.name" setting because I ran Elasticsearch via Gradle; this is not necessary in your case).
Although it started garbage collecting heavily after ~600 indices, it is still running fine and has created 900 indices so far (this configuration of Elasticsearch is started with just 512MB heap).
While I think it might be doable with some trickery, creating indices dynamically is not really something that is well supported in Rally at the moment.
@Phili: If you are not sure what's going on in the application you could take thread dumps (or attach jconsole or jvisualvm) to see what's happening.
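Besides attaching jstack, jconsole, or jvisualvm from the outside, you can also dump all thread stacks from inside the application, for example right after the loop appears to stall. A minimal sketch (the class name `ThreadDump` is mine), using only the standard `Thread.getAllStackTraces()` API:

```java
import java.util.Map;

public class ThreadDump {

    // Builds a jstack-like dump of every live thread: name, state, and stack frames.
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\" state=")
              .append(e.getKey().getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

If the hang is a blocked transport call, the dump should show the test thread parked inside `actionGet()`, which would narrow the problem down considerably.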
Hey @danielmitterdorfer, thank you very much for your effort!
I just tried a similar thing and had no problems creating a few hundred indices either, even with some documents being indexed. (I ran my application's unit test against a separately started Elasticsearch.)
Maybe the problem is the way my application "autostarts" Elasticsearch
(a detail I did not mention before).
On application startup (a JEE application running on WildFly 10), an @ApplicationScoped bean "SearchServerManager" starts an Elasticsearch process in a configured path.
@ApplicationScoped
public class SearchServerManager {
    ...
    @PostConstruct
    public void init() {
        this.logger.info("Initializing...");
        this.logger.info("Starting search server");
        ProcessBuilder pb = new ProcessBuilder(this.preference.getElasticsearchPath() + "/bin/elasticsearch");
        this.logger.info("starting elasticsearch - " + pb.command());
        this.esProcess = pb.start();
        ...
By itself I don't see anything wrong here. But the devil is in the details (as always).
Depending on your environment (OS, distribution) Elasticsearch might inherit some limit of the parent process (which is your application). For example we got bitten by systemd default limits for the maximum number of processes in our CI environment (see https://www.elastic.co/blog/we-are-out-of-memory-systemd-process-limits).
I am not saying that this is the cause here. The point I want to make is that doing it this way can be trappy. But maybe the article I linked to gives you some ideas on what to check in your situation.
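One classic ProcessBuilder trap worth checking, separate from OS-level limits: by default the child's stdout and stderr are pipes back to the parent, and if nothing ever reads them, the child can block once the pipe buffer fills. Redirecting the output to a file avoids this entirely. A minimal sketch (the class name `ProcessLauncher` is mine, and `hostname` stands in for `bin/elasticsearch` so the example is self-contained):

```java
import java.io.File;
import java.io.IOException;

public class ProcessLauncher {

    // Start an external process with stderr merged into stdout and everything
    // appended to a log file, so the child can never block on a full pipe buffer.
    public static Process start(String executable, File log) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(executable);
        pb.redirectErrorStream(true);                              // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(log));  // write to file, not a pipe
        return pb.start();
    }

    public static void main(String[] args) throws Exception {
        File log = File.createTempFile("es", ".log");
        Process p = start("hostname", log);  // placeholder for .../bin/elasticsearch
        p.waitFor();
        System.out.println("child exited with " + p.exitValue() + ", output in " + log);
    }
}
```

Whether this is actually what bites you here depends on how much Elasticsearch writes to its console versus its own log files, but it is cheap to rule out.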