Right now, I need a one-time ingestion of about 4 TB of data to make it queryable.
My largest piece of hardware is my gaming rig: 16 cores, 64 GB RAM, 27 TB of disk space, etc. Sadly it is running Windows 10 Home, and I would like to keep Windows as the primary OS, as I have a bunch of other things that need to be done on the machine.
I know Windows 10 Home is terrible and is likely creating throttling issues. The Docker limitation on 10 Home seems to be that everything runs inside a single VirtualBox VM called "default" (via Docker Toolbox). Eh.
I ended up creating a Debian 9 VM instead (32 GB RAM, 8 cores) and shared a 10 TB folder from the host to the VM. Sounds good so far. I set up Docker and Docker Compose, then used your sample Elasticsearch docker-compose to get Elastic up and running. Normally I'd say "this is fantastic," as I know it works out of the box in all the normal scenarios.
Remember when I said I shared the 10 TB folder from the host to the VM? My goal is to use that share as the volume location, so I updated the volumes to:
volumes:
  - /media/sf_dataset/data01:/usr/share/elasticsearch/data
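For reference, across the three nodes from the sample compose, the volume entries end up looking like this (the data01/data02/data03 layout is just my naming under the shared folder):

```yaml
# Fragment of the sample compose: one host folder per node (sketch)
services:
  es01:
    volumes:
      - /media/sf_dataset/data01:/usr/share/elasticsearch/data
  es02:
    volumes:
      - /media/sf_dataset/data02:/usr/share/elasticsearch/data
  es03:
    volumes:
      - /media/sf_dataset/data03:/usr/share/elasticsearch/data
```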
When I run the compose, it launches all the instances. (Sweet)
It creates the folders on the HDD: data01, data02, etc. (SWEET)
Then all of the Docker containers fail because Elasticsearch gets an access error:
es03 | "stacktrace": ["org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Unable to access 'path.data' (/usr/share/elasticsearch/data)",
es03 | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-7.5.2.jar:7.5.2]",
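For what it's worth, my current guess (untested): the official image runs Elasticsearch as uid 1000, while VirtualBox shared folders are mounted as root:vboxsf and ignore chown, so the process can't write even though root could create the folders. Remounting the share with a matching uid/gid might help — a sketch, assuming my share is named "dataset" in VirtualBox:

```shell
# Remount the vboxsf share so uid 1000 (the container's elasticsearch user) owns it.
# "dataset" is my VirtualBox share name; adjust to yours.
sudo umount /media/sf_dataset
sudo mount -t vboxsf -o uid=1000,gid=1000 dataset /media/sf_dataset

# Quick check: can uid 1000 actually write where path.data will live?
sudo -u '#1000' touch /media/sf_dataset/data01/write-test
```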
Did I do something wrong? Is my design preposterous? I am thinking that if I upgrade to Windows 10 Pro or similar, I could drop the Debian VM and run the Elasticsearch containers natively on Windows. Windows 10 Home forces me to use an outdated version of Docker, which runs the containers inside a VM anyway.
With Round 1 of ingestion, I noticed that my volumes would run out of space, so I figured I'd offload the VM's disk usage to one of my dedicated HDDs. Maybe I could instead mount a USB 3.0 external 8 TB HDD to the Debian VM and plausibly bypass the shared folders? Since I saw that it created the folders, I know it can reach that directory. And since it was running as root, I suspect it does have access, so I'm curious as to what's going on.
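If I do go the external-drive route, I assume I'd attach the disk to the VM directly and format it ext4, since chown actually works there (unlike on vboxsf shares). The device name below is a guess; I'd verify with lsblk first:

```shell
lsblk                                  # confirm which device is the external 8 TB drive
sudo mkfs.ext4 /dev/sdb1               # WARNING: wipes whatever is on the drive
sudo mkdir -p /mnt/esdata
sudo mount /dev/sdb1 /mnt/esdata
sudo chown -R 1000:1000 /mnt/esdata    # uid/gid of elasticsearch in the official image
```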
I created a Stack Overflow post about it as well, covering pretty much the same thing.
EDIT: Also, since I am running and saving everything in one HDD location, maybe it would be better to run just one instance of Elasticsearch instead of 3-6 containers processing it?
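If one instance is the way to go, I'd pare the sample compose down to something like this single-node sketch (untested; the heap size and paths are guesses for my box):

```yaml
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.2
    container_name: es01
    environment:
      - discovery.type=single-node          # skip cluster bootstrapping entirely
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms16g -Xmx16g"      # guess: half the VM's 32 GB RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /media/sf_dataset/data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
```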