Retrieving over a million records in Elasticsearch

Does this actually boost performance? I'm not in a production scenario at the moment. There isn't any load on the cluster as such.

I used a filter query, which not be doing any sorting, correct? Also, what does sorting by _doc achieve?

I used the hot_threads API while executing the query and it seems like it is the fetch_phase from the gateway node. I sampled every second to see what threads were taking time. Here is the output:

100.3% (1s out of 1s) cpu usage by thread 'name_of_elasticsearch_cluster_node][search][T#3]' 4/10 snapshots
sharing following 22 elements sun.nio.ch.NativeThread.current(Native Method)
sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:737)
sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
org.apache.lucene.store.DataInput.readVInt(DataInput.java:122)
org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsRe
der.java:249) org.apache.lucene.index.SegmentReader.document(SegmentReader.java:335)
org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:427)
org.elasticsearch.search.fetch.FetchPhase.createSearchHit(FetchPhase.java:219)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:184)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:401)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:833)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:824)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279) org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 

This was repeated across multiple threads on the gateway node. On one of the other nodes, this was the output:

99.7% (997.3ms out of 1s) cpu usage by thread 'elasticsearch[other_node_on_cluster][search][T#7]' 3/10 snapshots
sharing following 22 elements sun.nio.ch.NativeThread.current(Native Method)
sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:737)
sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:110)
org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:354) org.apache.lucene.index.SegmentReader.document(SegmentReader.java:335)
org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:427)
org.elasticsearch.search.fetch.FetchPhase.createSearchHit(FetchPhase.java:219)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:184)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:401)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:833)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:824)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279) org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745) 

Is this conclusive enough to point out that it is the fetch_phase that is causing the slowness?

I do believe aggregations could potentially solve the problem, but it might be useful for us to have this data regardless. If it doesn't work, we will probably move to aggregations. Is it possible to retrieve all this data in under 1 second?