You can use source filtering to include or exclude fields from the result.
For instance, say I have an index my_published_stories and want to loop over every document but fetch only the publishing date (pubdate) and processing time (processing_time). I'd do something like this:
GET my_published_stories/_search
{
  "_source": ["pubdate", "processing_time"]
}
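If you are looping from Python, the same filter can be passed through the scroll helper. This is only a rough sketch, not necessarily how my scripts looked: it uses the low-level elasticsearch client's helpers.scan rather than the DSL package, and the function names source_filter_body and iter_stories are made up for illustration.

```python
def source_filter_body(fields):
    """Build a search body that returns only the listed _source fields."""
    return {"query": {"match_all": {}}, "_source": list(fields)}


def iter_stories(client, index="my_published_stories",
                 fields=("pubdate", "processing_time")):
    """Yield the filtered _source of every document in the index.

    `client` is assumed to be an already-connected Elasticsearch client.
    """
    # Imported here so the body builder above works without the client installed.
    from elasticsearch.helpers import scan

    # scan() wraps the scroll API and yields one hit at a time.
    for hit in scan(client, index=index, query=source_filter_body(fields)):
        yield hit["_source"]
```

Each item yielded by iter_stories should then be a small dict holding just pubdate and processing_time, which keeps the scroll responses much lighter than pulling whole documents.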
I put some timing in my scripts to measure how long I was waiting on the scroll versus how long I was processing the data. The waiting time includes network time, but it told me whether to improve my search or my processing.
PowerShell was MUCH slower than the Python Elasticsearch DSL for me, by orders of magnitude.
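The kind of timing I mean could look roughly like the sketch below. It is an illustration under assumptions rather than my exact script: timed_loop is a hypothetical helper, hits can be any iterator (for example a scroll/scan generator), and process is whatever you do with each hit.

```python
import time


def timed_loop(hits, process):
    """Iterate over hits and return (seconds waiting, seconds processing)."""
    waiting = 0.0
    processing = 0.0
    iterator = iter(hits)
    while True:
        start = time.perf_counter()
        try:
            # Blocks while the next hit (or the next scroll batch) is fetched.
            hit = next(iterator)
        except StopIteration:
            # The final, empty fetch still counts as waiting time.
            waiting += time.perf_counter() - start
            break
        fetched = time.perf_counter()
        process(hit)
        done = time.perf_counter()
        waiting += fetched - start
        processing += done - fetched
    return waiting, processing
```

If waiting dominates, look at the search side (fewer fields, bigger batches, network); if processing dominates, the per-document work is the thing to speed up.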