Print all documents - Python API


(Inancarin) #1

Hi guys,

I have 329998 tweet documents stored in my twitter index, and I want to write the text of every tweet to a txt file. I can get the total number of documents correctly (329998), but only 19 of them end up in the file.

I think I should be using helpers.scan, but I couldn't figure out how. Here is my code:

from elasticsearch import Elasticsearch
import codecs

es = Elasticsearch()
# A single es.search() call returns only the first page of hits (bounded by
# its `size` parameter), even though hits.total counts every matching document.
res = es.search(index="twitter", body={"query": {"match_all": {}}})
print("%d documents found" % res['hits']['total'])

with codecs.open("/.../output.txt", encoding='utf-8', mode='w+') as out:
    for doc in res['hits']['hits']:
        # Lowercase each tweet and flatten it onto a single line
        text = doc['_source']['text'].lower().replace('\n', ' ').replace('\r', '').replace('\t', ' ')
        out.write(text + "\n")
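For comparison, here is a minimal sketch of the helpers.scan approach, assuming the elasticsearch-py client and a node reachable with the same `Elasticsearch()` defaults as above. `helpers.scan` wraps the scroll API and keeps fetching pages until every matching document has been returned. The function names `normalize`, `write_texts` and `dump_all_tweets` are hypothetical, and restricting `_source` to the `text` field is an optional assumption to reduce transfer size.

```python
import io


def normalize(text):
    """Lowercase a tweet and flatten it onto one line (same rules as the loop above)."""
    return text.lower().replace("\n", " ").replace("\r", "").replace("\t", " ")


def write_texts(hits, out):
    """Write the normalized 'text' field of each hit to `out`, one tweet per line.

    Hits without a 'text' field are skipped. Returns the number of lines written.
    """
    written = 0
    for hit in hits:
        text = hit.get("_source", {}).get("text")
        if text is None:
            continue
        out.write(normalize(text) + "\n")
        written += 1
    return written


def dump_all_tweets(path, index="twitter"):
    """Hypothetical helper: stream every document in `index` to the file at `path`."""
    # Imported lazily so write_texts() can be used without a running cluster.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()  # same default connection as the original code
    query = {"query": {"match_all": {}}, "_source": ["text"]}
    # helpers.scan pages through the scroll API until the index is exhausted,
    # unlike one es.search() call, which returns a single page of hits.
    hits = helpers.scan(es, index=index, query=query, scroll="5m")
    with io.open(path, "w", encoding="utf-8") as out:
        return write_texts(hits, out)
```

Calling `dump_all_tweets(path)` with the desired output path returns the number of tweets written, which can be checked against `hits.total`. Note that `helpers.scan` does not preserve sort order by default, so the lines come out in no particular order.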

By the way, I am using Elasticsearch 5.0.0.

Thanks,
Inanc

