Hi Charlie,
Thank you very much for your help.
I used only the extracted text, as described in the link you gave me.
I used the Python client to extract the text:
import elasticsearch
import csv
import random
import unicodedata

# Replace with the IP address of your Elasticsearch node
es = elasticsearch.Elasticsearch(["127.0.0.1:9200"])

# Replace the following query with your own Elasticsearch query
res = es.search(index="fichier", body={
    "fields": [
        "file"
    ]
}, size=10)

random.seed(1)
sample = res['hits']['hits']
# Comment out the previous line and uncomment the next line for a random sample instead.
# Change the int to randomly sample a certain number of rows from your query.
# randomsample = random.sample(res['hits']['hits'], 5)

print("Got %d Hits:" % res['hits']['total'])

# Set the name of the output file here.
# 'wb' is the Python 2 mode for csv; on Python 3 use open('mytest.tsv', 'w', newline='')
with open('mytest.tsv', 'wb') as csvfile:
    # We use TAB as the delimiter to handle cases where free-form text may contain a comma
    filewriter = csv.writer(csvfile, delimiter='\t',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    # Create header row
    filewriter.writerow(["id", "fields"])  # change the column labels here
    # Switch sample to randomsample if you want a random subset instead of all rows
    for hit in sample:
        # try/except handles unstructured data, where a field may not exist for a given hit
        try:
            col1 = hit["_id"]
        except Exception:
            col1 = ""
        try:
            col2 = hit["fields"]
            col2 = col2.replace('\n', ' ')
        except Exception:
            col2 = ""
        filewriter.writerow([col1, col2])
And it works! I get all the text from the file.
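One possible refinement, offered only as a sketch: depending on the Elasticsearch version, hit["fields"] may come back as a dictionary that maps each requested field name to a list of values rather than as a plain string, in which case calling .replace() on it raises an exception and the column ends up empty. The helper below assumes that response shape. The name field_text and the example hit are hypothetical and are not part of the Elasticsearch client. It is written for Python 3.

```python
def field_text(hit, name="file"):
    """Return the text of one requested field from a search hit, on a single line."""
    # Assumption: hit["fields"] is a dict such as {"file": ["...text..."]}
    values = hit.get("fields", {}).get(name, "")
    if isinstance(values, (list, tuple)):
        values = " ".join(str(v) for v in values)
    # Collapse newlines, tabs and repeated spaces so the value stays in one TSV cell
    return " ".join(str(values).split())


# Hypothetical hit, shaped like one element of res['hits']['hits']
example_hit = {"_id": "1", "fields": {"file": ["first line\nsecond line"]}}
print(field_text(example_hit))
```

With something like this, the second try/except in the loop could be replaced by col2 = field_text(hit), since a missing field simply yields an empty string.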
Really, Charlie, thank you.