Number of Docs Returned

All,

I am running a search with the pyes GeoDistanceFilter and I am only
getting the top 10 results at a time. That is by design, right? I want
to be able to retrieve all of the docs in one go and write them to a
file. There are about 48k of them. Am I missing anything in the
following? You can see that I read the total and loop once per doc
found, but it's writing the same records over and over again. How can
I get the next set of records in the series?

from pyes import ES
from pyes import (GeoBoundingBoxFilter, GeoDistanceFilter,
                  GeoPolygonFilter, FilteredQuery, MatchAllQuery)

conn = ES('localhost:9200')

def make_corpus():
    try:
        gq = GeoDistanceFilter("geometry.coordinates", [72, 31], "400km")
        q = FilteredQuery(MatchAllQuery(), gq)
        rs = conn.search(query=q, indices=["getdata"])
        hits = rs['hits']
        total = hits['total']

        file = open('c:\\Temp\\corpus.txt', 'wb')

        for x in xrange(total):
            recs = hits['hits']
            for x in recs:
                source = x['_source']
                str = source['properties']['translated']
                text = str.split()
                final = ' '.join(text)
                file.writelines(final + '\n')
        file.close()

    except Exception as err:
        print err

Thanks,
Adam

You need to set the "size" parameter on the Search object.

Are you using a recent version of pyes (aparo/pyes on GitHub)?

Your code should be:

from pyes import ES
from pyes import (GeoBoundingBoxFilter, GeoDistanceFilter,
                  GeoPolygonFilter, FilteredQuery, MatchAllQuery, Search)

conn = ES('localhost:9200')

def make_corpus():
    try:
        gq = GeoDistanceFilter("geometry.coordinates", [72, 31], "400km")
        q = FilteredQuery(MatchAllQuery(), gq)
        # size=1000 asks for up to 1000 hits instead of the default 10
        resultset = conn.search(Search(query=q, size=1000), indices="getdata")
        file = open('c:\\Temp\\corpus.txt', 'wb')

        # iterate over the hits directly; no outer loop over the total
        for r in resultset:
            str = r.properties.translated
            # OR str = r['properties']['translated']
            text = str.split()
            final = u' '.join(text)
            file.writelines(final + '\n')
        file.close()

    except Exception as err:
        print err
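
With about 48k docs, a single size=1000 request still returns only the first 1000 hits. One way to get the rest is to page through the results with an offset. The following is only a library-agnostic sketch of that loop, not pyes code: fetch_page and fake_fetch_page are hypothetical names, and the fake data source exists only so the example runs without a cluster. In a real run, fetch_page would wrap a search call that passes the offset and the page size to Elasticsearch (the "from" and "size" search parameters).

```python
def iter_all_hits(fetch_page, page_size=1000):
    """Yield every hit by requesting consecutive pages until one comes back short.

    fetch_page(start, size) is a hypothetical callable standing in for a real
    search call; it must return the list of hits for that window.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        for hit in page:
            yield hit
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        start += page_size


# Stand-in data source (an assumption for illustration, not pyes):
# 2,500 fake documents served in windows of the requested size.
FAKE_DOCS = [{"properties": {"translated": "doc %d" % i}} for i in range(2500)]


def fake_fetch_page(start, size):
    return FAKE_DOCS[start:start + size]


all_hits = list(iter_all_hits(fake_fetch_page, page_size=1000))
```

The original loop repeated the same records because it re-read the same first page once per matching doc; here each request advances the offset instead. Deep offsets get expensive on the server, and newer Elasticsearch versions cap from + size at 10,000 by default, so for a full export the scroll API is usually the better fit.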

Note:
I don't know why you split and re-join the string: unless it contains newlines or runs of whitespace (which this collapses to single spaces), str will always be equal to final.
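
One possible reason for the split and join is whitespace normalization: str.split() with no arguments splits on any run of whitespace, including newlines and tabs, so re-joining with a single space puts each record on one line of the output file. A small illustration, using a made-up sample string:

```python
# Made-up sample text with a double space, a newline, a tab, and trailing spaces
raw = "first  line\nsecond\tline  "

# split() with no arguments drops all whitespace runs; join() re-inserts single spaces
normalized = " ".join(raw.split())

print(normalized)  # first line second line
```

If the translated field never contains such whitespace, the two steps can be dropped.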

For the mailing list, I suggest using subject tags such as [pyes], [elastica], or [tire] so that readers can tell from the subject what a message is about.

Hi,
Alberto

In reply to Adam Estrada's message of 07 Sep 2011, 20:16.