Some questions about Wikipedia river


(Pınar Yanardağ) #1

Hi,

I am using elasticsearch 0.19.4 and have set up a wikipedia river on
top of it. I have three questions (sorry to pack them into one thread):

  1. The wikipedia dump is around 8GB unzipped, but my data folder after
    indexing wikipedia is around 7GB. I assume the indexing process was
    interrupted, so it is not complete yet. Is there a way to make
    elasticsearch continue indexing?

  2. I am using elasticsearch to query tweets. I am experimenting to see
    whether I can get topic assignments from the search engine this way.
    I am currently using the pyelasticsearch Python library and its
    basic search function. Can you think of a better search type for
    queries like the following?

res = conn.search("Donna Karan To Attend Haiti Charity Fundraiser")

Top 10 results and their scores:

Donna Karan 0.43873835
Haiti/Government 0.1892942
Haiti/People 0.1892942
Haiti/Transportation 0.1892942
Donna Summers 0.18910009
Haiti/History 0.1801775
Haiti/Communications 0.1801775
Haitian music 0.1600614
ISO 3166-1:HT 0.15908843
Haiti/Transnational issues 0.15862915

  3. Does elasticsearch support query rewriting/pre-processing? If I search
    for "Obama" and "Obama's", the results are quite different.
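To make the last question concrete, here is a minimal sketch of the kind of pre-processing I mean, done on the client before the query is sent. The helper name normalize_query is hypothetical; it is not part of elasticsearch or pyelasticsearch, and it only handles English possessives:

```python
import re

# Hypothetical client-side helper, not an elasticsearch or pyelasticsearch API.
# Matches an apostrophe (straight or curly) followed by "s" at the end of a word.
_POSSESSIVE = re.compile(r"(?<=\w)['\u2019]s\b", re.IGNORECASE)


def normalize_query(text):
    """Strip English possessive suffixes so "Obama's" becomes "Obama"."""
    return _POSSESSIVE.sub("", text)


print(normalize_query("Obama's"))
```

The normalized string would then be passed to conn.search(...) in place of the raw tweet text. Note that this also turns contractions such as "it's" into "it", so it is only a rough approximation of what I would hope the search engine could do for me.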

Thanks,
Pinar

