Some questions about Wikipedia river

Pinar_Yanardag · May 25, 2012, 2:16pm

Hi,

I am using elasticsearch 0.19.4 and have set up a wikipedia river on
top of it. I have three questions (sorry to pack them into one thread):

The wikipedia dump is around 8GB unzipped, but my data folder after
indexing wikipedia is only around 7GB. I assume the indexing process was
interrupted, so it is not complete yet. Is there a way to make
elasticsearch resume indexing?

I am querying elasticsearch with the text of tweets, to see whether it is
possible to get topic assignments from the search engine this way. I am
currently using the pyelasticsearch Python library and its basic search
function. Can you think of a better search type for queries like the
following?

res = conn.search("Donna Karan To Attend Haiti Charity Fundraiser")

Top 10 results and their scores:

Donna Karan 0.43873835
Haiti/Government 0.1892942
Haiti/People 0.1892942
Haiti/Transportation 0.1892942
Donna Summers 0.18910009
Haiti/History 0.1801775
Haiti/Communications 0.1801775
Haitian music 0.1600614
ISO 3166-1:HT 0.15908843
Haiti/Transnational issues 0.15862915

Does elasticsearch support query rewriting/pre-processing? If I search
for "Obama" and "Obama's", the results are quite different.
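
To illustrate the kind of pre-processing I mean, here is a minimal sketch in
plain Python. It is only an assumption about what such a step could look like
on the client side, not an elasticsearch or pyelasticsearch feature, and the
helper name normalize_query is made up for this example. It strips the English
possessive "'s" so that both spellings produce the same query string:

```python
import re

# Hypothetical client-side helper; not part of elasticsearch or pyelasticsearch.
# Matches a word character followed by 's at the end of a word.
_POSSESSIVE = re.compile(r"(\w)'s\b", re.IGNORECASE)


def normalize_query(text):
    """Return the query text with trailing English possessives ('s) removed."""
    # Keep the captured word character, drop the apostrophe and the s.
    return _POSSESSIVE.sub(r"\1", text)


if __name__ == "__main__":
    for query in ("Obama", "Obama's"):
        print(repr(query), "->", repr(normalize_query(query)))
```

With something like this, normalize_query("Obama's") and
normalize_query("Obama") would both be sent as the same string, but I would
prefer to have this handled by the search engine itself if that is possible.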

Thanks,
Pinar