Hi Bernd,
nice to see you are trying Elasticsearch in Bielefeld!
Since here in Cologne we also moved from FAST to Elasticsearch I hope I can
give some hints.
On Friday, July 27, 2012 8:51:02 PM UTC+2, Bernd Fehling wrote:
> Hi Jörg, how is it going?
> I already figured out how to convert my schema.xml from Solr to
> Elasticsearch. It still needs some handwork.
> I've reached 210 GB index size and am now looking into splitting the index,
> which has to be done before reaching 250 GB.
> The question now is "SolrCloud" or Elasticsearch?
Elasticsearch is easy to install and full of nice cloud features; I have
never regretted the decision I made in early 2010, when Elasticsearch was at
0.5.1. From what I could read, SolrCloud is making progress, but it still
needs a lot of love in configuration and administration. We are all looking
forward to Lucene/Solr 4!
> Any suggestions?
> Biggest problem so far: ES can only load JSON, unbelievable!!!
> Just for testing ES I have to write either an XML2JSON river or convert my
> test data to JSON.
> I don't know, maybe I will contact you by phone and we can discuss this
> misery.
That would indeed be nice, yes! Maybe that's an opportunity for another
Bielefeld/Cologne search technology meeting?
XML input was never a real problem here, since I have always used an
abstraction layer to process bibliographic data, even back in the FAST ESP
days (it is based on a resource/property model, very close to RDF, but
without SPARQL; I'm not using Jena or OpenRDF).
An XML river would be an idea! But since XML is just a syntax for "data in a
container format", such a river is mostly useless without support for custom
processing extensions for the data (similar to the XML pipeline processing
in FAST). Maybe by scripting XML to JSON? Do you have a preference for a JVM
scripting language? Groovy would be a straightforward option, since I am
already integrating Groovy scripts into my MAB/MARC converter.
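Scripting XML to JSON does not take much code, by the way. Here is a minimal
sketch in Python (the Groovy version would follow the same recursion); the
record layout is made up for illustration, and attributes and mixed content
are deliberately ignored, so it covers flat test records, not a general
mapping:

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(el):
    """Recursively map an XML element to a dict suitable for JSON.

    Repeated child tags are collected into lists; text-only elements
    become plain strings. Attributes and mixed content are ignored.
    """
    children = list(el)
    if not children:
        return (el.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # second occurrence of a tag: promote the value to a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

# a made-up bibliographic test record
xml = """<record>
  <title>Some Title</title>
  <creator>Bernd</creator>
  <creator>Joerg</creator>
</record>"""

doc = element_to_dict(ET.fromstring(xml))
print(json.dumps(doc))
# -> {"title": "Some Title", "creator": ["Bernd", "Joerg"]}
```

The resulting JSON string could then be fed straight to ES for indexing.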
Best regards,
Jörg