Converting schema.xml from Solr to ES

simonw_2 · July 29, 2012, 11:32am

On Sunday, July 29, 2012 11:51:02 AM UTC+2, Bernd Fehling wrote:

Hi Jörg,

On Saturday, July 28, 2012 16:22:33 UTC+2, Jörg Prante wrote:

...
An XML river would be an idea! But as XML is just a syntax for "data in a
container format", such a river is mostly useless without the feature of
custom processing extensions for the data (similar to the XML pipeline
processing in FAST). Maybe by scripting XML to JSON? Do you have a preference
for a JVM scripting language? Groovy would be a straightforward option,
since I am integrating Groovy scripts into my MAB/MARC converter.
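To make the "scripting XML to JSON" idea concrete, here is a minimal sketch. It is written in Python with the standard library only, purely for illustration (the thread itself talks about Groovy on the JVM). The record layout, the element names and the `record_to_dict` helper are all assumptions, not anything from an existing river.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical flat record; real records will be nested and need their own mapping rules.
xml_record = (
    '<record id="42">'
    "<title>Faust</title>"
    "<author>Goethe</author>"
    "<author>Anonymous</author>"
    "</record>"
)


def record_to_dict(elem):
    """Map attributes and direct child elements to a dict; repeated tags become lists."""
    doc = dict(elem.attrib)
    for child in elem:
        value = (child.text or "").strip()
        if child.tag in doc:
            # Second or later occurrence of a key: collect the values in a list.
            if not isinstance(doc[child.tag], list):
                doc[child.tag] = [doc[child.tag]]
            doc[child.tag].append(value)
        else:
            doc[child.tag] = value
    return doc


doc = record_to_dict(ET.fromstring(xml_record))
json_text = json.dumps(doc, ensure_ascii=False)
```

This is where the custom processing would live: everything interesting (field renaming, normalization, nesting) happens in the mapping function, not in the XML parsing itself.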

I never looked too deeply into JSON, just used it somehow.
XML has the advantage that it can be validated before/while loading,
especially if you work with full Unicode via UTF-8.
This also means Unicode above the Basic Multilingual Plane.

If you are using Java, you have been able to encode non-BMP characters since
Java 1.5. But this has nothing to do with XML or JSON. JSON is recommended to
be UTF-8, and if you decide on that, you just pass the right character
encoding to your JSON generator. The validation you refer to with XML is
implicit in JSON for the types: JSON encodes numbers, booleans, and character
sequences explicitly (binary data has to travel as a string, for example
Base64), and your reading code should validate your JSON document. No need
for a schema or something like that (there is such a thing, but I am not
sure it is used much).
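A small sketch of the encoding point, in Python rather than Java just to keep it short. The sample record is made up. It shows a character above the BMP surviving a JSON round trip both as raw UTF-8 and as escaped ASCII, and the number and boolean coming back with their types intact.

```python
import json

# U+10330 (GOTHIC LETTER AHSA) lies outside the Basic Multilingual Plane.
record = {"title": "letter \U00010330", "pages": 42, "available": True}

# Variant 1: keep the character as-is and encode the whole document as UTF-8.
utf8_bytes = json.dumps(record, ensure_ascii=False).encode("utf-8")

# Variant 2: the default ASCII-only output, where the character is written
# as a pair of \uXXXX surrogate escapes.
escaped = json.dumps(record)

from_utf8 = json.loads(utf8_bytes.decode("utf-8"))
from_escaped = json.loads(escaped)
```

Both variants decode to the same document, so the choice is about wire format only, not about which characters can be represented.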

Is this also covered with JSON?

My idea of an XML river is:

taking XML records from file system

validating

reporting invalid records and dropping from queue

packaging records to batches of size X

sending batches to the index (if possible in parallel, if ES supports
this)
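The first four steps could look roughly like the sketch below, using only the Python standard library. Note the assumptions: "validating" here only means a well-formedness check (no DTD or schema validation, which would need an extra library), and the function names, file names and batch size are invented for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_valid_records(directory):
    """Parse every *.xml file in the directory.

    Returns (valid, invalid): valid holds (file name, root element) pairs,
    invalid holds (file name, error message) pairs for dropped records.
    """
    valid, invalid = [], []
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            valid.append((path.name, ET.parse(str(path)).getroot()))
        except ET.ParseError as err:
            invalid.append((path.name, str(err)))
    return valid, invalid


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Demo with throwaway files; b.xml is deliberately not well-formed.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "a.xml": "<record><title>ok</title></record>",
        "b.xml": "<record><title>broken</record>",
        "c.xml": "<record><title>also ok</title></record>",
        "d.xml": "<record><title>fine</title></record>",
    }
    for name, text in samples.items():
        Path(tmp, name).write_text(text, encoding="utf-8")
    valid, invalid = load_valid_records(tmp)

# Report the invalid records that were dropped from the queue.
for name, error in invalid:
    print(f"dropped {name}: {error}")

grouped = list(batches(valid, 2))
```

Sending the batches is left out here on purpose, since that part depends on the client used to talk to the index.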

Is indexing in ES safe to use from multiple threads?

Yes, it's thread-safe; you can just throw documents at it concurrently.
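On the client side, sending batches concurrently can be as simple as a thread pool. The sketch below is Python and does not talk to a real cluster: `send_batch` is a hypothetical stand-in that only records what it would have sent, and the documents, batch size and worker count are made up.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor

sent = []                      # what the stand-in "index" received
sent_lock = threading.Lock()   # protects the shared list above


def send_batch(batch):
    """Hypothetical stand-in for one bulk request; swap in a real client call here."""
    body = "".join(json.dumps(doc) + "\n" for doc in batch)
    with sent_lock:
        sent.append(body)
    return len(batch)


docs = [{"id": i, "title": f"record {i}"} for i in range(10)]
batch_size = 4
batch_list = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]

# Each batch is handed to a worker thread; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(send_batch, batch_list))
```

The lock is only needed because the stand-in shares a Python list between threads; it says nothing about what the index itself requires.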

simon

Regards,
Bernd
