JSONML and elasticsearch


(posejavier) #1

Hi,

I am developing a project based on CouchDB and elasticsearch.

I have converted around 500,000 XML documents to JSON using JSONML in order to store them in the CouchDB database.
When I index these documents with elasticsearch, I get the following error:

org.elasticsearch.index.mapper.MapperParsingException: object mapping [streams] trying to serialize a value with no field associated with it, current value [4ecb8c99a2144a03dc000081]
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:573)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:443)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:577)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:565)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:435)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:465)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:414)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:285)

Having looked at the code and some of your posts, I realized that this error happens when a field changes from a value to an object.
For example, in my case I have the following JSON document:

{
  "_id": "26700cfb3089832b2d99d3c3ff00d368",
  "_rev": "5-46f71a61512d7edca18fe4fc6cccff8f",
  "tagName": "change-docut",
  "childNodes": [
    {
      "tagName": "bibliographic-data",
      "childNodes": [
        {
          "data-format": "z76",
          "tagName": "publication-ref",
          "childNodes": [
            {
              "tagName": "document-id",
              "childNodes": [
                {
                  "tagName": "doc-number",
                  "childNodes": [
                    "AR048470"
                  ]
                }
              ],
              "lang": "es"
            }
          ]
        },
        {
          "tagName": "classification",
          "childNodes": [
            {
              "tagName": "edition",
              "childNodes": [
                7
              ]
            }
          ]
        }
      ]
    }
  ]
}

...As I understand it, the error appears because the object in the second "childNodes" field does not have the field "lang": "es", so elasticsearch fails when it tries to serialize that object because it finds the field to be null.
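To make the value-versus-object situation concrete, here is a minimal Python sketch of one preprocessing idea: wrapping every plain value found inside a "childNodes" array in a small object, so that each "childNodes" array contains only objects. The helper name `wrap_text_nodes` and the key "text" are my own assumptions for illustration; they are not part of JSONML or elasticsearch, and I have not confirmed that this removes the error.

```python
import json


def wrap_text_nodes(node):
    """Return a copy of a JSONML-style node in which every entry of
    "childNodes" is an object. Plain values become {"text": "<value>"}.

    The "text" key is a hypothetical choice, not a JSONML convention.
    """
    if not isinstance(node, dict):
        # A bare string or number child: wrap it (as a string) so the
        # array holds only objects and the wrapped value has one type.
        return {"text": str(node)}

    result = {}
    for key, value in node.items():
        if key == "childNodes" and isinstance(value, list):
            result[key] = [wrap_text_nodes(child) for child in value]
        else:
            result[key] = value
    return result


# A fragment of the document above, used as sample input.
doc = {
    "tagName": "classification",
    "childNodes": [
        {"tagName": "edition", "childNodes": [7]},
    ],
}

normalized = wrap_text_nodes(doc)
print(json.dumps(normalized, indent=2))
```

With this change the innermost "childNodes" of the "edition" element would hold an object instead of the bare number 7, which is the kind of uniform structure I suspect the mapper needs.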

My questions are:

Q1. Is there any way in elasticsearch to avoid this error?
Q2. Should I use a different XML-to-JSON converter in order to avoid the error?
Q3. Would it be possible to modify the code in "private void serializeValue" and "private void serializeObject" so that elasticsearch does not raise this error?

MANY THANKS in advance for your help!!!

