The question was about native support in ES for Dublin Core. There are
at least two cases to think about.
If you only need to deal with the syntax of DC element names and you
have a bunch of static documents to index, the simplest way is to read
the content with an XML reader (SAX or StAX API), pick up the local
names of the elements, and write them either directly into the ES Java
API for building content (XContentBuilder) or to JSON files for later
On the other hand, Dublin Core also has semantics, specified by an
abstract model, see http://dublincore.org/documents/abstract-model/
So if the question was whether there is native support in ES for RDF
semantics, which is the underlying model for Dublin Core, you have to
do it yourself outside ES. It is possible to build JSON syntax from an
RDF model. The way to go is a simplified resource/property abstraction
API that is responsible for realizing your semantics. A light-weight
resource/property RDF API is preferable for transforming everything
you need to manage in a search index into a nested structure of URIs
and literal values. Then roll out a JSON structure, which is finally
picked up by the ES API. This process is quite similar to generating
RDF triples. It should be accompanied by some hints in the ES mapping
(e.g. dc:date should be given a date type according to W3CDTF, which
is recommended best practice).
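
A minimal sketch of such a resource/property abstraction, written in
Python for brevity rather than against the ES Java API. The Resource
class, the "uri" key, the example.org URIs and the "foaf:name" property
are assumptions made up for illustration. The mapping fragment only
shows the idea of typing dc:date as a date; how it has to be wrapped
depends on the ES version.

```python
import json


class Resource:
    """A resource identified by a URI, carrying properties (hypothetical API)."""

    def __init__(self, uri):
        self.uri = uri
        self.properties = {}

    def add(self, predicate, value):
        # A value is either a literal or another Resource (nesting).
        self.properties.setdefault(predicate, []).append(value)
        return self

    def to_dict(self):
        # "uri" is an assumed key name for the resource identifier.
        doc = {"uri": self.uri}
        for predicate, values in self.properties.items():
            doc[predicate] = [
                v.to_dict() if isinstance(v, Resource) else v for v in values
            ]
        return doc


# Mapping hint: treat dc:date as a date field instead of a plain string.
MAPPING = {"properties": {"dc:date": {"type": "date"}}}

author = Resource("http://example.org/person/1").add("foaf:name", "Jane Doe")
book = (
    Resource("http://example.org/book/1")
    .add("dc:title", "An Example Title")
    .add("dc:date", "2012-02-13")
    .add("dc:creator", author)
)

# This JSON string is what would be handed over to ES for indexing.
json_doc = json.dumps(book.to_dict())
```

Every property holds a list here, so repeated properties need no
special case.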
For jsonizing, I recommend replacing all XML namespace URIs with a
short prefix, for Dublin Core e.g. "dc:title", "dc:creator",
"dc:identifier" etc. By doing this, you can keep a prefix/namespace
URI map to manage arbitrary XML in your Dublin Core model. This is a
rather common use case because the 15 core elements are quite coarse
and will need some refinements in most cases. These custom refining
elements can be nested in XML and therefore in JSON. That is totally
fine with ES because ES supports JSON, which is nested. So ES can
index RDF Dublin Core models almost naturally. ES also works smoothly
with a colon delimiter in field names, so field names like "dc:title"
are the way to go.
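
A small sketch of such a prefix/namespace URI map, in Python for
brevity. The PrefixMap class and its method names are hypothetical;
the two namespace URIs are the published Dublin Core ones.

```python
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"


class PrefixMap:
    """Two-way lookup between short prefixes and namespace URIs."""

    def __init__(self):
        self._prefix_by_uri = {}
        self._uri_by_prefix = {}

    def register(self, prefix, uri):
        self._prefix_by_uri[uri] = prefix
        self._uri_by_prefix[prefix] = uri

    def compact(self, uri, local_name):
        # (namespace URI, local name) -> field name such as "dc:title"
        return f"{self._prefix_by_uri[uri]}:{local_name}"

    def expand(self, field_name):
        # Field name such as "dc:title" -> (namespace URI, local name)
        prefix, local_name = field_name.split(":", 1)
        return self._uri_by_prefix[prefix], local_name


prefixes = PrefixMap()
prefixes.register("dc", DC_NS)
prefixes.register("dcterms", DCTERMS_NS)

field = prefixes.compact(DC_NS, "title")
```

The same map serves both directions: compact() when building the JSON
for indexing, expand() when rendering results back to XML.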
It depends on the task at hand: if there are a lot of static documents
ready for indexing, it boils down to creating a smart XML parser for
jsonizing the data. XML namespaces could probably even be dropped,
because they are not needed when only simple Dublin Core is present in
the data.
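
For the static-document case, a jsonizing parser that keeps only the
local names could look roughly like this. It is a sketch using
Python's standard SAX module, not the Java SAX/StAX code referred to
in this thread; DublinCoreHandler and dc_xml_to_json are made-up names,
and only leaf elements with text are collected.

```python
import io
import json
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class DublinCoreHandler(ContentHandler):
    """Collects the text of each leaf element under its local name."""

    def __init__(self):
        super().__init__()
        self.doc = {}
        self._text = []
        self._is_leaf = False

    def startElementNS(self, name, qname, attrs):
        # A new element starts: reset the buffer, assume it is a leaf.
        self._text = []
        self._is_leaf = True

    def characters(self, content):
        self._text.append(content)

    def endElementNS(self, name, qname):
        _uri, local_name = name  # the namespace URI is dropped on purpose
        value = "".join(self._text).strip()
        if self._is_leaf and value:
            # Repeated elements (several creators) end up in one JSON array.
            self.doc.setdefault(local_name, []).append(value)
        # Whatever closes next is a parent element, not a leaf.
        self._is_leaf = False
        self._text = []


def dc_xml_to_json(xml_text):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = DublinCoreHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return json.dumps(handler.doc, sort_keys=True)


sample = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>An Example Title</dc:title>"
    "<dc:creator>Jane Doe</dc:creator>"
    "<dc:creator>John Doe</dc:creator>"
    "</record>"
)
json_doc = dc_xml_to_json(sample)
```

The resulting JSON string can be written to a file or sent to ES as
the document source.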
If an RDF data model based on Dublin Core is required, the challenge
is to integrate the core ingredients found in the RDF model, which are
resources and properties. There is a lot more, e.g. ontologies, which
is not essential for the rather straightforward task of indexing and
querying RDF literals. If you need validation over RDF elements, you
have much more work to do, which is outside the scope of search
indexes.
Once the data is indexed, query result processing is trivial:
transform the JSON back to XML (with an extra root wrapper element to
ensure a sound XML tree). It's just the other way round. From there,
you can go ahead with XSLT and the like. If you are willing to drop
JSON altogether, the most elegant solution would be an
XContentGenerator producing SAX/StAX events. Optionally, query
transformers can be put in front of the ES DSL, for example CQL, using
only relevant subsets of the power of the
To keep the jsonizing of XML straightforward, I would avoid special
treatment of attributes in favor of nesting elements. It helps a bit
to get JSON back from XML. Otherwise you have to introduce contracts,
such as a leading '@' in an ES field name always denoting an XML
attribute, which might not always be transparent to all ES search
clients.
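
The way back from JSON to XML, with a wrapper root element and nesting
only (no attribute contract), can be sketched as follows. This uses
Python's ElementTree for brevity; json_to_xml, the "result" root name
and the sample document are assumptions, not part of any ES API.
Prefixed field names are written out as they are, and the namespace
declarations are set once on the root.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def _append(parent, name, value):
    if isinstance(value, list):
        # A JSON array becomes repeated sibling elements.
        for item in value:
            _append(parent, name, item)
    elif isinstance(value, dict):
        # A nested JSON object becomes a nested element.
        child = ET.SubElement(parent, name)
        for key, item in value.items():
            _append(child, key, item)
    else:
        ET.SubElement(parent, name).text = str(value)


def json_to_xml(doc, namespaces, root_name="result"):
    # The extra root element guarantees a single-rooted XML tree.
    root = ET.Element(root_name)
    for prefix, uri in namespaces.items():
        root.set(f"xmlns:{prefix}", uri)
    for key, value in doc.items():
        _append(root, key, value)
    return ET.tostring(root, encoding="unicode")


hit = {
    "dc:title": ["An Example Title"],
    "dc:creator": [{"dc:description": "Jane Doe"}],
}
xml_text = json_to_xml(hit, {"dc": DC_NS})
```

The output can then be fed into an XSLT processor or any other XML
tooling.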
These are my own experiences based on my implementation since ES
0.5.1. I would be glad to share more thoughts if there are further
On Feb 13, 12:06 am, Michael Sick michael.s...@serenesoftware.com
Jorg - I bet a number of people would find your DC work interesting. +1 for
On Sun, Feb 12, 2012 at 10:50 AM, jprante joergpra...@gmail.com wrote:
I use Dublin Core in Elasticsearch extensively - together with all
kinds of metadata and bibliographic standards such as CQL - and might
be able to give some advice on best practice.
Besides naming the fields with Dublin Core element terms, I tackled the
namespacing challenge and implemented JSON to XML rendering in that
On Feb 12, 2:18 pm, Shay Banon kim...@gmail.com wrote:
Not really, but you can name the fields you index using the Dublin Core
On Friday, February 10, 2012 at 1:44 PM, ian mayo wrote:
I'd like to find out if/how ElasticSearch supports Dublin Core
metadata, as formalised by the Dublin Core Metadata Initiative (DCMI)
I'm aware that ElasticSearch sits on top of Apache Lucene, but I can't
find any reference to DCMI in the Lucene documentation.
So, does ElasticSearch have any native support for DCMI?