Loading JSON-LD into ES

jprante · September 28, 2014, 6:43am

In your case, you have to convert the file to the ES bulk format to index
such files efficiently or, better, use the ES Python client to push the
JSON-LD into ES.
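
A minimal sketch of the bulk-format conversion, assuming the file has a top-level "@graph" array whose nodes each carry an "@id" (as described in the original question). The function names and the index name "rdf" are hypothetical. Elasticsearch versions from the time of this thread also expect a _type, either in the action line or in the request URL; it is left out here.

```python
import json


def jsonld_to_bulk_lines(doc, index_name):
    """Yield bulk-format lines: one action line, then one source line, per node."""
    for node in doc.get("@graph", []):
        # Assumption: every node carries an "@id", which is reused as the ES _id.
        action = {"index": {"_index": index_name, "_id": node["@id"]}}
        yield json.dumps(action)
        yield json.dumps(node)


def bulk_body(doc, index_name):
    """Join the lines into one request body; the bulk API expects a trailing newline."""
    return "\n".join(jsonld_to_bulk_lines(doc, index_name)) + "\n"


# Hypothetical sample data standing in for the Jena-generated file.
sample = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@graph": [
        {"@id": "http://example.org/doc/1", "dc:title": "First"},
        {"@id": "http://example.org/doc/2", "dc:title": "Second"},
    ],
}
body = bulk_body(sample, "rdf")
```

The resulting body can be sent to the _bulk endpoint in one request. With the Python client, elasticsearch.helpers.bulk takes an iterable of action dicts instead of pre-serialized lines, so the same loop could yield dicts rather than strings.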

Jörg

On Sun, Sep 28, 2014 at 4:23 AM, abo abo@datavolution.com wrote:

Thank you all for your responses and interesting conversation about RDF
serialization into ES. With regards to my original post, I ended up using
a solution based on RDFlib:

GitHub - RDFLib/rdflib-jsonld: JSON-LD parser and serializer plugins for RDFLib

It works as expected, and compacting the content by using @context does
the trick and is flexible. It is an in-memory process however, which could
be an issue for those with very large RDF files. When using Jena, I didn't
find the ability to add @context mappings, but maybe I didn't dig enough.

On a side note, looks like the rdflib-jsonld solution already has support
for XSD literals and lists, so perhaps it could be extended to map directly
into ES _type if that is a good direction.

With my JSON-LD file ready for ingestion into ES, I do have another
question: are there utilities to bulk load such documents (the JSON-LD
contains the individual documents for ES, each with an _id), or do I just
write a script that calls curl -XPUT for each record in the JSON-LD file?
Seems like a pretty common use case.

Thanks again to all, interesting stuff. Happy to contribute to extending
an existing solution.

-- ab

On Saturday, September 27, 2014 9:24:24 AM UTC-7, Jörg Prante wrote:

For the _mapping, I am thinking of two more types, "iri" and "literal",
for which I intend to write ES type mappers, so ES can receive XSD data
types and language codes and map them to fields / analyzers. IRIs are just
opaque strings, but they can be shortened if a prefix is configured, and
they can be used as an _id or to reference an _id.

Instead of _mapping, I prefer the idea of handling @contexts like
template documents.

I'm not sure about the best way to manage JSON-LD. There are two
approaches. One is to save the JSON-LD (what you call the original
document) beside other versions. This requires more space, and I'm not sure
about the purpose of the original JSON-LD. The other approach is to drop
the original JSON-LD after parsing it into triples and to store the triples
in an ES JSON doc, which is a surrogate close to JSON-LD but accommodates
all the JSON dialect characteristics of the ES document DSL.

I don't work in Scala, so I cannot promise much, but I'd be happy to take
a look at any related code!

Jörg

On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini ser...@gmail.com
wrote:

Hi Jörg, indeed!

What I like about _mappings is that they are managed as documents too,
and they can be:

automatically inferred from data (at risk, but useful)

provided by static files, in some cases

managed for _index/_types

All of those things could be done with something like a _context (which
would at first include a single @context). The first point should probably
be avoided altogether for JSON-LD :-), but it should be possible.

But we may need more than one @context item for a single "resource" schema
(referring to an _index/_type pair), and in the longer term it's even
possible to re-use a @context for different _index/_type pairs.
Furthermore, when exposing results as JSON-LD one might want to reference
an external @context and merge it before providing results. In my
opinion the more "risky" part is ingesting the original JSON-LD, if we
want to flatten it and extract the @context that will permit us to
reconstruct the original document later.
Given that it should be possible to map every kind of JSON result from ES,
documents imported as JSON-LD might have to keep at least the original
fields.

I'd like to put some code on GitHub, and if you want we could join
efforts on that. I'm working mostly in Scala at the moment. What do you
think?

On Friday, September 26, 2014 20:32:52 UTC+2, Jörg Prante wrote:

Absolutely. My thought is about managing one (or more) context ES JSON
document(s) where all the @context definitions of an index live. A format
plugin can then process search results and convert ES JSON to expanded
JSON-LD and from there to other RDF serializations.

Jörg

On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini ser...@gmail.com
wrote:

Hi

Using JSON-LD is indeed rather simple, as it is JSON, so it's
even possible to index it as is.
I'm currently using ES for storing RDF documents as JSON-LD in a
specific index: in that case one can simply use the URI as the _id, recover
the full original format from _source, and use basic search capabilities on
the index, if escaping / nesting is not a big deal.

However, in order to use resources with more flexibility, I think the
best approach would be to index them as "flat" as possible, then apply an
ad-hoc @context to the ES JSON to obtain the original JSON-LD again.
This would be my ideal usage at the moment. It seems complex at first,
but it's not: I'm currently experimenting with saving a @context for a
_type, obtaining, let's say, a sort of _context, similar to a _mapping, to
reconstruct the original semantics.
If someone likes the idea, I'd like to share thoughts on that.
On Friday, September 26, 2014 14:08:07 UTC+2, Jörg Prante wrote:

Lukáš,

of course you are right, RDF/XML looks complex and requires parsing.
The underlying principle of all RDF is a graph (or a series of triples in
the form subject/predicate/object, where the triple series is a
serialization of the graph). So the challenge is first parsing the RDF
input, second constructing the model, and third serializing the model
to an ES-friendly input (here: JSON-LD, sort of). RDF ensures that there is
a single model for all serializations.

This technical perspective does not necessarily solve all challenges
that are inherent to the chosen data model, for example nested resources
in RDF. It might be feasible to flatten nested resources by their
identifiers and generate one JSON document after the other. Or it could be
feasible to keep nested resources intact and wrap them into nested
structures in a single ES JSON object.
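
The first option, flattening nested resources by their identifiers, might look roughly like the sketch below. It assumes every nested resource carries an "@id"; nested objects without one (blank nodes, for instance) are left in place. The function name and the FOAF sample are illustrative only.

```python
def flatten(node, out=None):
    """Split nested resources into separate documents linked by {"@id": ...} references."""
    if out is None:
        out = {}
    flat = {}
    for key, value in node.items():
        values = value if isinstance(value, list) else [value]
        converted = []
        for item in values:
            # A dict with an "@id" plus other keys is a nested resource:
            # emit it as its own document and keep only a reference here.
            if isinstance(item, dict) and "@id" in item and len(item) > 1:
                flatten(item, out)
                converted.append({"@id": item["@id"]})
            else:
                converted.append(item)
        flat[key] = converted if isinstance(value, list) else converted[0]
    # Merge, so a resource seen in several places ends up as one document.
    out.setdefault(node["@id"], {}).update(flat)
    return out


# Hypothetical nested FOAF-style input.
alice = {
    "@id": "http://example.org/alice",
    "foaf:name": "Alice",
    "foaf:knows": {"@id": "http://example.org/bob", "foaf:name": "Bob"},
}
docs = flatten(alice)
```

Each key of docs is a subject IRI that can serve as the ES doc ID, and each value is one JSON document to index. A nested Person and a referenced Person then end up in the same shape, which is what makes querying consistent.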

In my data model, I can map RDF subject IDs to ES doc IDs. Other data
models may prefer other approaches to select ES doc IDs.

Jörg

On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček lukas...@gmail.com
wrote:

Jörg,

my concern is that RDF/XML allows one thing to be expressed in several
ways. For example, if you take the FOAF specification, there are several
ways to express that one Person knows another Person. One way is
using reference IDs; the other is nesting a Person inside another Person.
See [1] for examples. My understanding is that although both ways express
exactly the same information, they lead to different XML representations
and thus to different JSON-LD. You can certainly push such data into ES,
but I wonder if you can then have any consistent way of querying such data.

Maybe there is some way to preprocess the XML document and
convert all nested Persons to references (would that require arbitrary ID
construction?). Or something similar. Though I am not sure this would be
a generally applicable approach for any RDF data.

[1] An Introduction to FOAF

Regards,
Lukas

On Fri, Sep 26, 2014 at 9:28 AM, joerg...@gmail.com <
joerg...@gmail.com> wrote:

JSON-LD is perfect for ES indexing, as long as you use the
"compact" form of representation.

JSON-LD 1.1 Processing Algorithms and API

Example:

https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld

This means you should use short field names and shorten IRIs to a
prefix form. This gives a convenient mapping to ES field names (e.g.
"dc:title" or "dc:creator"). The '@' fields can also be indexed and they do
not control anything special in ES (some @id may be mapped to ES _id but
for nested structures this does not match)
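
To illustrate what shortening IRIs to a prefix form means for field names, here is a simplified sketch. It is not the JSON-LD compaction algorithm from the specification, only prefix substitution on the keys of one flat node; the function names and the PREFIXES table are assumptions made for the example.

```python
def compact_iri(iri, prefixes):
    """Return the prefix:local form if a configured namespace matches, else the IRI unchanged."""
    # Try longer namespaces first so overlapping namespaces resolve predictably.
    ordered = sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, namespace in ordered:
        if iri.startswith(namespace) and len(iri) > len(namespace):
            return prefix + ":" + iri[len(namespace):]
    return iri


def compact_keys(node, prefixes):
    """Shorten the keys of one flat node; '@' keywords are left untouched."""
    return {
        (key if key.startswith("@") else compact_iri(key, prefixes)): value
        for key, value in node.items()
    }


# Hypothetical prefix table and expanded input.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}
expanded = {
    "@id": "http://example.org/doc/1",
    "http://purl.org/dc/elements/1.1/title": "First",
    "http://purl.org/dc/elements/1.1/creator": "Jane",
}
compacted = compact_keys(expanded, PREFIXES)
```

The compacted keys ("dc:title", "dc:creator") are the kind of short names that map conveniently to ES field names. A real JSON-LD processor should be preferred for anything beyond this simple case, since proper compaction also handles terms, value types, and nested contexts.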

I use my own RDF API and transform RDF graphs (so not only JSON-LD
but also other formats like N-Triples and RDF/XML) into XContent using this
method:

https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java

I plan to extend this content building by interpreting rdf:type and
rdf:list etc. to generate correct ES JSON objects and arrays. There is also
an amount of work left to do for the plethora of XSD types in RDF literals
or for language tags.

This will be subsumed into an RDF input/output plugin for an
ES-based Linked Data Platform

Linked Data Platform 1.0

but there is no ETA yet.

Jörg

On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček lukas...@gmail.com
wrote:

Hi,

I think you will have to preprocess the documents on your side first
and then push them into ES individually (you can push in batches).

As a side note, I would say JSON-LD is quite a low-level
serialization of RDF data and, IMO, not optimal for ES indexing. It may be
better to find some RDF-OOM tool, have your RDF documents mapped to
Java POJOs, and serialize the POJOs into JSON instead (you can use the
Jackson library for that, for example). This will give you better control
over the whole RDF -> JSON conversion process.

Regards,
Lukas

On Thu, Sep 25, 2014 at 7:21 PM, abo a...@datavolution.com
wrote:

Hello,

I'm new to Elasticsearch, so forgive me if this is a basic
question or if it's in some documentation that I haven't read...

I am trying to load a json-ld file into ES. The json-ld file was
generated from an RDF file, using Jena. The structure starts with:

{
"@graph" :

followed by the individual "documents", each with:

{
"@id" :

and a variable number of parameters in each.

My question is how do I load this into ES and ensure that
documents are individually referenced (as opposed to the entire json-ld
file)?

Do I need to doctor this json-ld file further in order to load it?

Thanks for your help.

-- abo

--
You received this message because you are subscribed to the
Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
