Loading JSON-LD into ES


(abo) #1

Hello,

I'm new to Elasticsearch, so forgive me if this is a basic question or if
it's in some documentation that I haven't read...

I am trying to load a json-ld file into ES. The json-ld file was generated
from an RDF file, using Jena. The structure starts with:

{
"@graph" :

followed by the individual "documents", each with:

{
"@id" :

and a variable number of parameters in each.

My question is how do I load this into ES and ensure that documents are
individually referenced (as opposed to the entire json-ld file)?

Do I need to doctor this json-ld file further in order to load it?

Thanks for your help.

-- abo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Lukáš Vlček) #2

Hi,

I think you will have to preprocess the documents on your side first and then
push them into ES individually (you can push them in batches).
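
A minimal sketch of that preprocessing step, assuming the file has a top-level "@graph" array whose entries each carry an "@id" (as described in the question). It only builds the newline-delimited body that the ES bulk endpoint (`_bulk`) expects; the index name "rdf" and the sample data are made up for illustration, and depending on the ES version the action line may also need a `_type`.

```python
import json

def graph_to_bulk(jsonld_text, index_name):
    """Turn a JSON-LD document with a top-level @graph into bulk API lines."""
    doc = json.loads(jsonld_text)
    lines = []
    for node in doc.get("@graph", []):
        meta = {"_index": index_name}
        # Use the RDF subject as the ES document ID when there is one;
        # otherwise ES generates an ID itself.
        if "@id" in node:
            meta["_id"] = node["@id"]
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(node))             # document source line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample input, shaped like the file described in the question.
sample = """
{
  "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
  "@graph": [
    {"@id": "http://example.org/doc/1", "dc:title": "First"},
    {"@id": "http://example.org/doc/2", "dc:title": "Second", "dc:creator": "abo"}
  ]
}
"""

body = graph_to_bulk(sample, "rdf")
print(body, end="")
```

The resulting string can be sent as the request body of a single bulk call, so each entry of "@graph" becomes its own ES document.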

As a side note, I would say JSON-LD is quite a low-level serialization of RDF
data and, IMO, not optimal for ES indexing. It may be better to find some
RDF-OOM tool, have your RDF documents mapped to Java POJOs, and serialize the
POJOs into JSON instead (you can use the Jackson library for that, for
example). This will give you better control over the whole RDF -> JSON
conversion process.

Regards,
Lukas

On Thu, Sep 25, 2014 at 7:21 PM, abo abo@datavolution.com wrote:



(Jörg Prante) #3

JSON-LD is perfect for ES indexing, as long as you use the "compact" form
of representation.

http://www.w3.org/TR/json-ld-api/#compaction-algorithms

Example:

https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld

This means you should use short field names and shorten IRIs to a prefixed
form. This gives a convenient mapping to ES field names (e.g. "dc:title" or
"dc:creator"). The '@' fields can also be indexed; they do not control
anything special in ES. (An @id may be mapped to the ES _id, but for nested
structures this does not work, since a nested resource has no ES _id of its
own.)
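
To illustrate the prefixed field names, here is a rough sketch that renames full-IRI keys to prefix:local form. It is a simplified stand-in, not the JSON-LD compaction algorithm from the spec linked above: it only touches keys, and the `PREFIXES` map, the function names and the sample data are assumptions made up for this example.

```python
def compact_iri(iri, prefixes):
    """Shorten a full IRI to prefix:local, using the longest matching namespace."""
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return iri  # no known namespace: leave the IRI as it is
    return best[0] + ":" + iri[len(best[1]):]

def compact_keys(node, prefixes):
    """Recursively rename keys; @-keywords and all values are left untouched."""
    if isinstance(node, dict):
        return {
            (key if key.startswith("@") else compact_iri(key, prefixes)):
                compact_keys(value, prefixes)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [compact_keys(item, prefixes) for item in node]
    return node

# Hypothetical prefix map and input document.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
expanded = {
    "@id": "http://example.org/doc/1",
    "http://purl.org/dc/elements/1.1/title": "A title",
    "http://purl.org/dc/elements/1.1/creator": {
        "@id": "http://example.org/people/jp",
        "http://xmlns.com/foaf/0.1/name": "J. P.",
    },
}
compacted = compact_keys(expanded, PREFIXES)
```

After this step the ES fields are named "dc:title", "dc:creator" and "dc:creator.foaf:name" instead of carrying full IRIs.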

I use my own RDF API and transform RDF graphs (so not only JSON-LD but also
other formats like N-Triples and RDF/XML) into XContent using this method:

https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java

I plan to extend this content building by interpreting rdf:type and
rdf:list etc. to generate correct ES JSON objects and arrays. There is also
an amount of work left to do for the plethora of XSD types in RDF literals
or for language tags.

This will be subsumed into an RDF input/output plugin for an ES-based
Linked Data Platform

http://www.w3.org/TR/ldp/

but there is no ETA yet.

Jörg

On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček lukas.vlcek@gmail.com wrote:



(Lukáš Vlček) #4

Jörg,

my concern is that RDF/XML allows one thing to be expressed in several ways.
For example, if you take the FOAF specification, there are several ways to
express that one Person knows another Person. One way is to use reference
IDs; the other is to nest a Person inside another Person. See [1] for
examples. My understanding is that although both ways express exactly the
same information, they lead to different XML representations and thus to
different JSON-LD. Note that you can push such data into ES, but I wonder
whether you can then query it in any consistent way.

Maybe there is some way to preprocess the XML document and convert all
nested Persons to references (would that require constructing arbitrary
IDs?), or something similar. Though I am not sure this would be a generally
applicable approach for any RDF data.
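
A possible shape for such a preprocessing step, sketched on JSON-LD-like dicts rather than on the XML itself. Nested nodes are pulled out into their own entries and replaced by {"@id": ...} references; nodes without an "@id" get a generated blank-node style ID ("_:b0", "_:b1", ...). The function name and the FOAF sample are made up for illustration, and a node that is only ever referenced ends up as a stub entry containing just its "@id".

```python
import itertools

def flatten(node, out, counter):
    """Store node (and any nested nodes) in out, keyed by @id; return a reference."""
    node_id = node.get("@id")
    if node_id is None:
        node_id = "_:b%d" % next(counter)  # arbitrary ID for an anonymous node
    flat = out.setdefault(node_id, {"@id": node_id})
    for key, value in node.items():
        if key == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        refs = [
            # Dicts are nested nodes, except literal objects carrying "@value".
            flatten(v, out, counter) if isinstance(v, dict) and "@value" not in v else v
            for v in values
        ]
        flat[key] = refs if isinstance(value, list) else refs[0]
    return {"@id": node_id}

# Hypothetical input: one Person nested inside another.
nested = {
    "@id": "http://example.org/alice",
    "foaf:name": "Alice",
    "foaf:knows": {"@id": "http://example.org/bob", "foaf:name": "Bob"},
}
documents = {}
flatten(nested, documents, itertools.count())

# Same data, but the nested Person has no ID of its own.
anonymous = {"@id": "http://example.org/alice", "foaf:knows": {"foaf:name": "Bob"}}
anon_documents = {}
flatten(anonymous, anon_documents, itertools.count())
```

Each value of `documents` can then be indexed as a separate ES document, so "who knows Bob" is always a query on the reference, whichever way the source expressed it.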

[1] http://www.xml.com/pub/a/2004/02/04/foaf.html

Regards,
Lukas

On Fri, Sep 26, 2014 at 9:28 AM, joergprante@gmail.com <
joergprante@gmail.com> wrote:



(Jörg Prante) #5

Lukáš,

of course you are right, RDF/XML looks complex and requires parsing. The
underlying principle of all RDF is a graph (or a series of triples in the
form subject/predicate/object, where the triple series is a serialization
of the graph). So the challenge is, first, parsing the RDF input; second,
constructing the model; and third, serializing the model to an ES-friendly
input (here: JSON-LD, sort of). RDF ensures that there is a single model
for all serializations.
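
The "single model" point can be shown with a small sketch: the nested and the referenced way of writing "Alice knows Bob" produce the same set of triples. This is a toy triple extractor over JSON-LD-like dicts, written for this example only; it assumes every node has an "@id" and ignores contexts, datatypes and blank nodes.

```python
def triples(node, acc=None):
    """Collect (subject, predicate, object) triples from a node and its nested nodes."""
    acc = set() if acc is None else acc
    subject = node["@id"]
    for predicate, value in node.items():
        if predicate == "@id":
            continue
        for v in (value if isinstance(value, list) else [value]):
            if isinstance(v, dict):
                acc.add((subject, predicate, v["@id"]))
                triples(v, acc)  # a nested node contributes its own triples
            else:
                acc.add((subject, predicate, v))
    return acc

ALICE = "http://example.org/alice"
BOB = "http://example.org/bob"

# Nested form: Bob is described inside Alice.
nested = {"@id": ALICE, "foaf:name": "Alice",
          "foaf:knows": {"@id": BOB, "foaf:name": "Bob"}}

# Referenced form: Alice only points at Bob, who is described separately.
referenced = [
    {"@id": ALICE, "foaf:name": "Alice", "foaf:knows": {"@id": BOB}},
    {"@id": BOB, "foaf:name": "Bob"},
]

from_nested = triples(nested)
from_referenced = set()
for node in referenced:
    triples(node, from_referenced)
```

Both sets contain the same three triples, which is why a converter that works from the model rather than from the input syntax can emit one consistent JSON shape.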

This technical perspective does not necessarily solve all challenges that
are inherent to the chosen data model, for example nested resources in
RDF. It might be feasible to flatten nested resources by their identifiers
and generate one JSON document after the other. Or it could be feasible to
keep nested resources intact and wrap them into nested structures in a
single ES JSON object.

In my data model, I can map RDF subject IDs to ES doc IDs. Other data
models may prefer other approaches to select ES doc IDs.

Jörg

On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček lukas.vlcek@gmail.com wrote:



(Alfredo Serafini) #6

Hi

using JSON-LD is indeed rather simple: it is JSON, so it's even possible
to index it as is.
I'm currently using ES for storing RDF documents in JSON-LD in a dedicated
index. In that case one can simply use the URI as the _id, recover the full
original format from _source, and use basic search capabilities on the
index, as long as escaping / nesting is not a big deal.

However, in order to use resources with more flexibility, I think the
best approach would be to index them as "flat" as possible, then apply an
ad-hoc @context to the ES JSON to obtain the original JSON-LD again.
This would be my ideal usage at the moment. It seems complex at first, but
it's not: I'm currently experimenting with saving an @context for each
_type, obtaining, let's say, a sort of _context, similar to a _mapping, to
reconstruct the original semantics.
If someone likes the idea, I'd like to share thoughts on that :)
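
One way to picture the _context idea, as a sketch only: the @context is stripped before indexing and kept once per type, then reattached to a _source when the JSON-LD is needed again. `ContextStore` is a hypothetical helper backed by an in-memory dict; where the contexts would really live in ES is left open here.

```python
class ContextStore:
    """Keeps one @context per type name (in memory, for illustration only)."""

    def __init__(self):
        self._contexts = {}

    def strip(self, type_name, doc):
        """Remember doc's @context under type_name; return the doc without it."""
        body = dict(doc)  # shallow copy, the caller's dict is not modified
        context = body.pop("@context", None)
        if context is not None:
            self._contexts[type_name] = context
        return body

    def restore(self, type_name, source):
        """Reattach the stored @context to an ES _source to get JSON-LD back."""
        restored = {"@context": self._contexts[type_name]}
        restored.update(source)
        return restored

# Hypothetical document and type name.
original = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@id": "http://example.org/doc/1",
    "dc:title": "A title",
}
store = ContextStore()
to_index = store.strip("record", original)      # what would be sent to ES
round_trip = store.restore("record", to_index)  # what a reader gets back
```

The indexed body stays small and flat, while the semantics are kept in one place per type, much like a mapping.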

On Friday, September 26, 2014 at 14:08:07 UTC+2, Jörg Prante wrote:



(Jörg Prante) #7

Absolutely. My thought is about managing one (or more) context ES JSON
document(s) where all the @context definitions of an index live. A format
plugin can then process search results, convert the ES JSON to expanded
JSON-LD, and from there to other RDF serializations.
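
A rough sketch of what the first step of such post-processing could look like, assuming the context has already been fetched as a plain dict that maps prefixes and terms to IRI strings. It only expands keys to full IRIs; real JSON-LD expansion also wraps values in arrays and value objects, which is not attempted here. The function names, the context and the search hit are invented for the example.

```python
def expand_term(term, context):
    """Expand 'prefix:local' or a plain term to a full IRI using the context."""
    if term.startswith("@"):
        return term  # JSON-LD keywords stay as they are
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    # Assumes context values are plain IRI strings, not term definition objects.
    return context.get(term, term)

def expand_keys(value, context):
    """Recursively expand all keys of a (possibly nested) ES _source."""
    if isinstance(value, dict):
        return {expand_term(k, context): expand_keys(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_keys(v, context) for v in value]
    return value

# Hypothetical stored context and search hit.
context = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "name": "http://xmlns.com/foaf/0.1/name",
}
hit = {
    "_id": "http://example.org/doc/1",
    "_source": {"dc:title": "A title", "dc:creator": {"name": "J. P."}},
}

expanded = {"@id": hit["_id"]}  # the ES _id is the RDF subject in this sketch
expanded.update(expand_keys(hit["_source"], context))
```

From a form with full IRIs as keys, a regular RDF library can take over and write N-Triples, RDF/XML and so on.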

Jörg

On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini seralf@gmail.com wrote:

Hi

using json-ld is indeed rather simple, as it is JSON, and then it's even
possible to index it as is.
I'm currently using ES for storing RDF documents in json-ld on a specific
index: in that case one can simply use the uri as an _id, recover the full
original format by _source, and use basic search capabilities on the index,
if escaping / nesting it's not a big deal.

However, in order to use resource with some more flexibility, I think the
best would be index them as "flat" as possible, then use an ad-hoc @context
on the ES json to obtain again the original json-ld.
This would be my ideal usage at the moment: seems complex at first, but
it's not, I'm currently experimenting in saving @context for a _type,
obtaining let's say a sort of _context, similar to a _mapping, to
reconstruct the original semantics.
If someone likes the idea, I'd like to share thoughts on that :slight_smile:



(Alfredo Serafini) #8

Hi Jörg, indeed! :slight_smile:

What I like about _mappings is that they are managed as documents too, and
they can be:

  1. automatically inferred from data (risky, but useful)
  2. provided by static files, in some cases
  3. managed per _index/_type

All of these could be done with something like a _context (which would at
first include a single @context). The first point should probably be
avoided altogether for JSON-LD :-), but it should be possible.

But we may need more than one @context for a single "resource" schema
(meaning an _index/_type pair), and in the long run it should even be
possible to re-use a @context for different _index/_type pairs.
Furthermore, when exposing results as JSON-LD one might want to reference
an external @context and merge it before returning results. In my opinion
the more "risky" part is ingesting the original JSON-LD, if we want to
flatten it and extract the @context that will later let us reconstruct the
original document.
Since it should be possible to map any kind of JSON result from ES,
documents imported as JSON-LD may have to retain at least their original
fields.
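
To make the _context idea a bit more concrete, here is a minimal Python
sketch (Python only for brevity). Everything in it is hypothetical: the
in-memory `context_store`, the function names, and the assumption that the
@context is a plain JSON object rather than a URL or a list. It only shows
stripping the @context on the way in and re-attaching it, optionally merged
with an external one, on the way out.

```python
import copy

# Hypothetical stand-in for a per-(_index, _type) "_context" store.
context_store = {}


def flatten_for_index(index, doc_type, jsonld_doc):
    """Strip @context from a compacted JSON-LD document and remember it."""
    doc = copy.deepcopy(jsonld_doc)
    context = doc.pop("@context", None)
    if context is not None:
        # Assumption: the @context is a plain dict of prefix/term definitions.
        context_store[(index, doc_type)] = context
    return doc


def restore_jsonld(index, doc_type, es_source, extra_context=None):
    """Re-attach the stored @context, optionally merged with an external one."""
    context = dict(context_store.get((index, doc_type), {}))
    if extra_context:
        context.update(extra_context)
    restored = {"@context": context}
    restored.update(es_source)
    return restored


original = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@id": "http://example.org/doc/1",
    "dc:title": "A title",
}
es_doc = flatten_for_index("library", "book", original)
roundtrip = restore_jsonld("library", "book", es_doc)
```

The document sent to ES no longer carries the @context, while the original
fields survive untouched, which is the property discussed above.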

I'd like to put some code on GitHub, and if you want we could join efforts
on that. I'm working mostly in Scala at the moment. What do you think?



(Jörg Prante) #9

For the _mapping, I am thinking about two more types, "iri" and "literal",
for which I intend to write ES type mappers, so that ES can receive XSD
data types and language codes and map them to fields and analyzers. IRIs
are just opaque strings, but they can be shortened if a prefix is
configured, and they can be used as an _id or to reference an _id.
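
As an illustration of the prefix shortening only (not of an actual ES type
mapper), a minimal Python sketch could look like the following. The
function name and the `PREFIXES` table are made up for the example; the two
namespace IRIs are the usual Dublin Core and FOAF ones.

```python
def shorten_iri(iri, prefixes):
    """Return 'prefix:local' for the longest matching namespace, else the IRI."""
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        # No configured prefix applies: keep the IRI as an opaque string.
        return iri
    return best[0] + ":" + iri[len(best[1]):]


# Hypothetical prefix configuration.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

short = shorten_iri("http://purl.org/dc/elements/1.1/title", PREFIXES)
unknown = shorten_iri("http://example.org/vocab#thing", PREFIXES)
```

Here `short` is the compact field name `dc:title`, and `unknown` is returned
unchanged because no prefix is configured for it.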

Instead of _mapping, I prefer the idea of handling @contexts like template
documents.

I'm not sure about the best way to manage JSON-LD. There are two
approaches. One is to save the JSON-LD (what you call the original
document) beside other versions. This requires more space, and I'm not sure
what purpose the original JSON-LD would serve. The other is to drop the
original JSON-LD after parsing it to triples and to store the triples in an
ES JSON doc, a surrogate that is close to JSON-LD but conforms to all the
JSON dialect characteristics of the ES document DSL.
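
A rough Python sketch of the second approach, under simplifying
assumptions: triples are plain (subject, predicate, object) tuples of
strings, predicates are already shortened to prefix form, and nested
resources, datatypes and language tags are ignored. The function name is
hypothetical. Triples are grouped by subject, so each subject becomes one
ES document whose @id can serve as the _id.

```python
from collections import defaultdict


def triples_to_docs(triples):
    """Group (subject, predicate, object) triples into one dict per subject."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)

    docs = {}
    for subject, fields in grouped.items():
        doc = {"@id": subject}
        for predicate, values in fields.items():
            # A single value stays a scalar; repeated predicates become arrays.
            doc[predicate] = values[0] if len(values) == 1 else values
        docs[subject] = doc
    return docs


triples = [
    ("http://example.org/p/1", "foaf:name", "Alice"),
    ("http://example.org/p/1", "foaf:knows", "http://example.org/p/2"),
    ("http://example.org/p/1", "foaf:knows", "http://example.org/p/3"),
    ("http://example.org/p/2", "foaf:name", "Bob"),
]
docs = triples_to_docs(triples)
```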

I don't work in Scala, so I cannot promise much, but I'd be happy to take a
look at any related code!

Jörg



(Amine Bouayad) #10

Thank you all for your responses and the interesting conversation about RDF
serialization into ES. With regard to my original post, I ended up using a
solution based on RDFLib:

It works as expected, and compacting the content by using @context does the
trick and is flexible. It is an in-memory process however, which could be
an issue for those with very large RDF files. When using Jena, I didn't
find the ability to add @context mappings, but maybe I didn't dig enough.

On a side note, it looks like the rdflib-jsonld solution already has
support for XSD literals and lists, so perhaps it could be extended to map
directly into an ES _type, if that is a good direction.

With my JSON-LD file ready for ingestion into ES, I do have another
question: are there utilities to bulk load such documents (the JSON-LD
contains the individual ES documents, each with an _id), or do I just write
a script that calls curl -XPUT for each record in the JSON-LD file? This
seems like a pretty common use case.
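
One option that avoids a PUT per record is the ES _bulk endpoint, which
takes newline-delimited JSON: an action line followed by the document
source, for each document. The Python sketch below is only an illustration
under assumptions: the function name, index name and type name are made up,
every node in @graph is assumed to have an @id, and that @id is used as the
_id. The _type entry is optional because newer ES versions no longer accept
it.

```python
import json


def jsonld_graph_to_bulk(jsonld, index, doc_type=None):
    """Build an Elasticsearch _bulk request body from a JSON-LD @graph."""
    lines = []
    for node in jsonld.get("@graph", []):
        meta = {"_index": index, "_id": node["@id"]}
        if doc_type is not None:
            meta["_type"] = doc_type
        # Action line first, then the document source on its own line.
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(node))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


sample = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@graph": [
        {"@id": "http://example.org/doc/1", "dc:title": "First"},
        {"@id": "http://example.org/doc/2", "dc:title": "Second"},
    ],
}
body = jsonld_graph_to_bulk(sample, "library", "book")
```

The resulting body can be written to a file and sent in one request, for
example with curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json
(newer ES versions also require a Content-Type: application/x-ndjson
header). For very large files the body would have to be split into batches.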

Thanks again to all, interesting stuff. Happy to contribute to extending an
existing solution.

Amine




(abo) #11

Thank you all for your responses and the interesting conversation about RDF
serialization into ES. With regard to my original post, I ended up using a
solution based on RDFlib:

https://github.com/RDFLib/rdflib-jsonld

It works as expected, and compacting the content by using @context does the
trick and is flexible. It is an in-memory process however, which could be
an issue for those with very large RDF files. When using Jena, I didn't
find the ability to add @context mappings, but maybe I didn't dig enough.
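
To make the effect of compaction concrete, here is a minimal sketch in plain Python of the key-shortening step only. It is not the W3C compaction algorithm and not what rdflib-jsonld does internally; the function names, the prefix table and the sample node are all assumptions made up for illustration.

```python
def compact_iri(iri, prefixes):
    """Shorten a full IRI to prefix:local form, using the longest matching namespace."""
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return iri  # no known namespace, leave the IRI untouched
    return f"{best[0]}:{iri[len(best[1]):]}"


def compact_keys(node, prefixes):
    """Return a copy of a flat node with property IRIs shortened; '@' keywords are kept."""
    return {
        key if key.startswith("@") else compact_iri(key, prefixes): value
        for key, value in node.items()
    }


# Hypothetical prefix table, playing the role of a very small @context.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical expanded node, as it might appear inside "@graph".
node = {
    "@id": "http://example.org/doc/1",
    "http://purl.org/dc/elements/1.1/title": "Example",
}

compacted = compact_keys(node, PREFIXES)
```

The short keys (here "dc:title") are what end up as ES field names, which is why the compact form is more convenient to query than full IRIs.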

On a side note, looks like the rdflib-jsonld solution already has support
for XSD literals and lists, so perhaps it could be extended to map directly
into ES _type if that is a good direction.

With my JSON-LD file ready for ingestion into ES, I do have another
question: are there utilities to bulk load such documents (the JSON-LD file
contains the individual ES documents, each with an _id), or do I just write
a script that calls curl -XPUT for each record in the file? This seems
like a pretty common use case.

Thanks again to all, interesting stuff. Happy to contribute to extending an
existing solution.

-- ab




(Jörg Prante) #12

In your case, you have to convert such files into the ES bulk format for
efficient indexing, or better, use the ES Python client to push the JSON-LD
into ES.

Jörg
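
As a rough sketch of the bulk-format route, the snippet below turns the nodes under "@graph" into the newline-delimited body that the ES bulk API expects: one action line followed by one source line per document. It assumes a compacted file whose "@graph" is a list of nodes and that each "@id" should become the ES _id; the function name, the index name "rdf", the type name "resource" and the sample data are hypothetical.

```python
import json


def to_bulk_lines(jsonld, index, doc_type=None):
    """Build a bulk request body from the nodes under "@graph"."""
    lines = []
    for node in jsonld.get("@graph", []):
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # needed on ES 1.x; leave as None on versions without types
        if "@id" in node:
            meta["_id"] = node["@id"]  # reuse the RDF subject IRI as the document id
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(node))
    # Every line of a bulk body, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical compacted JSON-LD input.
doc = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@graph": [
        {"@id": "http://example.org/a", "dc:title": "A"},
        {"@id": "http://example.org/b", "dc:title": "B"},
    ],
}

body = to_bulk_lines(doc, "rdf", "resource")
```

The resulting text can be written to a file and sent to the _bulk endpoint (for example with curl --data-binary), or the same per-node loop can feed the bulk helper of the Python client instead of building the body by hand. For very large graphs it would make sense to send the lines in batches rather than as one request.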

On Sun, Sep 28, 2014 at 4:23 AM, abo abo@datavolution.com wrote:

Thank you all for your responses and interesting conversation about RDF
serialization into ES. With regards to my original post, I ended up using
a solution based on RDFlib:

https://github.com/RDFLib/rdflib-jsonld

It works as expected, and compacting the content by using @context does
the trick and is flexible. It is an in-memory process however, which could
be an issue for those with very large RDF files. When using Jena, I didn't
find the ability to add @context mappings, but maybe I didn't dig enough.

On a side note, looks like the rdflib-jsonld solution already has support
for XSD literals and lists, so perhaps it could be extended to map directly
into ES _type if that is a good direction.

With my Json-ld file ready for ingestion into ES, I do have another
question: are there utilities to bulk load such documents (the json-ld
contains individual documents per ES, each with an _id), or do I just write
a script that calls curl -XPUT for each record in the json-ld file? Seems
like a pretty common use case.

Thanks again to all, interesting stuff. Happy to contribute to extending
an existing solution.

-- ab

On Saturday, September 27, 2014 9:24:24 AM UTC-7, Jörg Prante wrote:

For the _mapping, I think about two more types for that I intend to write
ES type mappers, "iri" and "literal", so ES can receive XSD data types and
language codes and map them to fields / analyzers. IRIs are just opaque
strings but they can be shortened if prefix is configured and can be used
as _id or for referencing to an _id.

Instead of _mapping I prefer the thought about handling @contexts like
template documents.

Not sure about the best way to manage JSON-LD. There are two approaches:
save a JSON-LD (you say original document) beside other versions. This
requires more space and I'm not sure about the purpose of the original
JSON-LD. The other approach is more about dropping original JSON-LD after
parsing it to triples and store the triples in an ES JSON doc which is a
surrogate close to JSON-LD but arranges with all the JSON dialect
characteristics of the ES document DSL.

I'm not in scala, so I can not promise much, but happy about glimpsing
all related code!

Jörg

On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini ser...@gmail.com
wrote:

HI Jorg Indeed! :slight_smile:

What I like about _mapping is that they are managed as documents too,
and they can be:

  1. automatically inferred from data (at risk, but useful)
  2. provided by static files, in some cases
  3. managed for _index/_types

all those things could be done with something like a _context (which
will include at first a single @context). The first point should probably
be avoided at all for json-ld :-), but it should be possible.

But we may need more @context items for a single "resource" schema
(referring to _index/_type), and in perspective it's even possible to
re-use a @context for different _index/_type pairs.
Furthermore: when exposing results in jsonld one might want to reference
an external @context and merge it before providing results, and In my
opinion the more "risky" part is when input the original json-ld, if we
want to flat it and extract the @context which will permits us to
recostruct later the original document.
Given the fact that it could be possible to map every kind of json
results from ES, documents imported as jsonld might has to maintain at
least the original fields.

I'd like to put some code on github and if you want we could join the
effort on that? I'm working mostly on scala at the moment. What do you
think about?

Il giorno venerdì 26 settembre 2014 20:32:52 UTC+2, Jörg Prante ha
scritto:

Absolutely. My thought is about managing one (or more) context ES JSON
document(s) where all the @context definitions of an index live. A format
plugin can then process search results and converts ES JSON to expanded
JSON-LD and from there to other RDF serializations.

Jörg

On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini ser...@gmail.com
wrote:

Hi

using json-ld is indeed rather simple, as it is JSON, and then it's
even possible to index it as is.
I'm currently using ES for storing RDF documents in json-ld on a
specific index: in that case one can simply use the uri as an _id, recover
the full original format by _source, and use basic search capabilities on
the index, if escaping / nesting it's not a big deal.

However, in order to use resource with some more flexibility, I think
the best would be index them as "flat" as possible, then use an ad-hoc
@context on the ES json to obtain again the original json-ld.
This would be my ideal usage at the moment: seems complex at first,
but it's not, I'm currently experimenting in saving @context for a _type,
obtaining let's say a sort of _context, similar to a _mapping, to
reconstruct the original semantics.
If someone likes the idea, I'd like to share thoughts on that :slight_smile:

Il giorno venerdì 26 settembre 2014 14:08:07 UTC+2, Jörg Prante ha
scritto:

Lukáš,

of course you are right, RDF/XML looks complex and requires parsing.
The underlying principle of all RDF is a graph (or a series of triples in
form of subject/predicate/object, where the triple series is a
serialization of the graph), So the challenge is first the parsing of RDF
input, and second, constructing the model, and third, serializing the model
to an ES-friendly input (here: JSON-LD, sort of). RDF ensures that there is
a single model for all serializations.

This technical perspective does not necessarily solve all challenges
that are inherent to the chosen data model. For example, nested resources
in RDF. It might be feasible to flatten nested resource by their
identifiers and generate one JSON after the other. Or it could be feasible
to keep nested resources intact and wrap them into nested structures in a
single ES JSON object.

In my data model, I can map RDF subject IDs to ES doc IDs. Other data
models may prefer other approaches to select ES doc IDs.

Jörg

On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček lukas...@gmail.com
wrote:

Jörg,

my concern is that RDF/XML allow to express one thing in several
ways. For example, if you take FOAF specification then there are several
ways how you can express that one Person knows other Person. One way it
using reference IDs other way it using nested Person inside other Person.
See [1] for examples. My understanding is that although both ways express
exactly the same information they lead to different XML representation and
thus to different JSON-LD. Not that you can push such data in ES but I
wonder if you can then have any consistent way of querying such data.

May be there is some way how you can preprocess XML document and
convert all nested Persons to references (would require arbitrary ID
construction?). Or something similar. Though I am not sure this would be
generally applicable approach to any RDF data.

[1] http://www.xml.com/pub/a/2004/02/04/foaf.html

Regards,
Lukas

On Fri, Sep 26, 2014 at 9:28 AM, joerg...@gmail.com <
joerg...@gmail.com> wrote:

JSON-LD is perfect for ES indexing, as long as you use the
"compact" form of representation.

http://www.w3.org/TR/json-ld-api/#compaction-algorithms

Example:

https://github.com/lanthaler/JsonLD/blob/master/Test/Fixture
s/sample-compacted.jsonld

This means you should use short field names and shorten IRIs to a
prefix form. This gives a convenient mapping to ES field names (e.g.
"dc:title" or "dc:creator"). The '@' fields can also be indexed; they do
not control anything special in ES (some @id may be mapped to the ES _id,
but for nested structures this does not match).
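
As a rough sketch of how this applies to the original question (a compact JSON-LD file with a top-level "@graph"), the snippet below turns each node into a pair of lines for the Elasticsearch bulk API, using "@id" as the document `_id`. The function name `bulk_lines`, the index name `books` and the sample data are assumptions for illustration only. Older ES versions also expect a `_type` in the action line, which is omitted here.

```python
import json


def bulk_lines(jsonld, index):
    """Yield bulk-API lines: one action line and one source line per @graph node."""
    for node in jsonld.get("@graph", []):
        action = {"index": {"_index": index}}
        if "@id" in node:
            action["index"]["_id"] = node["@id"]  # top-level @id becomes the ES _id
        yield json.dumps(action)
        yield json.dumps(node)  # '@' fields stay in the source as ordinary fields


doc = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@graph": [
        {"@id": "http://example.org/book/1", "dc:title": "First", "dc:creator": "A. Author"},
        {"@id": "http://example.org/book/2", "dc:title": "Second"},
    ],
}

# The bulk body is newline-delimited JSON and must end with a newline.
body = "\n".join(bulk_lines(doc, "books")) + "\n"
lines = body.splitlines()
first_action = json.loads(lines[0])
last_source = json.loads(lines[3])
```

The resulting `body` can be sent to the `_bulk` endpoint with content type `application/x-ndjson`; the "@context" is not part of any document and has to be kept elsewhere.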

I use my own RDF API and transform RDF graphs (so not only JSON-LD
but also other formats like N-Triples and RDF/XML) into XContent using this
method:

https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java

I plan to extend this content building by interpreting rdf:type and
rdf:list etc. to generate correct ES JSON objects and arrays. There is also
an amount of work left to do for the plethora of XSD types in RDF literals
or for language tags.

This will be subsumed into an RDF input/output plugin for an
ES-based Linked Data Platform

http://www.w3.org/TR/ldp/

but there is no ETA yet.

Jörg



(westurner) #13

On Saturday, September 27, 2014 at 9:23:22 PM UTC-5, abo wrote:

On a side note, looks like the rdflib-jsonld solution already has support
for XSD literals and lists, so perhaps it could be extended to map directly
into ES _type if that is a good direction.

I've started to generate initial type mappings from ElasticSearch to
JSON-LD and XSD here:



(westurner) #14

Thanks!

This would be good information for a wiki or documentation page on
ElasticSearch/JSON-LD best practices.

On Saturday, September 27, 2014 at 11:24:24 AM UTC-5, Jörg Prante wrote:

For the _mapping, I am thinking about two more types, "iri" and "literal",
for which I intend to write ES type mappers, so ES can receive XSD data
types and language codes and map them to fields / analyzers. IRIs are just
opaque strings, but they can be shortened if a prefix is configured, and
they can be used as an _id or to reference an _id.
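
To make the idea concrete, here is a hedged sketch of what such a mapping decision could look like. It is not the planned type mapper, only a lookup over expanded JSON-LD value objects. The tables `XSD_TO_ES` and `LANG_TO_ANALYZER` and the function `field_mapping` are hypothetical, and the choice of `long` for `xsd:integer` is a compromise because `xsd:integer` is unbounded. The `text` and `keyword` field types belong to newer ES versions; older versions used `string`.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed correspondence between XSD datatypes and ES field types.
XSD_TO_ES = {
    XSD + "string": "text",
    XSD + "boolean": "boolean",
    XSD + "int": "integer",
    XSD + "long": "long",
    XSD + "integer": "long",   # compromise: xsd:integer has no upper bound
    XSD + "float": "float",
    XSD + "double": "double",
    XSD + "date": "date",
    XSD + "dateTime": "date",
    XSD + "anyURI": "keyword",
}

# Language tags mapped to built-in ES language analyzers.
LANG_TO_ANALYZER = {"en": "english", "de": "german", "fr": "french", "it": "italian"}


def field_mapping(value):
    """Return an ES field mapping for one expanded JSON-LD value object."""
    if "@id" in value:
        # "iri": an opaque string, not analyzed
        return {"type": "keyword"}
    if "@language" in value:
        # "literal" with a language tag: pick an analyzer by primary subtag
        lang = value["@language"].split("-")[0].lower()
        return {"type": "text", "analyzer": LANG_TO_ANALYZER.get(lang, "standard")}
    # "literal" with (or without) an XSD datatype
    return {"type": XSD_TO_ES.get(value.get("@type"), "text")}
```

For example, `field_mapping({"@value": "Hallo", "@language": "de-AT"})` selects the `german` analyzer, while a value object with `"@type"` set to `xsd:integer` yields a `long` field.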

Instead of _mapping, I prefer the idea of handling @contexts like
template documents.

I am not sure about the best way to manage JSON-LD. There are two approaches.
The first is to save the JSON-LD (what you call the original document)
beside other versions. This requires more space, and I'm not sure about the
purpose of the original JSON-LD. The other approach is to drop the original
JSON-LD after parsing it to triples and to store the triples in an ES JSON
doc, which is a surrogate close to JSON-LD but complies with all the JSON
dialect characteristics of the ES document DSL.
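
A small sketch of the second approach, under the assumption that the triples are already parsed and their IRIs shortened to prefix form. The function name `triples_to_docs` and the sample triples are illustrative only. Triples are grouped by subject, and repeated predicates become JSON arrays.

```python
from collections import defaultdict


def triples_to_docs(triples):
    """Group (subject, predicate, object) triples into one dict per subject."""
    docs = defaultdict(dict)
    for s, p, o in triples:
        doc = docs[s]
        doc["@id"] = s
        if p not in doc:
            doc[p] = o                 # first value: plain scalar
        elif isinstance(doc[p], list):
            doc[p].append(o)           # third and later values: extend the array
        else:
            doc[p] = [doc[p], o]       # second value: promote scalar to array
    return dict(docs)


triples = [
    ("ex:book1", "dc:title", "First"),
    ("ex:book1", "dc:creator", "A. Author"),
    ("ex:book1", "dc:creator", "B. Author"),
    ("ex:book2", "dc:title", "Second"),
]
es_docs = triples_to_docs(triples)
```

Each value in `es_docs` is a JSON-serializable document whose key can serve as the ES doc ID.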

I'm not into Scala, so I cannot promise much, but I'd be happy to take a
look at any related code!

Jörg

On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini <ser...@gmail.com> wrote:

Hi Jörg, indeed! :-)

What I like about _mapping is that mappings are managed as documents too,
and they can be:

  1. automatically inferred from data (risky, but useful)
  2. provided by static files, in some cases
  3. managed per _index/_type

All those things could be done with something like a _context (which would
at first include a single @context). The first point should probably be
avoided altogether for json-ld :-), but it should be possible.

But we may need more @context items for a single "resource" schema
(referring to _index/_type), and in the long run it is even possible to
re-use a @context for different _index/_type pairs.
Furthermore, when exposing results in json-ld one might want to reference
an external @context and merge it before providing results. In my
opinion the more "risky" part is ingesting the original json-ld, if we
want to flatten it and extract the @context that will allow us to
reconstruct the original document later.
Given that it should be possible to map every kind of json
result from ES, documents imported as json-ld might have to maintain at
least the original fields.

I'd like to put some code on github, and if you want we could join
efforts on that. I'm working mostly in Scala at the moment. What do you
think?

On Friday, September 26, 2014 at 8:32:52 PM UTC+2, Jörg Prante wrote:

Absolutely. My thought is about managing one (or more) context ES JSON
document(s) where all the @context definitions of an index live. A format
plugin can then process search results and convert ES JSON to expanded
JSON-LD and from there to other RDF serializations.
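
The plugin does not exist yet, so the following is only a sketch of the idea, with every name hypothetical. `CONTEXTS` stands in for the context documents kept per index, and `hit_to_jsonld` re-attaches the stored @context to a search hit. It relies only on the `_index`, `_id` and `_source` keys that ES search hits carry. Expanding the result further would need a real JSON-LD processor and is not shown.

```python
# Stand-in for the per-index context documents (assumed to be loaded from ES).
CONTEXTS = {
    "books": {"dc": "http://purl.org/dc/elements/1.1/"},
}


def hit_to_jsonld(hit):
    """Turn one ES search hit back into a compact JSON-LD document."""
    doc = {"@context": CONTEXTS[hit["_index"]], "@id": hit["_id"]}
    # Copy the indexed fields; '@context' and '@id' are set from the registry and _id.
    doc.update({k: v for k, v in hit["_source"].items() if k not in ("@context", "@id")})
    return doc


hit = {
    "_index": "books",
    "_id": "http://example.org/book/1",
    "_source": {"@id": "http://example.org/book/1", "dc:title": "First"},
}
restored = hit_to_jsonld(hit)
```

Keeping the @context outside the indexed documents means it can be shared by many documents and changed without reindexing them.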

Jörg

On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini ser...@gmail.com
wrote:

Hi

using json-ld is indeed rather simple, as it is JSON, so it's
even possible to index it as is.
I'm currently using ES for storing RDF documents in json-ld in a
specific index: in that case one can simply use the URI as the _id, recover
the full original format from _source, and use basic search capabilities on
the index, as long as escaping / nesting is not a big deal.

However, in order to use resources with some more flexibility, I think
the best approach would be to index them as "flat" as possible, then use an
ad-hoc @context on the ES json to obtain the original json-ld again.
This would be my ideal usage at the moment. It seems complex at first, but
it's not: I'm currently experimenting with saving the @context for a _type,
obtaining, let's say, a sort of _context, similar to a _mapping, to
reconstruct the original semantics.
If someone likes the idea, I'd like to share thoughts on that :-)



(Adhishankar) #15

I am new to pushing JSON files from Logstash to Elasticsearch.

Could someone help me push a JSON file from Logstash to Elasticsearch? Please share a sample working Logstash configuration file for JSON.

Thanks


(Tamás Ficand) #16

Hello everybody and Dear Jörg!

I wonder if you could send me the link to your java application that enables users to load json-ld files to Elasticsearch. The github link that you provided on the ES forum ( Loading JSON-LD into ES ) does not work.

https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java

Many thanks in advance!


(Jörg Prante) #17

Dear Tamás,

I have refactored the xbib application into smaller pieces. The classes mentioned for RDF are refactored here

A unit test for Turtle RDF is at

where the routine for indexing into Elasticsearch is missing (pushing the content builder string into an index is straightforward).

Because the sources I process are only Turtle, N-Triples, and sometimes RDF/XML, a specific routine for JSON-LD is missing.

If you have a JSON-LD file for demonstration, I could try to write a test case for that.


(Tamás Ficand) #18

Dear Jörg!

Thanks for responding, I found another solution to the problem.


(Brianmay01) #19

JSON-LD is perfect for ES indexing when you use the "compact" form
of representation.
Try this: JSON Formatter


(Priti) #20

Hello Tamas,

Could you please share the approach you are following to load a JSON-LD file into Elasticsearch?

Thanking you in advance.

Regards,
priti