Implementing a plugin to process the whole input document

Jakub_Kotowski · May 23, 2014, 2:48pm

Hello all,

we are trying to implement a SIREn plugin for ElasticSearch for indexing
and querying documents. We already implemented a version which uses SIREn
to index and query a specific field (called "contents" below) which
contains a JSON document as a string. An example of a doc:

{
"id":3,
"contents":
"{"title":"This is an another article about SIREn.","content":"bla bla bla "}"
}

Instead, we would like to index the whole document as it is posted to
ElasticSearch to avoid the need for a special loader that transforms an
input JSON to the required form. So then the user would simply post a
document such as:

{
"id":3,
"title":"This is an another article about SIREn.",
"content": "bla bla bla "
}

and it would be indexed as a whole both by ElasticSearch and by the SIREn
plugin.

One problem we encountered is that it is not possible to use copyTo for the
_source field and then only configure an analyzer for the copy.

It seems that the cleanest solution would be to modify the
SourceFieldMapper class to allow copyTo.

As a workaround we are going to create a class that extends
SourceFieldMapper and set copyTo for the _source field to a new field that
will be then used for SIREn and register it as follows:

mapperService.documentMapperParser().putRootTypeParser("_source", new
ModifiedSourceFieldMapper.TypeParser());

Does it sound OK or is there a simpler/cleaner solution?

Thank you in advance,

Jakub

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · May 23, 2014, 3:24pm

Not sure what the plugin is doing, but if you want to process dedicated
JSON data in an ES document, you could prepare an analyzer for a new field
type. So user can assign special meaning in the mapping to a field of their
preference.

E.g. a mapping with

 "mappings: {
     "mycontent" : { "type" : "siren" }
}

and a given document would look like

"mycontent" : {
     "title" : "foo",
     "name" : "bar"
     ...
}

and then you could extract the whole JSON subdoc from the doc under
"mycontent" into your analyzer plugin and process it.

For an example, you could look into plugins like the StandardNumber
analyzer, where I defined a new type "standardnumber" for analysis:

github.com

jprante/elasticsearch-analysis-standardnumber/blob/master/src/main/java/org/xbib/elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java

/*
 * Copyright (C) 2014 Jörg Prante
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published
 * by the Free Software Foundation; either version 3 of the License, or
 * (at your option) any later version.
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with this program; if not, see http://www.gnu.org/licenses
 * or write to the Free Software Foundation, Inc., 51 Franklin Street,
 * Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * The interactive user interfaces in modified source and object code
 * versions of this program must display Appropriate Legal Notices,
 * as required under Section 5 of the GNU Affero General Public License.

This file has been truncated. show original

Jörg

On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski jakub@sindicetech.comwrote:

Hello all,

we are trying to implement a SIREn plugin for Elasticsearch for indexing
and querying documents. We already implemented a version which uses SIREn
to index and query a specific field (called "contents" below) which
contains a JSON document as a string. An example of a doc:

{
"id":3,
"contents":
"{"title":"This is an another article about SIREn.","content":"bla bla bla "}"
}

Instead, we would like to index the whole document as it is posted to
Elasticsearch to avoid the need for a special loader that transforms an
input JSON to the required form. So then the user would simply post a
document such as:

{
"id":3,
"title":"This is an another article about SIREn.",
"content": "bla bla bla "
}

and it would be indexed as a whole both by Elasticsearch and by the SIREn
plugin.

One problem we encountered is that it is not possible to use copyTo for
the _source field and then only configure an analyzer for the copy.

It seems that the cleanest solution would be to modify the
SourceFieldMapper class to allow copyTo.

As a workaround we are going to create a class that extends
SourceFieldMapper and set copyTo for the _source field to a new field that
will be then used for SIREn and register it as follows:

mapperService.documentMapperParser().putRootTypeParser("_source", new
ModifiedSourceFieldMapper.TypeParser());

Does it sound OK or is there a simpler/cleaner solution?

Thank you in advance,

Jakub

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoF1jgfGYtfkm7A6rqMCJzGyw8OtZraxhbzyARK_TYhykQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jakub_Kotowski · May 23, 2014, 3:38pm

Hi Jörg,

thanks for the reply. Yes, what you suggest is a way to improve our current
approach so that we can get a subdoc instead of a json encoded in a string
field.

What we would like to achieve is to always be able to process any document
that comes to elasticsearch as a whole, i.e. be it { "title": "my title",
"content" : "my content"} or {"name" : "john", "surname" : "doe"}.

For that we either (1) need to be able to set an analyzer for the whole
input document or (2) set an analyzer for the _source field which already
contains the whole doc or (3) copy the _source field to a normal field,
let's say _siren, and set an analyzer for it.

(1) and (2) seem to be impossible.

So we are exploring option (3) which also seems difficult.

Jakub

On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:

Not sure what the plugin is doing, but if you want to process dedicated
JSON data in an ES document, you could prepare an analyzer for a new field
type. So user can assign special meaning in the mapping to a field of their
preference.

E.g. a mapping with
 "mappings: {
     "mycontent" : { "type" : "siren" }
}
and a given document would look like
"mycontent" : {
     "title" : "foo",
     "name" : "bar"
     ...
}
and then you could extract the whole JSON subdoc from the doc under
"mycontent" into your analyzer plugin and process it.

For an example, you could look into plugins like the StandardNumber
analyzer, where I defined a new type "standardnumber" for analysis:

https://github.com/jprante/elasticsearch-analysis-standardnumber/blob/master/src/main/java/org/xbib/elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java

Jörg

On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski <ja...@sindicetech.com<javascript:>

wrote:

Hello all,

we are trying to implement a SIREn plugin for Elasticsearch for indexing
and querying documents. We already implemented a version which uses SIREn
to index and query a specific field (called "contents" below) which
contains a JSON document as a string. An example of a doc:

{
"id":3,
"contents":
"{"title":"This is an another article about SIREn.","content":"bla bla bla "}"
}

Instead, we would like to index the whole document as it is posted to
Elasticsearch to avoid the need for a special loader that transforms an
input JSON to the required form. So then the user would simply post a
document such as:

{
"id":3,
"title":"This is an another article about SIREn.",
"content": "bla bla bla "
}

and it would be indexed as a whole both by Elasticsearch and by the SIREn
plugin.

One problem we encountered is that it is not possible to use copyTo for
the _source field and then only configure an analyzer for the copy.

It seems that the cleanest solution would be to modify the
SourceFieldMapper class to allow copyTo.

As a workaround we are going to create a class that extends
SourceFieldMapper and set copyTo for the _source field to a new field that
will be then used for SIREn and register it as follows:

mapperService.documentMapperParser().putRootTypeParser("_source", new
ModifiedSourceFieldMapper.TypeParser());

Does it sound OK or is there a simpler/cleaner solution?

Thank you in advance,

Jakub

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · May 23, 2014, 5:56pm

In answer to (1), in each custom mapper, you have access to ParseContext in
the method

public void parse(ParseContext context) throws IOException

In the ParseContext, you can access _source with the source() method to do
whatever you want, e.g. copy it, parse it, index it again etc.

(2) is a slight misconception, since _source is not a field, but a "field
container", it is a byte array passed through the ES API so the field
mappers can do their work.

(3) as said, it is possible to copy _source, but only internally in the
code of a custom field mapper, not by configuration in the mapping, since
_source is reserved for special treatment inside ES and users should not be
able to tamper with it.

So a customized mapper in a plugin could work like this in the root object:

"mappings" : {
"properties" : {
...
"_siren" : { "type" : "siren" }
}
}

and in the corresponding code in the custom mapper, when field _siren is
processed because of the type "siren", it copies the byte array from
_source in the ParseContext. (It need not to be the field name _siren this
is just an example name)

Jörg

On Fri, May 23, 2014 at 5:38 PM, Jakub Kotowski jakub@sindicetech.comwrote:

Hi Jörg,

thanks for the reply. Yes, what you suggest is a way to improve our
current approach so that we can get a subdoc instead of a json encoded in a
string field.

What we would like to achieve is to always be able to process any document
that comes to elasticsearch as a whole, i.e. be it { "title": "my title",
"content" : "my content"} or {"name" : "john", "surname" : "doe"}.

For that we either (1) need to be able to set an analyzer for the whole
input document or (2) set an analyzer for the _source field which already
contains the whole doc or (3) copy the _source field to a normal field,
let's say _siren, and set an analyzer for it.

(1) and (2) seem to be impossible.

So we are exploring option (3) which also seems difficult.

Jakub

On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:
Not sure what the plugin is doing, but if you want to process dedicated
JSON data in an ES document, you could prepare an analyzer for a new field
type. So user can assign special meaning in the mapping to a field of their
preference.

E.g. a mapping with
 "mappings: {
     "mycontent" : { "type" : "siren" }
}
and a given document would look like
"mycontent" : {
     "title" : "foo",
     "name" : "bar"
     ...
}
and then you could extract the whole JSON subdoc from the doc under
"mycontent" into your analyzer plugin and process it.

For an example, you could look into plugins like the StandardNumber
analyzer, where I defined a new type "standardnumber" for analysis:

https://github.com/jprante/elasticsearch-analysis-
standardnumber/blob/master/src/main/java/org/xbib/
elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java

Jörg

On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski ja...@sindicetech.comwrote:

Hello all,

we are trying to implement a SIREn plugin for Elasticsearch for indexing
and querying documents. We already implemented a version which uses SIREn
to index and query a specific field (called "contents" below) which
contains a JSON document as a string. An example of a doc:

{
"id":3,
"contents":"{"title":"This is an another article
about SIREn.","content":"bla bla bla "}"
}

Instead, we would like to index the whole document as it is posted to
Elasticsearch to avoid the need for a special loader that transforms an
input JSON to the required form. So then the user would simply post a
document such as:

{
"id":3,
"title":"This is an another article about SIREn.",
"content": "bla bla bla "
}

and it would be indexed as a whole both by Elasticsearch and by the
SIREn plugin.

One problem we encountered is that it is not possible to use copyTo for
the _source field and then only configure an analyzer for the copy.

It seems that the cleanest solution would be to modify the
SourceFieldMapper class to allow copyTo.

As a workaround we are going to create a class that extends
SourceFieldMapper and set copyTo for the _source field to a new field that
will be then used for SIREn and register it as follows:

mapperService.documentMapperParser().putRootTypeParser("_source", new
ModifiedSourceFieldMapper.TypeParser());

Does it sound OK or is there a simpler/cleaner solution?

Thank you in advance,

Jakub

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%
40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoF5cziPGQaNDAZPfr1ZOwY0qc%2BQnQas9gsivfh3pD2O0A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jakub_Kotowski · May 23, 2014, 6:17pm

Great, the ParseContext looks promising.

We'll try it and report back, thanks!

Jakub

BTW, just to answer your previous implicit question - SIREn allows for
advanced structured document search, more at http://sirendb.com/

On Friday, May 23, 2014 6:56:32 PM UTC+1, Jörg Prante wrote:

In answer to (1), in each custom mapper, you have access to ParseContext
in the method

public void parse(ParseContext context) throws IOException

In the ParseContext, you can access _source with the source() method to do
whatever you want, e.g. copy it, parse it, index it again etc.

(2) is a slight misconception, since _source is not a field, but a "field
container", it is a byte array passed through the ES API so the field
mappers can do their work.

(3) as said, it is possible to copy _source, but only internally in the
code of a custom field mapper, not by configuration in the mapping, since
_source is reserved for special treatment inside ES and users should not be
able to tamper with it.

So a customized mapper in a plugin could work like this in the root object:

"mappings" : {
"properties" : {
...
"_siren" : { "type" : "siren" }
}
}

and in the corresponding code in the custom mapper, when field _siren is
processed because of the type "siren", it copies the byte array from
_source in the ParseContext. (It need not to be the field name _siren this
is just an example name)

Jörg

On Fri, May 23, 2014 at 5:38 PM, Jakub Kotowski <ja...@sindicetech.com<javascript:>

wrote:
Hi Jörg,

thanks for the reply. Yes, what you suggest is a way to improve our
current approach so that we can get a subdoc instead of a json encoded in a
string field.

What we would like to achieve is to always be able to process any
document that comes to elasticsearch as a whole, i.e. be it { "title": "my
title", "content" : "my content"} or {"name" : "john", "surname" : "doe"}.

For that we either (1) need to be able to set an analyzer for the whole
input document or (2) set an analyzer for the _source field which already
contains the whole doc or (3) copy the _source field to a normal field,
let's say _siren, and set an analyzer for it.

(1) and (2) seem to be impossible.

So we are exploring option (3) which also seems difficult.

Jakub

On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:
Not sure what the plugin is doing, but if you want to process dedicated
JSON data in an ES document, you could prepare an analyzer for a new field
type. So user can assign special meaning in the mapping to a field of their
preference.

E.g. a mapping with
 "mappings: {
     "mycontent" : { "type" : "siren" }
}
and a given document would look like
"mycontent" : {
     "title" : "foo",
     "name" : "bar"
     ...
}
and then you could extract the whole JSON subdoc from the doc under
"mycontent" into your analyzer plugin and process it.

For an example, you could look into plugins like the StandardNumber
analyzer, where I defined a new type "standardnumber" for analysis:

https://github.com/jprante/elasticsearch-analysis-
standardnumber/blob/master/src/main/java/org/xbib/
elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java

Jörg

On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski ja...@sindicetech.comwrote:

Hello all,

we are trying to implement a SIREn plugin for Elasticsearch for
indexing and querying documents. We already implemented a version which
uses SIREn to index and query a specific field (called "contents" below)
which contains a JSON document as a string. An example of a doc:

{
"id":3,
"contents":"{"title":"This is an another article
about SIREn.","content":"bla bla bla "}"
}

Instead, we would like to index the whole document as it is posted to
Elasticsearch to avoid the need for a special loader that transforms an
input JSON to the required form. So then the user would simply post a
document such as:

{
"id":3,
"title":"This is an another article about SIREn.",
"content": "bla bla bla "
}

and it would be indexed as a whole both by Elasticsearch and by the
SIREn plugin.

One problem we encountered is that it is not possible to use copyTo for
the _source field and then only configure an analyzer for the copy.

It seems that the cleanest solution would be to modify the
SourceFieldMapper class to allow copyTo.

As a workaround we are going to create a class that extends
SourceFieldMapper and set copyTo for the _source field to a new field that
will be then used for SIREn and register it as follows:

mapperService.documentMapperParser().putRootTypeParser("_source", new
ModifiedSourceFieldMapper.TypeParser());

Does it sound OK or is there a simpler/cleaner solution?

Thank you in advance,

Jakub

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%
40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e30a414b-fb6f-4759-a80f-0e4ac3bf96ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · May 23, 2014, 7:00pm

Do you plan to implement SPARQL endpoint on Elasticsearch?

That would be one wonderful asset missing in my portfolio for supporting
library catalog indexing and search, all I do with RDF and Elasticsearch is
based on JSON-LD.

Jörg

On Fri, May 23, 2014 at 8:17 PM, Jakub Kotowski jakub@sindicetech.comwrote:

Great, the ParseContext looks promising.

We'll try it and report back, thanks!

Jakub

BTW, just to answer your previous implicit question - SIREn allows for
advanced structured document search, more at http://sirendb.com/

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEHrdguj%2BMLCXX%2BBwS%2BNBXcWKXAx4C25V-ZzC_5RCqKcA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jakub_Kotowski · May 27, 2014, 4:47pm

It took quite a bit of figuring out but we succeeded registering a
rootMapper and using the ParseContext, thanks.

We don't have any plans for implementing SPARQL on top of Elasticsearch.
Siren can do joins which perform better than blockjoins, especially for
deep nesting, but it still is a different paradigm.

On the other hand, we are always interested in new use cases. We've done
work on indexing richly structured documents with lots of structure and
lots of text content too and querying where you need to combine querying
structure and text. Maybe that is something relevant to library catalog
indexing?

In any case, I'd be happy to hear more if you want to reply offline.

Jakub

On Friday, May 23, 2014 8:00:51 PM UTC+1, Jörg Prante wrote:

Do you plan to implement SPARQL endpoint on Elasticsearch?

That would be one wonderful asset missing in my portfolio for supporting
library catalog indexing and search, all I do with RDF and Elasticsearch is
based on JSON-LD.

Jörg

On Fri, May 23, 2014 at 8:17 PM, Jakub Kotowski <ja...@sindicetech.com<javascript:>

wrote:

Great, the ParseContext looks promising.

We'll try it and report back, thanks!

Jakub

BTW, just to answer your previous implicit question - SIREn allows for
advanced structured document search, more at http://sirendb.com/

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e4168857-a854-4119-abae-58d4cbbbaf1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · May 27, 2014, 10:56pm

Yes, it is (not only) relevant to library catalog indexing, because
Bibframe, a new project by Library of Congress, is built on RDF, and
next-generation library systems will embrace W3C semantic web technologies.

The RDF data I generate is indexed in JSON-LD format into Elasticsearch but
for SPARQL queries, it has to be stored in triple stores. SPARQL is one
missing thing I can not do with Elasticsearch, just querying JSON-LD, which
is good for a number of use cases, but not for "pure" semantic web
applications.

Projects like Jena Text try to connect RDF and Lucene/Solr
Apache Jena - Jena Full Text Search by other means,
by adding "full text searches" as a language property into a SPARQL dialect.

But I wonder if a SPARQL engine could run a subset of SPARQL on a JSON-LD
doc store like Elasticsearch, by translating SPARQL to ES DSL queries. Some
queries won't run that fast, but that wouldn't hurt for a demonstration - I
admit the idea is a very rough sketch, but replacing triple store by ES
would be intriguing.

Comments, ideas, criticism etc. is welcome.

Jörg

On Tue, May 27, 2014 at 6:47 PM, Jakub Kotowski jakub@sindicetech.comwrote:

It took quite a bit of figuring out but we succeeded registering a
rootMapper and using the ParseContext, thanks.

We don't have any plans for implementing SPARQL on top of Elasticsearch.
Siren can do joins which perform better than blockjoins, especially for
deep nesting, but it still is a different paradigm.

On the other hand, we are always interested in new use cases. We've done
work on indexing richly structured documents with lots of structure and
lots of text content too and querying where you need to combine querying
structure and text. Maybe that is something relevant to library catalog
indexing?

In any case, I'd be happy to hear more if you want to reply offline.

Jakub

On Friday, May 23, 2014 8:00:51 PM UTC+1, Jörg Prante wrote:

Do you plan to implement SPARQL endpoint on Elasticsearch?

That would be one wonderful asset missing in my portfolio for supporting
library catalog indexing and search, all I do with RDF and Elasticsearch is
based on JSON-LD.

Jörg

On Fri, May 23, 2014 at 8:17 PM, Jakub Kotowski ja...@sindicetech.comwrote:

Great, the ParseContext looks promising.

We'll try it and report back, thanks!

Jakub

BTW, just to answer your previous implicit question - SIREn allows for
advanced structured document search, more at http://sirendb.com/

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/e4168857-a854-4119-abae-58d4cbbbaf1a%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/e4168857-a854-4119-abae-58d4cbbbaf1a%40googlegroups.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEd0zSuN0WHy3a2U5Dwx6atGnhuWwmjq9pwDBKdG2z70A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jakub_Kotowski · June 12, 2014, 9:53pm

Hi again,

I have a followup question about the ParseContext and processing documents
before indexing.

Now I would need to modify a document before it is parsed by Elasticsearch.

I tried to do it by modifying context.source() but that leads to a corrupt
index. I guess that's because context.parser is also initialized with the
same bytearray (at least w.r.t. its contents) as context.source(). So in
order to mutate the bytearray, I would need to do it in the parser too. The
parser however is already started and, by the time I get to it, it already
processed at least two tokens. That means that I could conceivably try to
restart the parser with the modified bytearray and lead it to the same
(resp. corresponding) state as it originally would have been thanks to
ObjectMapper's actions on it. This is would however very clearly be a very
fragile hack... One way of avoiding that maybe could be somehow achieving
to be the first rootMapper executed by the ObjectMapper, but I think this
is hardcoded and cannot be easily changed (there's no client API for it
afaik).

Is there some way of modifying a document before Elasticsearch gets to
parse it?

Basically, I need to send a document to ES that contains some JSON
subobjects understood by the custom parser of our plugin and it doesn't
make much sense for Elasticsearch to index them as they are so ideally we
would like to transform them a bit.

Thanks for any pointers.

Jakub

On Friday, May 23, 2014 6:56:32 PM UTC+1, Jörg Prante wrote:

In answer to (1), in each custom mapper, you have access to ParseContext
in the method

public void parse(ParseContext context) throws IOException

In the ParseContext, you can access _source with the source() method to do
whatever you want, e.g. copy it, parse it, index it again etc.

(2) is a slight misconception, since _source is not a field, but a "field
container", it is a byte array passed through the ES API so the field
mappers can do their work.

(3) as said, it is possible to copy _source, but only internally in the
code of a custom field mapper, not by configuration in the mapping, since
_source is reserved for special treatment inside ES and users should not be
able to tamper with it.

So a customized mapper in a plugin could work like this in the root object:

"mappings" : {
"properties" : {
...
"_siren" : { "type" : "siren" }
}
}

and in the corresponding code in the custom mapper, when field _siren is
processed because of the type "siren", it copies the byte array from
_source in the ParseContext. (It need not to be the field name _siren this
is just an example name)

Jörg

On Fri, May 23, 2014 at 5:38 PM, Jakub Kotowski <ja...@sindicetech.com
<javascript:>> wrote:
Hi Jörg,

thanks for the reply. Yes, what you suggest is a way to improve our
current approach so that we can get a subdoc instead of a json encoded in a
string field.

What we would like to achieve is to always be able to process any
document that comes to elasticsearch as a whole, i.e. be it { "title": "my
title", "content" : "my content"} or {"name" : "john", "surname" : "doe"}.

For that we either (1) need to be able to set an analyzer for the whole
input document or (2) set an analyzer for the _source field which already
contains the whole doc or (3) copy the _source field to a normal field,
let's say _siren, and set an analyzer for it.

(1) and (2) seem to be impossible.

So we are exploring option (3) which also seems difficult.

Jakub

On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:
Not sure what the plugin is doing, but if you want to process dedicated
JSON data in an ES document, you could prepare an analyzer for a new field
type. So user can assign special meaning in the mapping to a field of their
preference.

E.g. a mapping with
 "mappings: {
     "mycontent" : { "type" : "siren" }
}
and a given document would look like
"mycontent" : {
     "title" : "foo",
     "name" : "bar"
     ...
}
and then you could extract the whole JSON subdoc from the doc under
"mycontent" into your analyzer plugin and process it.

For an example, you could look into plugins like the StandardNumber
analyzer, where I defined a new type "standardnumber" for analysis:

https://github.com/jprante/elasticsearch-analysis-
standardnumber/blob/master/src/main/java/org/xbib/
elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java

Jörg

On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski ja...@sindicetech.com
wrote:

Hello all,

we are trying to implement a SIREn plugin for Elasticsearch for
indexing and querying documents. We already implemented a version which
uses SIREn to index and query a specific field (called "contents" below)
which contains a JSON document as a string. An example of a doc:

{
"id":3,
"contents":"{"title":"This is an another article
about SIREn.","content":"bla bla bla "}"
}

Instead, we would like to index the whole document as it is posted to
Elasticsearch to avoid the need for a special loader that transforms an
input JSON to the required form. So then the user would simply post a
document such as:

{
"id":3,
"title":"This is an another article about SIREn.",
"content": "bla bla bla "
}

and it would be indexed as a whole both by Elasticsearch and by the
SIREn plugin.

One problem we encountered is that it is not possible to use copyTo for
the _source field and then only configure an analyzer for the copy.

It seems that the cleanest solution would be to modify the
SourceFieldMapper class to allow copyTo.

As a workaround we are going to create a class that extends
SourceFieldMapper and set copyTo for the _source field to a new field that
will be then used for SIREn and register it as follows:

mapperService.documentMapperParser().putRootTypeParser("_source", new
ModifiedSourceFieldMapper.TypeParser());

Does it sound OK or is there a simpler/cleaner solution?

Thank you in advance,

Jakub

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%
40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/89d75c30-5aa5-49e5-a17f-90f9b38829fa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · June 12, 2014, 10:39pm

Short answer: modifying the source after having executed a standard index
or bulk action is not possible.

Long answer: it depends, if you look at
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/index/TransportIndexAction.java#L188

you can see how index (or bulk index, which uses same code fragment) action
depends on the source as a byte ref which is passed around through all the
mapping/analyzer phase. It is absolutely required that _source field
represents the document of what the index analyzer produced from it.

But it's not a serious limitation. In fact, it is a good thing that no ES
user knows an easy way how to tamper with _source data and open the box to
all kinds of mysterious bugs just by installing (maybe malevolent) plugins
and change the way the standard actions are expected to work.

Of course, it is possible in plugin code to write a new action (plus
another REST action endpoint) which works similar to index/bulk action but
can also address the additional _source modification you want.

For example, I have implemented another bulk-style action in ES which works
with a different BulkProcessor class and has another style of error
handling. Not possible by modifying the existing bulk action, but by adding
another bulk action.

Jörg

On Thu, Jun 12, 2014 at 11:53 PM, Jakub Kotowski jakub@sindicetech.com
wrote:

Hi again,

I have a followup question about the ParseContext and processing documents
before indexing.

Now I would need to modify a document before it is parsed by Elasticsearch.

I tried to do it by modifying context.source() but that leads to a corrupt
index. I guess that's because context.parser is also initialized with the
same bytearray (at least w.r.t. its contents) as context.source(). So in
order to mutate the bytearray, I would need to do it in the parser too. The
parser however is already started and, by the time I get to it, it already
processed at least two tokens. That means that I could conceivably try to
restart the parser with the modified bytearray and lead it to the same
(resp. corresponding) state as it originally would have been thanks to
ObjectMapper's actions on it. This is would however very clearly be a very
fragile hack... One way of avoiding that maybe could be somehow achieving
to be the first rootMapper executed by the ObjectMapper, but I think this
is hardcoded and cannot be easily changed (there's no client API for it
afaik).

Is there some way of modifying a document before Elasticsearch gets to
parse it?

Basically, I need to send a document to ES that contains some JSON
subobjects understood by the custom parser of our plugin and it doesn't
make much sense for Elasticsearch to index them as they are so ideally we
would like to transform them a bit.

Thanks for any pointers.

Jakub

On Friday, May 23, 2014 6:56:32 PM UTC+1, Jörg Prante wrote:
In answer to (1), in each custom mapper, you have access to ParseContext
in the method

public void parse(ParseContext context) throws IOException

In the ParseContext, you can access _source with the source() method to
do whatever you want, e.g. copy it, parse it, index it again etc.

(2) is a slight misconception, since _source is not a field, but a "field
container", it is a byte array passed through the ES API so the field
mappers can do their work.

(3) as said, it is possible to copy _source, but only internally in the
code of a custom field mapper, not by configuration in the mapping, since
_source is reserved for special treatment inside ES and users should not be
able to tamper with it.

So a customized mapper in a plugin could work like this in the root
object:

"mappings" : {
"properties" : {
...
"_siren" : { "type" : "siren" }
}
}

and in the corresponding code in the custom mapper, when field _siren is
processed because of the type "siren", it copies the byte array from
_source in the ParseContext. (It need not to be the field name _siren this
is just an example name)

Jörg

On Fri, May 23, 2014 at 5:38 PM, Jakub Kotowski ja...@sindicetech.com
wrote:
Hi Jörg,

thanks for the reply. Yes, what you suggest is a way to improve our
current approach so that we can get a subdoc instead of a json encoded in a
string field.

What we would like to achieve is to always be able to process any
document that comes to elasticsearch as a whole, i.e. be it { "title": "my
title", "content" : "my content"} or {"name" : "john", "surname" : "doe"}.

For that we either (1) need to be able to set an analyzer for the whole
input document or (2) set an analyzer for the _source field which already
contains the whole doc or (3) copy the _source field to a normal field,
let's say _siren, and set an analyzer for it.

(1) and (2) seem to be impossible.

So we are exploring option (3) which also seems difficult.

Jakub

On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:
Not sure what the plugin is doing, but if you want to process dedicated
JSON data in an ES document, you could prepare an analyzer for a new field
type. So user can assign special meaning in the mapping to a field of their
preference.

E.g. a mapping with
 "mappings: {
     "mycontent" : { "type" : "siren" }
}
and a given document would look like
"mycontent" : {
     "title" : "foo",
     "name" : "bar"
     ...
}
and then you could extract the whole JSON subdoc from the doc under
"mycontent" into your analyzer plugin and process it.

For an example, you could look into plugins like the StandardNumber
analyzer, where I defined a new type "standardnumber" for analysis:

https://github.com/jprante/elasticsearch-analysis-standardnu
mber/blob/master/src/main/java/org/xbib/elasticsearch/index/mapper/
standardnumber/StandardNumberMapper.java

Jörg

On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski ja...@sindicetech.com
wrote:

Hello all,

we are trying to implement a SIREn plugin for Elasticsearch for
indexing and querying documents. We already implemented a version which
uses SIREn to index and query a specific field (called "contents" below)
which contains a JSON document as a string. An example of a doc:

{
"id":3,
"contents":"{"title":"This is an another article abou
t SIREn.","content":"bla bla bla "}"
}

Instead, we would like to index the whole document as it is posted to
Elasticsearch to avoid the need for a special loader that transforms an
input JSON to the required form. So then the user would simply post a
document such as:

{
"id":3,
"title":"This is an another article about SIREn.",
"content": "bla bla bla "
}

and it would be indexed as a whole both by Elasticsearch and by the
SIREn plugin.

One problem we encountered is that it is not possible to use copyTo
for the _source field and then only configure an analyzer for the copy.

It seems that the cleanest solution would be to modify the
SourceFieldMapper class to allow copyTo.

As a workaround we are going to create a class that extends
SourceFieldMapper and set copyTo for the _source field to a new field that
will be then used for SIREn and register it as follows:

mapperService.documentMapperParser().putRootTypeParser("_source", new
ModifiedSourceFieldMapper.TypeParser());

Does it sound OK or is there a simpler/cleaner solution?

Thank you in advance,

Jakub

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40goo
glegroups.com
https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%
40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/89d75c30-5aa5-49e5-a17f-90f9b38829fa%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/89d75c30-5aa5-49e5-a17f-90f9b38829fa%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGRJ4VsjVqt-F8F4Hr%2BQ_tyHAmGQdvVhGfoZqkfj-48Ww%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jakub_Kotowski · June 13, 2014, 9:22am

Hi Jörg,

it makes sense. I guess we'll write a new REST action later as you suggest.

Thanks,

Jakub

On 12/06/14 23:39, joergprante@gmail.com wrote:

Short answer: modifying the source after having executed a standard
index or bulk action is not possible.

Long answer: it depends, if you look
at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/index/TransportIndexAction.java#L188

you can see how index (or bulk index, which uses same code fragment)
action depends on the source as a byte ref which is passed around
through all the mapping/analyzer phase. It is absolutely required that
_source field represents the document of what the index analyzer
produced from it.

But it's not a serious limitation. In fact, it is a good thing that no
ES user knows an easy way how to tamper with _source data and open the
box to all kinds of mysterious bugs just by installing (maybe
malevolent) plugins and change the way the standard actions are
expected to work.

Of course, it is possible in plugin code to write a new action (plus
another REST action endpoint) which works similar to index/bulk action
but can also address the additional _source modification you want.

For example, I have implemented another bulk-style action in ES which
works with a different BulkProcessor class and has another style of
error handling. Not possible by modifying the existing bulk action,
but by adding another bulk action.

Jörg

On Thu, Jun 12, 2014 at 11:53 PM, Jakub Kotowski
<jakub@sindicetech.com mailto:jakub@sindicetech.com> wrote:

Hi again,

I have a followup question about the ParseContext and processing
documents before indexing. 

Now I would need to modify a document before it is parsed by
ElasticSearch.

I tried to do it by modifying context.source() but that leads to a
corrupt index. I guess that's because context.parser is also
initialized with the same bytearray (at least w.r.t. its contents)
as context.source(). So in order to mutate the bytearray, I would
need to do it in the parser too. The parser however is already
started and, by the time I get to it, it already processed at
least two tokens. That means that I could conceivably try to
restart the parser with the modified bytearray and lead it to the
same (resp. corresponding) state as it originally would have been
thanks to ObjectMapper's actions on it. This is would however very
clearly be a very fragile hack... One way of avoiding that maybe
could be somehow achieving to be the first rootMapper executed by
the ObjectMapper, but I think this is hardcoded and cannot be
easily changed (there's no client API for it afaik).

Is there some way of modifying a document before ElasticSearch
gets to parse it?

Basically, I need to send a document to ES that contains some JSON
subobjects understood by the custom parser of our plugin and it
doesn't make much sense for ElasticSearch to index them as they
are so ideally we would like to transform them a bit.

Thanks for any pointers.

Jakub



On Friday, May 23, 2014 6:56:32 PM UTC+1, Jörg Prante wrote:

    In answer to (1), in each custom mapper, you have access to
    ParseContext in the method

    publicvoidparse(ParseContextcontext)throwsIOException

    In the ParseContext, you can access _source with the source()
    method to do whatever you want, e.g. copy it, parse it, index
    it again etc.

    (2) is a slight misconception, since _source is not a field,
    but a "field container", it is a byte array passed through the
    ES API so the field mappers can do their work.

    (3) as said, it is possible to copy _source, but only
    internally in the code of a custom field mapper, not by
    configuration in the mapping, since _source is reserved for
    special treatment inside ES and users should not be able to
    tamper with it.

    So a customized mapper in a plugin could work like this in the
    root object:

     "mappings" : {
          "properties" : {
               ...
               "_siren" : { "type" : "siren" }
          }
    }

    and in the corresponding code in the custom mapper, when field
    _siren is processed because of the type "siren", it copies the
    byte array from _source in the ParseContext. (It need not to
    be the field name _siren this is just an example name)

    Jörg




    On Fri, May 23, 2014 at 5:38 PM, Jakub Kotowski
    <ja...@sindicetech.com> wrote:

        Hi Jörg,

        thanks for the reply. Yes, what you suggest is a way to
        improve our current approach so that we can get a subdoc
        instead of a json encoded in a string field.

        What we would like to achieve is to always be able to
        process any document that comes to elasticsearch as a
        whole, i.e. be it { "title": "my title", "content" : "my
        content"} or {"name" : "john", "surname" : "doe"}.

        For that we either (1) need to be able to set an analyzer
        for the whole input document or (2) set an analyzer for
        the _source field which already contains the whole doc or
        (3) copy the _source field to a normal field, let's say
        _siren, and set an analyzer for it.

        (1) and (2) seem to be impossible.

        So we are exploring option (3) which also seems difficult.

        Jakub 


        On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:

            Not sure what the plugin is doing, but if you want to
            process dedicated JSON data in an ES document, you
            could prepare an analyzer for a new field type. So
            user can assign special meaning in the mapping to a
            field of their preference.

            E.g.  a mapping with

                 "mappings: {
                     "mycontent" : { "type" : "siren" }
                }

            and a given document would look like

                "mycontent" : {
                     "title" : "foo",
                     "name" : "bar"
                     ...
                }


            and then you could extract the whole JSON subdoc from
            the doc under "mycontent" into your analyzer plugin
            and process it. 

            For an example, you could look into plugins like the
            StandardNumber analyzer, where I defined a new type
            "standardnumber" for analysis:

            https://github.com/jprante/elasticsearch-analysis-standardnumber/blob/master/src/main/java/org/xbib/elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java

            Jörg



            On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski
            <ja...@sindicetech.com> wrote:

                Hello all,

                we are trying to implement a SIREn plugin for
                ElasticSearch for indexing and querying documents.
                We already implemented a version which uses SIREn
                to index and query a specific field (called
                "contents" below) which contains a JSON document
                as a string. An example of a doc:

                {
                   "id":3,
                   "contents":"{\"title\":\"This is an another article  about SIREn.\",\"content\":\"bla bla bla \"}"
                }


                Instead, we would like to index the whole document
                as it is posted to ElasticSearch to avoid the need
                for a special loader that transforms an input JSON
                to the required form. So then the user would
                simply post a document such as:

                {
                   "id":3,
                   "title":"This is an another article  about SIREn.",
                   "content": "bla bla bla "
                }

                and it would be indexed as a whole both by
                ElasticSearch and by the SIREn plugin.

                One problem we encountered is that it is not
                possible to use copyTo for the _source field and
                then only configure an analyzer for the copy.

                It seems that the cleanest solution would be to
                modify the SourceFieldMapper class to allow copyTo.

                As a workaround we are going to create a class
                that extends SourceFieldMapper and set copyTo for
                the _source field to a new field that will be then
                used for SIREn and register it as follows:

                mapperService.documentMapperParser().putRootTypeParser("_source",
                new ModifiedSourceFieldMapper.TypeParser());

                Does it sound OK or is there a simpler/cleaner
                solution?

                Thank you in advance,

                Jakub


                -- 
                You received this message because you are
                subscribed to the Google Groups "elasticsearch" group.
                To unsubscribe from this group and stop receiving
                emails from it, send an email to
                elasticsearc...@googlegroups.com.

                To view this discussion on the web visit
                https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com
                <https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com?utm_medium=email&utm_source=footer>.
                For more options, visit
                https://groups.google.com/d/optout.


        -- 
        You received this message because you are subscribed to
        the Google Groups "elasticsearch" group.
        To unsubscribe from this group and stop receiving emails
        from it, send an email to elasticsearc...@googlegroups.com.
        To view this discussion on the web visit
        https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com
        <https://groups.google.com/d/msgid/elasticsearch/e796d820-8e9c-4f94-b425-38bd5f509b51%40googlegroups.com?utm_medium=email&utm_source=footer>.


        For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscribe@googlegroups.com
<mailto:elasticsearch+unsubscribe@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/89d75c30-5aa5-49e5-a17f-90f9b38829fa%40googlegroups.com
<https://groups.google.com/d/msgid/elasticsearch/89d75c30-5aa5-49e5-a17f-90f9b38829fa%40googlegroups.com?utm_medium=email&utm_source=footer>.


For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/f4x8JdMAYdM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com
mailto:elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGRJ4VsjVqt-F8F4Hr%2BQ_tyHAmGQdvVhGfoZqkfj-48Ww%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGRJ4VsjVqt-F8F4Hr%2BQ_tyHAmGQdvVhGfoZqkfj-48Ww%40mail.gmail.com?utm_medium=email&utm_source=footer.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/539AC2BA.40806%40sindicetech.com.
For more options, visit https://groups.google.com/d/optout.

Topic		Replies	Views
Introducing SIREn, a plugin for richly nested data search on ElasticSearch Elasticsearch	1	853	July 6, 2017
SIREn plugin for nested documents Elasticsearch	6	845	July 6, 2017
Mapper attachment with indexing a full document Elasticsearch	1	344	July 6, 2017
Indexing HTML documents, problems with JSON Elasticsearch	5	981	July 6, 2017
Elasticsearch join on indexes using SIREN Elasticsearch	1	401	July 12, 2020

Implementing a plugin to process the whole input document

Related topics