Make couchdb river indexing ignore fields

I'm currently indexing a CouchDB database using the river. I put a bunch of
documents in to begin with, and now that my app is running, more documents
are being added. The problem is that they have different structures, and I'm
getting lots of this error:

[0]: index [archive], type [resume], id
[fdf6ad3dcf0b98d88b7161952f327c209ab045d5], message
[MapperParsingException[object mapping for [resume] tried to parse as
object, but got EOF, has a concrete value been provided to it?]]

I understand this error occurs when a document is trying to be indexed that
has a different type than specified in the mapping. I suspect it's an
object-string conflict. However, this is an issue since we have a "data"
node in the documents that can be anything (depending on the source of the
document) and I need the flexibility to index different types of children
under this "data" node.
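
To narrow down which children of "data" actually clash, a small check like the one below might help. It is only a sketch: it assumes the documents have already been loaded as Python dicts (for example from a CouchDB `_all_docs` dump), the helper names are made up for illustration, and the sample documents are hypothetical rather than taken from the real database. Arrays are reported as their own kind for simplicity.

```python
from collections import defaultdict


def json_kind(value):
    """Return a coarse JSON kind name for a Python value."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        return "string"
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return type(value).__name__


def kinds_by_field(docs, parent="data"):
    """Map each child key under `parent` to the set of JSON kinds seen for it."""
    seen = defaultdict(set)
    for doc in docs:
        node = doc.get(parent)
        if isinstance(node, dict):
            for key, value in node.items():
                if value is not None:  # nulls are skipped, they carry no type
                    seen[key].add(json_kind(value))
        elif node is not None:
            # The parent itself is a plain value rather than an object.
            seen[parent].add(json_kind(node))
    return dict(seen)


def conflicts(docs, parent="data"):
    """Return only the fields that were seen with more than one kind."""
    return {
        key: sorted(kinds)
        for key, kinds in kinds_by_field(docs, parent).items()
        if len(kinds) > 1
    }


# Hypothetical sample documents, for illustration only.
sample_docs = [
    {"_id": "a", "data": {"employer": {"name": "Acme"}}},
    {"_id": "b", "data": {"employer": "Acme"}},
    {"_id": "c", "data": {"years": 3}},
]
print(conflicts(sample_docs))
```

Any field listed in the result was seen both as an object and as a plain value, which is the kind of mismatch suspected above.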

So in trying to find a solution, I'm trying to see how I can make ES ignore
the entire data node while it's indexing from couchdb. But when I try to
set the mapping like so:

{
    "settings" : {
        "number_of_shards" : 5,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "resume" : {
            "properties" : {
                "date" : {"type" : "date", "format" : "yyyy-MM-dd HH:mm:ss"},
                "attributes" : {
                    "properties" : {
                        "location" : {
                            "properties" : {
                                "latlon" : {"type" : "geo_point"}
                            }
                        }
                    }
                },
                "data" : {
                    "type" : "object",
                    "enabled" : false
                }
            }
        }
    }
}

It doesn't seem to matter: the data node and its contents are all showing
up in the index.
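
One possible explanation, offered as a hedged note rather than a diagnosis: "enabled": false tells Elasticsearch not to parse or index the contents of an object, but the original JSON is still kept in _source, so the data node would still appear in returned hits even though it is not searchable. The sketch below writes the mapping above as a Python dict and lists which object paths are disabled, as a quick sanity check before the body is sent to create the index. The helper name `disabled_paths` is made up for illustration.

```python
import json

# The mapping from the post, written as a Python dict.
index_body = {
    "settings": {"number_of_shards": 5, "number_of_replicas": 1},
    "mappings": {
        "resume": {
            "properties": {
                "date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                "attributes": {
                    "properties": {
                        "location": {
                            "properties": {"latlon": {"type": "geo_point"}}
                        }
                    }
                },
                "data": {"type": "object", "enabled": False},
            }
        }
    },
}


def disabled_paths(properties, prefix=""):
    """Yield dotted paths of fields mapped with "enabled": false."""
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("enabled") is False:
            yield path
        # Recurse into nested object mappings.
        if "properties" in spec:
            yield from disabled_paths(spec["properties"], path + ".")


resume_props = index_body["mappings"]["resume"]["properties"]
print(list(disabled_paths(resume_props)))
print(json.dumps(index_body, indent=2))
```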

  • So, is there a way to set up a mapping to ignore entire objects while
    indexing CouchDB?
  • I also read about an option "index.mapping.ignore_malformed". How is this
    option used, and would it force my index to ignore the conflicts and index
    the rest of the document? How do I set that in the mapping definition?
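
On the second question, here is a minimal sketch, assuming "index.mapping.ignore_malformed" is given as an index-level setting in the "settings" block when the index is created, rather than inside the mapping definition. As far as I know, this flag covers malformed values in fields such as numbers and does not apply to object fields, so it may not help with an object-versus-string conflict. The helper name `with_ignore_malformed` is hypothetical.

```python
import json


def with_ignore_malformed(index_body, enabled=True):
    """Return a copy of an index-creation body with the index-level flag set."""
    # Round-tripping through JSON is a cheap deep copy for JSON-like data.
    body = json.loads(json.dumps(index_body))
    body.setdefault("settings", {})["index.mapping.ignore_malformed"] = enabled
    return body


# Settings taken from the post; the mappings block is omitted for brevity.
base = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
print(json.dumps(with_ignore_malformed(base), indent=2))
```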

Thanks,
Beau

--

"I understand this error occurs when a document is trying to be indexed
that has a different type than specified in the mapping"

I meant to say: I understand this error occurs when a document being indexed
has objects of a different type than specified in the mapping.

--

There seems to be a bug in elasticsearch where uploading a file with the
attribute "properties" to elasticsearch using a river will result in that
exception. For example, I got the same exception when trying to ensure that
the file server river https://github.com/dadoonet/fsriver uploaded the
following content:
{"properties":"hi"}
I haven't figured out why this is yet.

--

Hey Amy,

What is your issue with the FileSystem river?
Feel free to open an issue at https://github.com/dadoonet/fsriver/issues with
your use case. I will check it.

David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Sorry, I should have qualified that:
The fsriver works fine - the problem was with uploading documents on a
change that I made to my version of the fsriver.
Sorry for misleading you.

--

Sorry. I don't understand your concern.

You are using the couchDb river, aren't you?
If so, why are you talking about FSRiver?

I can't see what your architecture is here.

Could you elaborate a bit more what you have (couchDb instance, ES instance, ES
plugins) and what you are trying to do (inject Doc in couchDb, then add
attachment and search for it in ES for example)?

David.

--