ElasticSearch + CouchDB + BIG_INTEGER = Oh My

Hello,
I seem to be running into some trouble with Elasticsearch and the CouchDB
River plugin.

When elasticsearch indexes my database _changes, it runs into this problem:

Exception in thread "elasticsearch[Trapper][couchdb_river_indexer][T#1]" org.elasticsearch.ElasticSearchIllegalStateException: No matching token for number_type [BIG_INTEGER]
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.convertNumberType(JsonXContentParser.java:206)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.numberType(JsonXContentParser.java:65)
    at org.elasticsearch.common.xcontent.support.XContentMapConverter.readValue(XContentMapConverter.java:97)
    at org.elasticsearch.common.xcontent.support.XContentMapConverter.readMap(XContentMapConverter.java:77)
    at org.elasticsearch.common.xcontent.support.XContentMapConverter.readValue(XContentMapConverter.java:110)
    at org.elasticsearch.common.xcontent.support.XContentMapConverter.readMap(XContentMapConverter.java:77)
    at org.elasticsearch.common.xcontent.support.XContentMapConverter.readMap(XContentMapConverter.java:56)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.map(AbstractXContentParser.java:121)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.mapAndClose(AbstractXContentParser.java:132)
    at org.elasticsearch.river.couchdb.CouchdbRiver.processLine(CouchdbRiver.java:218)
    at org.elasticsearch.river.couchdb.CouchdbRiver.access$500(CouchdbRiver.java:64)
    at org.elasticsearch.river.couchdb.CouchdbRiver$Indexer.run(CouchdbRiver.java:334)
    at java.lang.Thread.run(Thread.java:679)

This is because I have a number of documents in my database with field
values that look like 4634970607942323000.
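
A quick range check may help narrow this down. The sketch below assumes that BIG_INTEGER is what the JSON parser reports for an integer that does not fit a signed 64-bit long; that is a reading of the error message, not something the trace states. Under that assumption the sample value quoted above actually fits, so the offending values are probably larger ones elsewhere in the documents. The helper name fits_in_long is made up for illustration.

```python
# Signed 64-bit bounds, i.e. the range of a Java long.
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def fits_in_long(value: int) -> bool:
    """Return True if value can be stored in a signed 64-bit integer."""
    return LONG_MIN <= value <= LONG_MAX


sample = 4634970607942323000  # the value quoted in the post
print(fits_in_long(sample))        # True: within long range
print(fits_in_long(LONG_MAX + 1))  # False: one past the largest long
```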

I have googled around and tried filtering, but filters only seem to select
whole documents (by type, or by certain fields and values). I can't find a
way to exclude individual fields from being indexed.

These are the solutions I have considered so far, with notes in parentheses:

  • rewrite all field values in couchdb with BIG_INTEGER to smaller values
    (problem: I tried doing that, and still ran into the same problem, albeit
    now I have more documents indexed. I suspect the issue is with unknown
    fields with large integers. I have no idea where ElasticSearch stopped
    indexing and even logging level set to TRACE yielded no clues)
  • cast all field values from _changes to strings (only a few fields
    matter in my documents, and those that contain integers are usually
    metadata. I tried a custom mapping and got terribly muddled)
  • make couchdb-river ignore certain fields when indexing documents (I
    have no idea how to do that. I read the filtering documentation, and
    managed to filter only documents with certain fields and certain values for
    the fields, but that's not what I want)
  • make couchdb-river ignore fields when an error occurs (I have no idea
    how to do this either)
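
The first two options could be sketched on the CouchDB side, before the river ever sees the data. This is only an illustration under assumptions: it treats any integer outside the signed 64-bit range as a problem value, the helper name stringify_big_integers and the sample field names are hypothetical, and writing the cleaned documents back to CouchDB is left out.

```python
import json

LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def stringify_big_integers(value):
    """Return a copy of value with out-of-range integers turned into strings."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int in Python; leave it alone
    if isinstance(value, int):
        return value if LONG_MIN <= value <= LONG_MAX else str(value)
    if isinstance(value, dict):
        return {key: stringify_big_integers(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_big_integers(item) for item in value]
    return value


# Hypothetical document; the field names are made up for illustration.
doc = json.loads(
    '{"type": "stream", "meta": {"counter": 46349706079423230001234, "views": 12}}'
)
cleaned = stringify_big_integers(doc)
print(json.dumps(cleaned))
```

Small integers are left untouched, so only the fields that would trip the parser change type.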

So, ElasticSearch community, help me, you're my only hope :P

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Maybe script filters could help here:
https://github.com/elasticsearch/elasticsearch-river-couchdb#script-filters

{
  "type" : "couchdb",
  "couchdb" : {
    "script" : "ctx.doc.field1 = ctx.doc.field1 > 1000000 ? 0 : ctx.doc.field1"
  }
}

Or something like that...
HTH

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Sorry for replying late, and being a little obtuse about this:

I ran it with a script. The script does this in order:

  • ctx.ignore = ctx.doc.type = 'User'. Basically this forces
    elasticsearch to ignore the doc_type called 'User', since I only want
    elasticsearch to index documents with doc_type = 'stream'.
  • I then changed all the public fields of documents with doc_type stream
    to 0, just to test and see what gets indexed by elasticsearch.

I still get the same BIG_INTEGER error. I noticed that the _id field for
documents with doc_type = 'stream' looks something like this:
fe17f8e4dec24572bf99d7310d7338ed. Could this somehow have been interpreted
as a base-16 number and cast to an integer by elasticsearch?
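
On the _id question: in JSON a quoted value is a string token whatever its content, so a parser would not normally try to read a hex-looking _id as a number. The sketch below shows this with Python's json module. It is not Elasticsearch's parser, which is only assumed to behave the same way, and the sample line is a made-up fragment in the style of a _changes entry.

```python
import json

# Hypothetical _changes-style fragment using the _id format from the post.
line = '{"id": "fe17f8e4dec24572bf99d7310d7338ed", "seq": 42}'
parsed = json.loads(line)

print(type(parsed["id"]).__name__)   # str: quoted values stay strings
print(type(parsed["seq"]).__name__)  # int: only bare number tokens are numeric
```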

Also, is it possible to get elasticsearch to show which document ID
caused the error? This has been very puzzling, and I would appreciate it
if anyone could help.
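
One way to find the offending documents without relying on the river's logging is to scan the CouchDB data directly. The trace shows mapAndClose being called from processLine, which suggests (an inference, not confirmed) that the failure happens while a _changes line is parsed, so a document that a script would later ignore might still trigger it. The sketch below assumes the problem values are integers outside the signed 64-bit range; find_big_integers, report and the sample rows are hypothetical.

```python
import json

LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def find_big_integers(value, path=""):
    """Yield (path, number) for every integer outside the signed 64-bit range."""
    if isinstance(value, bool):
        return  # bools are ints in Python but are not number tokens in JSON
    if isinstance(value, int):
        if not LONG_MIN <= value <= LONG_MAX:
            yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_big_integers(item, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_big_integers(item, f"{path}[{index}]")


def report(rows):
    """Return (doc id, path, number) for each offending field in the rows."""
    return [
        (row["id"], path, number)
        for row in rows
        for path, number in find_big_integers(row.get("doc", {}))
    ]


# Hypothetical rows shaped like CouchDB's _all_docs?include_docs=true output.
sample = json.loads("""
{"rows": [
  {"id": "a1", "doc": {"_id": "a1", "type": "stream", "stats": {"hits": 3}}},
  {"id": "b2", "doc": {"_id": "b2", "type": "stream", "stats": {"hits": 92233720368547758070}}}
]}
""")
for doc_id, path, number in report(sample["rows"]):
    print(doc_id, path, number)
```

To run this against a real database, the rows would come from the _all_docs endpoint with include_docs=true (host, port and database name depend on the setup) instead of the sample above.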

Thank you.

Xuanyi Chew
