How to manually index mongodb without mongodb-river?

For some reason, I want to index my MongoDB data manually, about 400 thousand docs in total.

I do this as follows:

  1. Use mongoexport to export all docs.
  2. Use the POST API to index them one by one.

But I have some problems:

  1. Every doc has a "_id" field. If I index such a doc as-is, I can't search
    for it! I have to rename this field to something else.
  2. The indexing process often chokes on parsing errors, such as an unexpected
    character '(' or an invalid date format. I have to fix each error by hand.

Any ideas?
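
A minimal sketch of one way to handle the "_id" problem and the one-by-one POSTs, assuming the export is one JSON document per line and that ObjectIds appear in the {"$oid": "..."} form. The names to_bulk_lines and build_bulk_body, and the index and type names "people" and "person", are made up for illustration. Instead of renaming "_id", it moves the value into the action line of a bulk request, and it collects lines that are not valid JSON rather than stopping at the first one:

```python
import json


def to_bulk_lines(doc, index="people", doc_type="person"):
    """Return the (action, source) JSON lines for one exported document."""
    source = dict(doc)  # shallow copy so the caller's dict is left untouched
    raw_id = source.pop("_id", None)  # keep "_id" out of the document body
    # Assumption: ObjectIds were exported as {"$oid": "..."}.
    if isinstance(raw_id, dict) and "$oid" in raw_id:
        doc_id = raw_id["$oid"]
    else:
        doc_id = None if raw_id is None else str(raw_id)
    # "_type" matches the Elasticsearch releases of this era; newer ones drop it.
    meta = {"_index": index, "_type": doc_type}
    if doc_id is not None:
        meta["_id"] = doc_id
    return json.dumps({"index": meta}), json.dumps(source)


def build_bulk_body(export_lines, index="people", doc_type="person"):
    """Build a newline-delimited bulk body and report the lines that failed."""
    out, bad = [], []
    for lineno, line in enumerate(export_lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            bad.append((lineno, str(exc)))
            continue
        if not isinstance(doc, dict):
            bad.append((lineno, "not a JSON object"))
            continue
        out.extend(to_bulk_lines(doc, index, doc_type))
    body = "\n".join(out) + "\n" if out else ""
    return body, bad
```

The returned body is meant to be sent in batches (say, a few thousand docs each) to the _bulk endpoint, and the bad list gives the line numbers to inspect afterwards, so the cleanup can be done in one pass instead of one error at a time.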

Can you post a sample of an offending document?

Thanks
Vineeth

On Fri, Mar 29, 2013 at 3:26 PM, lcxlcx nasa4836@gmail.com wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-manually-index-mongodb-without-mongodb-river-tp4032605.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • For the invalid date format case:
    The error message looks like this:
    " MapperParsingException[Failed to parse [birthday]]; nested:
    MapperParsingException[failed to parse date field [], tried both date format
    [dateOptionalTime], and timestamp number]; nested:
    IllegalArgumentException[Invalid format: ""]; ","status":400}"

    I suspect this is caused by automatic mapping detection, because the
    date field holds either an empty string "" or a "yyyy-mm-dd" value.

  • For the invalid "(" case, I fixed it by writing a script that escapes all
    the offending '(' characters, but I don't know how many other invalid
    characters I will stumble upon :)
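
For the empty-string dates, one possible workaround is to drop the field before indexing whenever it is blank or does not parse, so the date parser never sees "". This is only a sketch: clean_dates and DATE_FIELDS are hypothetical names, and the explicit mapping shown is an assumption about how the birthday field could be declared instead of relying on automatic detection (where exactly it goes in the put-mapping request depends on the Elasticsearch version).

```python
from datetime import datetime

# Hypothetical explicit mapping for the field named in the error message.
# Note the pattern uses "MM" for the month; "mm" would mean minutes.
BIRTHDAY_MAPPING = {
    "properties": {
        "birthday": {"type": "date", "format": "yyyy-MM-dd"},
    }
}

DATE_FIELDS = ("birthday",)


def clean_dates(doc, date_fields=DATE_FIELDS, fmt="%Y-%m-%d"):
    """Return (cleaned_doc, dropped_fields) without modifying the input."""
    cleaned = dict(doc)
    dropped = []
    for field in date_fields:
        if field not in cleaned:
            continue
        value = cleaned[field]
        valid = False
        if isinstance(value, str) and value.strip():
            try:
                datetime.strptime(value.strip(), fmt)
                valid = True
            except ValueError:
                valid = False
        if not valid:
            # Omit the field entirely rather than sending "" to the date parser.
            del cleaned[field]
            dropped.append(field)
    return cleaned, dropped
```

Keeping the list of dropped fields makes it possible to log which docs lost a value, which may matter if some of the rejected strings are real dates in another format.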

My main question is probably this: why don't these dirty-data problems occur
when using the mongo-river plugin, yet they all show up when I index manually?
So I came here to ask whether I am doing something wrong, or whether there is
a more elegant method.

On Fri, Mar 29, 2013 at 6:59 PM, vineeth mohan [via Elasticsearch
Users] ml-node+s115913n4032610h4@n3.nabble.com wrote:

Can you post a sample of an offending document?

Thanks
Vineeth

--

Regards,
Zhan Jianyu