Hi,
I installed Elasticsearch without problems and started developing a mail
indexing solution.
The main setup went smoothly, and I also installed the plugin for Tika
document text extraction.
After that I wrote some simple beans that parse emails with JavaMail and
write them into the system.
When it comes to indexing attachments (doc, pdf, docx, OpenDocument, etc.),
several mails were indexed correctly, but others were not.
I had some problems putting the base64-encoded documents from the email
directly into the request, so I chose to decode the contents and re-encode
them, just to be sure everything was written correctly.
When I create the JSON file (attached to this email), I can also write out
the decoded document, which is readable, so the payload I pass to
Elasticsearch looks correct to me.
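A minimal sketch of such a decode/re-encode step (not the actual bean code) is shown here. The class and method names are hypothetical, and it assumes a JVM that provides java.util.Base64 (Java 8 or later). The MIME decoder skips the line breaks found in a mail part, and the basic encoder writes a single line without linefeeds.

```java
import java.util.Base64;

// Hypothetical helper: names are illustrative, not taken from the original beans.
final class Base64Normalizer {

    // Decode base64 as it appears in a mail part; the MIME decoder
    // ignores line separators and other non-alphabet characters.
    static byte[] decodeMimePart(String mimeBase64) {
        return Base64.getMimeDecoder().decode(mimeBase64);
    }

    // Re-encode the raw bytes as one line, with padding and no linefeeds.
    static String encodeSingleLine(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Convenience: wrapped MIME base64 in, single-line base64 out.
    static String normalize(String mimeBase64) {
        return encodeSingleLine(decodeMimePart(mimeBase64));
    }

    public static void main(String[] args) {
        String wrapped = "aGVs\r\nbG8="; // "hello", split across two lines
        System.out.println(normalize(wrapped)); // prints aGVsbG8=
    }
}
```

Each attachment should go through this step separately; the base64 strings of several parts should not be concatenated into one value, because padding is only valid at the very end of a string.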
Here are the versions:
elasticsearch version 0.90.0
elasticsearch-mapper-attachments 1.7.0
See the attached JSON file as a test document.
Here's the mapping I used:
curl -XGET 'http://localhost:9200/anagrafiche/email/_mapping?pretty=true'
{
  "email" : {
    "properties" : {
      "addTimestamp" : {
        "type" : "string"
      },
      "answered" : {
        "type" : "boolean"
      },
      "attacheddocument" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "attacheddocument" : {
            "type" : "string"
          },
          "author" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string"
          },
          "name" : {
            "type" : "string"
          },
          "date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "keywords" : {
            "type" : "string"
          },
          "content_type" : {
            "type" : "string"
          }
        }
      },
      "cgateId" : {
        "type" : "string"
      },
      "contents" : {
        "type" : "string"
      },
      "date" : {
        "type" : "date",
        "format" : "dateOptionalTime",
        "include_in_all" : true
      },
      "filePath" : {
        "type" : "string"
      },
      "from" : {
        "properties" : {
          "address" : {
            "type" : "string"
          },
          "encodedPersonal" : {
            "type" : "string",
            "include_in_all" : true
          }
        }
      },
      "hasattachments" : {
        "type" : "boolean"
      },
      "numlines" : {
        "type" : "long"
      },
      "recipient" : {
        "properties" : {
          "address" : {
            "type" : "string"
          },
          "encodedPersonal" : {
            "type" : "string",
            "include_in_all" : true
          }
        }
      },
      "seen" : {
        "type" : "boolean"
      },
      "subject" : {
        "type" : "string"
      }
    }
  }
}
Here's the output of the indexing attempt:
[maxper@max ~]$ curl -XPOST 'http://localhost:9200/anagrafiche/email/' -d @testindex.json
{"error":"MapperParsingException[failed to parse]; nested:
JsonParseException[Failed to decode VALUE_STRING as base64
(MIME-NO-LINEFEEDS): Unexpected padding character ('=') as character #3 of
4-char base64 unit: padding only legal as 3rd or 4th character\n at
[Source: [B@45387c9d; line: 1, column: 32804]]; ","status":400}
[maxper@max ~]$
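One way to narrow this down is to check the base64 value in the JSON file before posting it. The sketch below is a hypothetical helper (the names are illustrative and are not part of Elasticsearch or Jackson). It reports the first position where a string stops looking like single-line, padded base64: a character outside the standard alphabet (for example a line break), a '=' anywhere but the end, or a total length that is not a multiple of 4. Whether one of these is the actual cause of the failure above is an assumption.

```java
// Hypothetical pre-flight check for a base64 string before it goes into the JSON payload.
final class Base64PayloadCheck {

    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Returns the index of the first offending character,
    // s.length() if only the overall length is wrong,
    // or -1 if the string looks like single-line padded base64.
    static int firstProblem(String s) {
        int n = s.length();

        // Up to two '=' are allowed, and only at the very end.
        int pad = 0;
        while (pad < n && pad < 2 && s.charAt(n - 1 - pad) == '=') {
            pad++;
        }

        // Everything before the trailing padding must be in the alphabet;
        // this also catches '=' in the middle and embedded line breaks.
        int dataEnd = n - pad;
        for (int i = 0; i < dataEnd; i++) {
            if (ALPHABET.indexOf(s.charAt(i)) < 0) {
                return i;
            }
        }

        // Padded base64 always comes in 4-character units.
        return (n % 4 == 0) ? -1 : n;
    }

    public static void main(String[] args) {
        System.out.println(firstProblem("aGVsbG8="));         // prints -1
        System.out.println(firstProblem("aGVsbG8=aGVsbG8=")); // prints 7
    }
}
```

The second call shows the case of two encoded chunks glued together: the '=' that ends the first chunk lands in the middle of the value, which a strict decoder rejects.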
Just to be clear, I can index some documents, so the mapping should be
correct.
I hope someone can help me.
Thanks, Massimiliano