Hi,
I'm using elasticsearch-river-mongodb to index data from MongoDB and make
it searchable in Elasticsearch. Documents stored in MongoDB can contain
different types of fields, and one of the fields is an attachment where
the content of web pages or other files should be stored. The mapping I
created looks like this:
{
  "document": {
    "properties": {
      "engine_id": {
        "store": "yes",
        "type": "string"
      },
      "fields": {
        "type": "nested",
        "properties": {
          "text_value": {
            "type": "string",
            "analyzer": "simple"
          },
          "name_value": {
            "type": "string",
            "analyzer": "simple"
          },
          "float_value": {
            "type": "double"
          },
          "key": {
            "index": "not_analyzed",
            "type": "string",
            "index_options": "docs",
            "omit_norms": true
          },
          "file_value": {
            "type": "attachment",
            "file_value": {
              "term_vector": "with_positions_offsets",
              "store": "yes"
            }
          }
        }
      }
    }
  }
}
The "file_value" field stores the content of a web page. I tried to
populate it in several different ways, e.g.:
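// Attempt 1: read the raw bytes of the page from a stream
byte[] rawContent = org.elasticsearch.common.io.Streams.copyToByteArray(
    inputStream);
// Attempt 2: read the file and Base64-encode its contents
String encodedContent = org.elasticsearch.common.Base64.encodeFromFile(
    "test.html");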
However, the encoded value seems to be treated as a regular string in
Elasticsearch: I can find the document only if I use the encoded value in
the search query. If I search for the actual text, I get no results. This
used to work fine when I inserted documents directly into Elasticsearch,
but with the MongoDB river it doesn't work, or I'm making some mistake. The
only workarounds I have at the moment are to store the whole web page
content (HTML included) as a plain string, or to pre-process the web page
to extract the text and store that as a string.
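For reference, here is a minimal sketch of the encoding step. It assumes Java 8's java.util.Base64 rather than the Elasticsearch helper classes I used above; the class name and the sample HTML are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the river or of Elasticsearch.
public class AttachmentEncodingSketch {

    // Encode raw page bytes as a standard Base64 string.
    static String encode(byte[] rawBytes) {
        return Base64.getEncoder().encodeToString(rawBytes);
    }

    public static void main(String[] args) {
        // Illustrative page content only.
        byte[] page = "<html><body>Healthcare in India</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        // This string is what would be written into "file_value" in MongoDB.
        System.out.println(encode(page));
    }
}
```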
This is a sample document stored in MongoDB:
{
  "_id" : ObjectId("5293cf6a2318b3b53ca5694d"),
  "engine_id" : "engineid1234",
  "fields" : [
    {
      "key" : "title",
      "text_value" : "Healthcare in India"
    },
    {
      "key" : "file",
      "file_value" : "em9yYW4gamVyZW1pYyBsb2dpdGVjaCBzZWFyY2ggZWxhc3RpY3NlYXJjaAo="
    }
  ]
}
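As a sanity check, the "file_value" above is valid Base64 and decodes to plain text, so the stored value itself looks fine. A minimal sketch of that check, again assuming Java 8's java.util.Base64 (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper used only to inspect the stored value.
public class DecodeSampleValue {

    // Decode a Base64 string and interpret the bytes as UTF-8 text.
    static String decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fileValue =
            "em9yYW4gamVyZW1pYyBsb2dpdGVjaCBzZWFyY2ggZWxhc3RpY3NlYXJjaAo=";
        // Prints: zoran jeremic logitech search elasticsearch
        System.out.print(decode(fileValue));
    }
}
```

Searching for any of those decoded words (for example "logitech") is what I would expect to match if the field were indexed as an attachment.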
I hope some of you can give me an idea of what's wrong here.
Thanks,
Zoran