Attachment field type parsed properly but can't see metadata information


(David Marko) #1

I'm uploading attachments to be parsed in ES using the Java API. I have ES
1.2.2 with the elasticsearch-mapper-attachments plugin installed. The code
works fine and I can search by attachment content, but ...

  1. File content is stored in Elasticsearch. Is there a way to avoid
    this, so that the content is indexed but not stored?

I have this mapping code (not full code):

XContentBuilder map = jsonBuilder().startObject()
    .startObject(idxType)
        .startObject("properties")
            .startObject("file")
                .field("type", "attachment")
                .field("store", "no")
            .endObject()   // file
        .endObject()       // properties
    .endObject()           // idxType
.endObject();              // root object

and indexing by using this:

BytesReference json = jsonBuilder()
.startObject()
.field("_id", filePath)
.field("file", data64)
.endObject().bytes();

IndexResponse idxResp = client.prepareIndex().setIndex(idxName).setType(idxType).setId(filePath)
    .setSource(json).execute().actionGet();

  2. I can't see the file metadata created as described in the docs. I understand
    that they are (or should be) created automatically?

The docs say these fields should appear ...

"fields" : {
"file" : {"index" : "no"},
"title" : {"store" : "yes"},
"date" : {"store" : "yes"},
"author" : {"analyzer" : "myAnalyzer"},
"keywords" : {"store" : "yes"},
"content_type" : {"store" : "yes"},
"content_length" : {"store" : "yes"},
"language" : {"store" : "yes"}
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5a29d66f-99d8-48e4-b93c-7caf61b93214%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

As you are a Java dev, I'd recommend using Tika directly in your code to extract the data you need and produce JSON that exactly matches your needs.
Something like this: https://github.com/dadoonet/fsriver/blob/master/src/main/java/fr/pilato/elasticsearch/river/fs/river/FsRiver.java#L688-L695

That way, you won't need to send a full binary document to Elasticsearch just to index some metadata or raw text.
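
To illustrate the "produce your own JSON" half of this suggestion, here is a minimal sketch that uses only the JDK. It assumes the text and metadata have already been extracted on the client (for example with Tika, which is not shown here); the class name, method names, and the field names `title`, `content_type`, and `content` are all hypothetical choices, not something the plugin or Elasticsearch requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractedDocJson {

    // Escape a value so it is safe inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a flat JSON object from fields that were extracted beforehand.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for what the extraction step returned.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Quarterly report");
        doc.put("content_type", "application/pdf");
        doc.put("content", "Plain text pulled out of the file...");
        System.out.println(toJson(doc));
    }
}
```

With this approach the index only ever sees the fields you chose to send, so no base64 payload ends up in the document. In real code the JSON would more likely be built with the client's own builder (as in the question) rather than by hand.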

That said, you could look at Source exclude: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude
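
As a rough sketch of what source excludes could look like for the mapping in the question: the snippet below just assembles the mapping JSON as a string, so it runs without any Elasticsearch dependency. The type name `doc` and the class and method names are placeholders; `file` is the attachment field from the question, and the stored sub-fields are taken from the docs excerpt quoted there. This assumes the 1.x `_source` `excludes` syntax.

```java
public class AttachmentMapping {

    // Mapping JSON that keeps the base64 "file" value out of _source
    // while still letting the attachment mapper parse and index it.
    static String mappingJson(String typeName) {
        return "{\n"
             + "  \"" + typeName + "\": {\n"
             + "    \"_source\": { \"excludes\": [\"file\"] },\n"
             + "    \"properties\": {\n"
             + "      \"file\": {\n"
             + "        \"type\": \"attachment\",\n"
             + "        \"fields\": {\n"
             + "          \"file\": { \"store\": \"no\" },\n"
             + "          \"title\": { \"store\": \"yes\" },\n"
             + "          \"content_type\": { \"store\": \"yes\" }\n"
             + "        }\n"
             + "      }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        // "doc" is a placeholder type name.
        System.out.println(mappingJson("doc"));
    }
}
```

One trade-off to keep in mind: once the field is excluded from `_source`, the original binary cannot be fetched back from Elasticsearch or reindexed from it, so the file has to live somewhere else.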

The attachment mapper never modifies the source document.
But if you ask for stored fields at search time in addition to the default "_source" field, you should get your values back.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html#search-request-fields
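
A small sketch of such a request body, assuming the metadata sub-fields were mapped with "store" : "yes" as in the docs excerpt quoted in the question. It uses the 1.x "fields" search parameter and a match_all query purely for illustration; the class and method names are made up, and the field names are assumed to need no JSON escaping.

```java
public class StoredFieldsSearch {

    // Build a search body that asks for specific stored fields.
    static String searchBody(String... storedFields) {
        StringBuilder sb = new StringBuilder("{\"fields\":[");
        for (int i = 0; i < storedFields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(storedFields[i]).append('"');
        }
        return sb.append("],\"query\":{\"match_all\":{}}}").toString();
    }

    public static void main(String[] args) {
        // Sub-fields of the "file" attachment field from the question.
        System.out.println(searchBody("file.title", "file.content_type"));
    }
}
```

The extracted values should then come back under "fields" on each hit rather than inside "_source", which is why they are not visible when you only look at the source document.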

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to David Marko (dmarko484@gmail.com), 11 July 2014 at 15:53:54.



(system) #3