Hi guys,
I'm trying to develop a service that stores uploaded files as attachments
(the file is one field in the document). This part works fine: I can search
these files using like_text as input. However, the second part of the service
should compare a newly uploaded file with the existing files in order to
find duplicates or very similar files. The problem is that I always get the
same results regardless of the input, and these results are wrong: the
identical file very often has the lowest score. It looks as if the like_text
extracted from the uploaded file is always the same, and none of the
documents gets the expected score, which I believe should be 1 for identical
documents. The scores I get are always less than 0.2.
Could you please check whether there is something wrong with my code?
// Load the mapping and read the uploaded file into a byte array.
String mapping = copyToStringFromClasspath(
        "/org/prosolo/services/indexing/documents-mapping.json");
byte[] txt = org.elasticsearch.common.io.Streams.copyToByteArray(file);

Client client = ElasticSearchFactory.getClient();
client.admin().indices()
        .putMapping(putMappingRequest(indexName).type(indexType).source(mapping))
        .actionGet();

// Index the document; the file bytes go into the "file" attachment field.
IndexResponse iResponse = client.index(indexRequest(indexName).type(indexType)
        .source(jsonBuilder()
                .startObject()
                .field("file", txt)
                .field("title", title)
                .field("visibility", visibilityType.name().toLowerCase())
                .field("ownerId", ownerId)
                .field("description", description)
                .field("contentType", DocumentType.DOCUMENT.name().toLowerCase())
                .field("dateCreated", dateCreated)
                .field("url", link)
                .field("relatedToType", relatedToType)
                .field("relatedToId", relatedToId)
                .endObject()))
        .actionGet();

// Refresh so that the new document is visible to search.
client.admin().indices().refresh(refreshRequest()).actionGet();

// Run more-like-this on the "file" field of the document just indexed.
MoreLikeThisRequestBuilder mltRequestBuilder = new MoreLikeThisRequestBuilder(
        client, ESIndexNames.INDEX_DOCUMENTS, ESIndexTypes.DOCUMENT, iResponse.getId());
mltRequestBuilder.setField("file");
SearchResponse response =
        client.moreLikeThis(mltRequestBuilder.request()).actionGet();

// Print every hit with its title and score.
SearchHits searchHits = response.getHits();
System.out.println("getTotalHits:" + searchHits.getTotalHits());
Iterator<SearchHit> hitsIter = searchHits.iterator();
while (hitsIter.hasNext()) {
    SearchHit searchHit = hitsIter.next();
    System.out.println("FOUND DOCUMENT:" + searchHit.getId()
            + " title:" + searchHit.getSource().get("title")
            + " score:" + searchHit.score());
}
And this is the mapping I was using:
{
  "document": {
    "properties": {
      "title": {
        "type": "string",
        "store": true
      },
      "description": {
        "type": "string",
        "store": "yes"
      },
      "contentType": {
        "type": "string",
        "store": "yes"
      },
      "dateCreated": {
        "store": "yes",
        "type": "date"
      },
      "url": {
        "store": "yes",
        "type": "string"
      },
      "visibility": {
        "store": "yes",
        "type": "string"
      },
      "ownerId": {
        "type": "long",
        "store": "yes"
      },
      "relatedToType": {
        "type": "string",
        "store": "yes"
      },
      "relatedToId": {
        "type": "long",
        "store": "yes"
      },
      "file": {
        "path": "full",
        "type": "attachment",
        "fields": {
          "author": {
            "type": "string"
          },
          "title": {
            "store": true,
            "type": "string"
          },
          "keywords": {
            "type": "string"
          },
          "file": {
            "store": true,
            "term_vector": "with_positions_offsets",
            "type": "string"
          },
          "name": {
            "type": "string"
          },
          "content_length": {
            "type": "integer"
          },
          "date": {
            "format": "dateOptionalTime",
            "type": "date"
          },
          "content_type": {
            "type": "string"
          }
        }
      }
    }
  }
}
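To make clear what I mean by an expected score of 1 for identical documents, here is a minimal standalone sketch of cosine similarity over term counts. It is plain Java, not part of my service, and I am only assuming (not claiming) that the more-like-this score is normalized in a comparable way. The class name CosineSketch and its methods are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not Elasticsearch code.
public class CosineSketch {

    // Count how often each lowercased, whitespace-separated token occurs.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity of two term-count vectors; 0 if either one is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += (double) va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += (double) va * vb;
            }
        }
        for (int vb : b.values()) {
            normB += (double) vb * vb;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = termCounts("the quick brown fox");
        Map<String, Integer> same = termCounts("the quick brown fox");
        Map<String, Integer> other = termCounts("completely unrelated words");
        System.out.println("identical: " + cosine(doc, same));
        System.out.println("unrelated: " + cosine(doc, other));
    }
}
```

With a measure like this, two identical texts score 1 (up to rounding) and texts with no shared terms score 0, which is the behaviour I was expecting from the scores above.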
Thanks,
Zoran