Document size and score during search

Hiro_Gangwani · January 27, 2014, 11:02am

Dear Sir,

We are indexing document content and running text-based searches against
it. Is there a way to disable Lucene's preference for scoring shorter
documents higher?

Consider the following example:

We have two documents, A and B, and we search for the text Java.
Document A contains 10 matching words, while document B contains 7, but
document B is smaller than document A.
During the search, document B is assigned the higher score because it is
smaller than document A (even though A contains more occurrences of the
word Java).

In one of the forums we learnt that setting omit_norms makes scoring
ignore the size of the document. We tried this approach and are still not
getting the desired results.

We are using the Java API to create the index with XContentBuilder. Please
find the code below.

-- Creating index
XContentBuilder map = XContentFactory.jsonBuilder().startObject()
    .startObject(idxType)
        .startObject("properties")
            .startObject("file")
                .field("type", "attachment")
                .field("omit_norm","true")
                .startObject("fields")
                    .startObject("refid")
                        .field("store", "yes")
                    .endObject()
                    .startObject("name")
                        .field("store", "yes")
                    .endObject()
                    .startObject("itexp")
                        .field("store", "yes")
                    .endObject()
                    .startObject("totalexp")
                        .field("store", "yes")
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    .endObject();
CreateIndexResponse lResponse =
    client.admin().indices().prepareCreate(idxName)
        .addMapping("attachment", map).execute().actionGet();

--- Indexing document

XContentBuilder source = XContentFactory.jsonBuilder().startObject()
    .field("file", data64)
    .field("refid", "2")
    .field("name", "Maya")
    .field("totalexp",11.0);

Please let me know whether the above code is correct. We are not getting
the desired results even after applying the omit_norms parameter.

Thanks in advance

Hiro Gangwani

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b707663b-7b7b-48cd-ab96-4490a0171ec1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

spinscale · January 28, 2014, 8:59am

Hey,

IIRC you cannot set omit_norms on the attachment type itself (as that type
consists of several sub-fields). You need to set it on the "file"
sub-field, like this (untested, off the top of my head):

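XContentBuilder map = XContentFactory.jsonBuilder().startObject()
    .startObject(idxType)
        .startObject("properties")
            .startObject("file")
                .field("type", "attachment")
                .startObject("fields")
                    // sub-field holding the extracted text; omit_norms goes here
                    .startObject("file")
                        .field("omit_norms", true)
                    .endObject()
                .endObject() // fields
            .endObject() // file
        .endObject() // properties
    .endObject() // idxType
.endObject();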

Also, note that it is omit_norms, not omit_norm. In addition, I don't know
what the refid, itexp and totalexp fields are used for inside of the
attachment; see the documentation for the supported fields.

Maybe it is something internal I am not aware of.

--Alex
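
To make the effect discussed in this thread concrete, here is a rough
sketch of why the shorter document can win and what omitting norms changes.
It assumes the classic Lucene TF-IDF similarity (term frequency factor
sqrt(freq), length norm 1/sqrt(number of terms in the field)), and it leaves
out idf, boosts, and the lossy one-byte norm encoding. The class name and
the document lengths (1000 and 200 terms) are made up for illustration; the
thread does not give the real sizes.

```java
public class LengthNormSketch {

    // Term-frequency factor of classic Lucene TF-IDF: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length norm of classic Lucene TF-IDF: 1 / sqrt(numTerms).
    // When norms are omitted, the factor is treated as 1.0.
    static double lengthNorm(int numTerms, boolean omitNorms) {
        return omitNorms ? 1.0 : 1.0 / Math.sqrt(numTerms);
    }

    // Simplified single-term score. idf and boosts are left out because
    // they are the same for both documents when the query term is the same.
    static double score(int freq, int numTerms, boolean omitNorms) {
        return tf(freq) * lengthNorm(numTerms, omitNorms);
    }

    public static void main(String[] args) {
        // Hypothetical sizes: A is long with 10 matches, B is short with 7.
        int freqA = 10, lenA = 1000;
        int freqB = 7, lenB = 200;

        System.out.printf("norms on:  A=%.4f  B=%.4f%n",
                score(freqA, lenA, false), score(freqB, lenB, false));
        System.out.printf("norms off: A=%.4f  B=%.4f%n",
                score(freqA, lenA, true), score(freqB, lenB, true));
    }
}
```

With these assumed numbers, B outranks A while norms are enabled, and A
outranks B once the length norm is taken out of the product, which is the
behaviour the original question asks for.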

