Document size and score during search


(Hiro Gangwani) #1

Dear Sir,

We are indexing document content and running text-based searches against it.
Is there a way to disable Lucene's preference for scoring shorter documents
higher?

Consider the following example:

  1. We have two documents, A and B.
  2. We search for the text "Java".
  3. Document A contains 10 occurrences of "Java" while document B contains 7,
    but document B is shorter than document A.
  4. During the search, document B is assigned a higher score because it is
    shorter than document A (even though A contains "Java" more often).

In one of the forums we learned that by using omit_norms we can make indexing
ignore the size of the document. We tried this approach but are still not
getting the desired results.
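To make the example concrete, here is a rough sketch of Lucene's classic TF-IDF scoring for a single-term query, assuming DefaultSimilarity-style factors: tf = sqrt(freq) and a length norm of 1/sqrt(number of terms in the field). The document lengths (1000 words for A, 200 for B) are hypothetical, since the post does not give them. idf and the query norm are left out because they are identical for both documents in a one-term query, and Lucene's lossy one-byte encoding of the norm is ignored.

```java
public class LengthNormSketch {

    // Classic Lucene TF-IDF term frequency factor: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Classic length norm: 1 / sqrt(numTerms). With omit_norms the norm is
    // effectively 1.0, so field length no longer affects the score.
    static double lengthNorm(int numTerms, boolean omitNorms) {
        return omitNorms ? 1.0 : 1.0 / Math.sqrt(numTerms);
    }

    // Relative score for a one-term query (idf and queryNorm omitted because
    // they are the same for every document).
    static double relativeScore(int freq, int numTerms, boolean omitNorms) {
        return tf(freq) * lengthNorm(numTerms, omitNorms);
    }

    public static void main(String[] args) {
        // Hypothetical lengths: A = 1000 words, B = 200 words.
        int freqA = 10, lenA = 1000;
        int freqB = 7, lenB = 200;

        for (boolean omit : new boolean[] {false, true}) {
            double a = relativeScore(freqA, lenA, omit);
            double b = relativeScore(freqB, lenB, omit);
            System.out.printf("omitNorms=%b  A=%.4f  B=%.4f  winner=%s%n",
                    omit, a, b, a > b ? "A" : "B");
        }
    }
}
```

With norms, B wins (about 0.1871 vs 0.1000); with norms omitted, A wins (about 3.1623 vs 2.6458), which is the ordering asked for above. This only models the tf and field-norm factors, so real scores will differ in absolute value.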

We are using the Java API to create the indexes with XContentBuilder. Please
find the code below:

-- Creating index
XContentBuilder map = XContentFactory.jsonBuilder().startObject()
    .startObject(idxType)
        .startObject("properties")
            .startObject("file")
                .field("type", "attachment")
                .field("omit_norm", "true")
                .startObject("fields")
                    .startObject("refid")
                        .field("store", "yes")
                    .endObject()
                    .startObject("name")
                        .field("store", "yes")
                    .endObject()
                    .startObject("itexp")
                        .field("store", "yes")
                    .endObject()
                    .startObject("totalexp")
                        .field("store", "yes")
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    .endObject();
CreateIndexResponse lResponse =
        client.admin().indices().prepareCreate(idxName)
                .addMapping("attachment", map).execute().actionGet();

--- Indexing document

XContentBuilder source = XContentFactory.jsonBuilder().startObject()
    .field("file", data64)
    .field("refid", "2")
    .field("name", "Maya")
    .field("totalexp", 11.0);

Please let me know whether the above code is correct. We are not getting the
desired results even after applying the omit_norms parameter.

Thanks in advance

Hiro Gangwani

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b707663b-7b7b-48cd-ab96-4490a0171ec1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

IIRC you cannot set omit_norms on the attachment type itself (as that type
consists of several sub-fields). You need to set it on the "file" sub-field,
like this (untested, off the top of my head):

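XContentBuilder map = XContentFactory.jsonBuilder().startObject()
    .startObject(idxType)
        .startObject("properties")
            .startObject("file")
                .field("type", "attachment")
                .startObject("fields")
                    // omit_norms goes on the "file" sub-field holding the extracted content
                    .startObject("file")
                        .field("omit_norms", true)
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    .endObject()
.endObject();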
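For readers more used to JSON mappings, the snippet above (once every startObject() is matched by an endObject()) should correspond to the structure below. This sketch mirrors the same nesting with plain LinkedHashMaps and a tiny hand-written serializer (no escaping) so it runs without Elasticsearch on the classpath; using "attachment" as the idxType value is an assumption, and it illustrates the structure only, not how Elasticsearch serializes mappings internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AttachmentMappingSketch {

    // Minimal JSON renderer for nested maps of String/Boolean/Map values
    // (no escaping; enough for this illustration only).
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof String) return "\"" + value + "\"";
        return String.valueOf(value); // Booleans and numbers
    }

    static Map<String, Object> obj() {
        return new LinkedHashMap<>(); // keeps insertion order like the builder
    }

    // Same nesting as the XContentBuilder snippet: omit_norms sits on the
    // "file" entry under "fields", not next to "type": "attachment".
    static Map<String, Object> mapping(String idxType) {
        Map<String, Object> innerFile = obj();
        innerFile.put("omit_norms", true); // boolean, spelled omit_norms
        Map<String, Object> fields = obj();
        fields.put("file", innerFile);
        Map<String, Object> file = obj();
        file.put("type", "attachment");
        file.put("fields", fields);
        Map<String, Object> properties = obj();
        properties.put("file", file);
        Map<String, Object> type = obj();
        type.put("properties", properties);
        Map<String, Object> root = obj();
        root.put(idxType, type);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(toJson(mapping("attachment")));
    }
}
```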

Also, note that it is omit_norms, not omit_norm. In addition, I don't know
what the refid, itexp and totalexp fields are used for inside the
attachment; see the documentation for the supported fields.


Maybe it is something internal I am not aware of.

--Alex

(Replying to Hiro Gangwani <hiro.gangwani@gmail.com>, Mon, Jan 27, 2014 at 12:02 PM.)


