Weird behavior using norms with nested type

Hello,

I am seeing behavior I cannot explain and am trying to reconcile it with my understanding of norms and nested types. I am currently on ES 5.6 and do not know whether this also occurs in more recent versions.

I have two mappings, shown below.

Mapping 1 uses copy_to to copy data from one object into another object that has a text multi-field:

"field1": {
    "properties": {
        "subfield1": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": false,
            "copy_to": ["field2.subfield2"]
        }
    }
},
"field2": {
    "properties": {
        "subfield2": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": false,
            "fields": {
                "textmultifield": {
                    "type": "text",
                    "index_options": "offsets",
                    "norms": true,
                    "store": false
                }
            }
        }
    }
}

Mapping 2 is similar, except that the source object is of nested type:

"nestedfield": {
    "type": "nested",
    "properties": {
        "subfield": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": false,
            "copy_to": ["objectfield.subfield2"]
        }
    }
},
"objectfield": {
    "properties": {
        "subfield2": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": false,
            "fields": {
                "textmultifield": {
                    "type": "text",
                    "index_options": "offsets",
                    "norms": true,
                    "store": false
                }
            }
        }
    }
}

The "_all" field is completely disabled and "_source" is enabled for both mappings.

Since norms are enabled only on textmultifield, and norms are disabled by default when a field is not indexed, I assumed that norms would take the same space under both mappings given the same dataset. In reality, the disk usage of norms in the second case is higher by an order of magnitude or more. I verified this by analyzing the disk space used by the underlying index files.
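
For anyone who wants to repeat that kind of measurement, here is a minimal sketch that sums file sizes per extension in a single directory. It assumes the shard's Lucene index directory is readable on disk and that segments are not packed into compound .cfs files, in which case the norms live in .nvd (data) and .nvm (metadata) files. The class name IndexFileSizes and the directory argument are made up for illustration; this is not necessarily how the numbers above were obtained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Elasticsearch or Lucene.
class IndexFileSizes {

    // Returns the text after the last dot, or "(none)" if there is no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "(none)" : fileName.substring(dot + 1);
    }

    // Sums the sizes of the regular files directly inside dir, grouped by extension.
    static Map<String, Long> sizeByExtension(Path dir) {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                try {
                    String ext = extensionOf(p.getFileName().toString());
                    totals.merge(ext, Files.size(p), Long::sum);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }

    public static void main(String[] args) {
        // args[0]: path to a shard's Lucene index directory (supply your own).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        sizeByExtension(dir).forEach((ext, bytes) -> System.out.println(ext + "\t" + bytes));
    }
}
```

Comparing the nvd and nvm totals between two indices built from the same dataset gives the norms footprint of each mapping.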

Can anyone help me understand the discrepancy between the two cases? It seems there is an underlying detail about nested types that has not been documented.

Thanks in advance.

If anyone has experienced this or can explain this phenomenon, it would help me a lot. Thanks again.

I dug further into the code and noticed that norms are being created even when the omitNorms flag is set to true on the field types. The relevant code is in DefaultIndexingChain.java in Lucene.

This is the code that is ultimately invoked as part of DocsWriter when Elasticsearch indexes a document.

I traced this and found that the method getOrAddField creates a FieldInfo object the first time a field is seen. By default this object has omitNorms set to false. The method applies the indexOptions from the fieldType to the newly created object but does not do the same for omitNorms. This effectively overrides the flag, which causes problems down the line.

Here is the code snippet for the method, including the fieldInfos.getOrAdd call:

private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {

    // Make sure we have a PerField allocated
    final int hashPos = name.hashCode() & hashMask;
    PerField fp = fieldHash[hashPos];
    while (fp != null && !fp.fieldInfo.name.equals(name)) {
        fp = fp.next;
    }

    if (fp == null) {
        // First time we are seeing this field in this segment

        FieldInfo fi = fieldInfos.getOrAdd(name);
        // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the initial
        // IndexOptions to decide what arrays it must create).  Then, we also must set it in
        // PerField.invert to allow for later downgrading of the index options:
        fi.setIndexOptions(fieldType.indexOptions());

        fp = new PerField(fi, invert);
        ...
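
To make the suspected override concrete, here is a toy Java model of the behavior described above. It is not Lucene code: NormsFlagSketch, ToyFieldInfo, ToyFieldType and wouldWriteNorms are hypothetical names, and the model covers only the first-pass logic as I read it. It does not show whether a later step in Lucene 6.6.1 restores the flag, so it illustrates the hypothesis rather than confirming it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Lucene code: every class and method name here is hypothetical.
class NormsFlagSketch {
    private final Map<String, ToyFieldInfo> fieldInfos = new HashMap<>();

    // Models the described first pass: indexOptions is copied, omitNorms is not.
    ToyFieldInfo getOrAddField(String name, ToyFieldType type) {
        ToyFieldInfo fi = fieldInfos.computeIfAbsent(name, ToyFieldInfo::new);
        fi.indexOptions = type.indexOptions;
        return fi;
    }

    // Hypothetical variant that also carries omitNorms over from the field type.
    ToyFieldInfo getOrAddFieldCopyingNorms(String name, ToyFieldType type) {
        ToyFieldInfo fi = getOrAddField(name, type);
        fi.omitNorms = type.omitNorms;
        return fi;
    }

    // Assumption: norms are written for any indexed field whose omitNorms is false.
    static boolean wouldWriteNorms(ToyFieldInfo fi) {
        return !fi.omitNorms && !"NONE".equals(fi.indexOptions);
    }

    public static void main(String[] args) {
        ToyFieldType fieldNames = new ToyFieldType("DOCS", true); // omitNorms requested
        ToyFieldInfo a = new NormsFlagSketch().getOrAddField("_field_names", fieldNames);
        ToyFieldInfo b = new NormsFlagSketch().getOrAddFieldCopyingNorms("_field_names", fieldNames);
        System.out.println("indexOptions only: norms written = " + wouldWriteNorms(a));
        System.out.println("omitNorms copied:  norms written = " + wouldWriteNorms(b));
    }
}

class ToyFieldInfo {
    final String name;
    String indexOptions = "NONE";
    boolean omitNorms = false; // default, as described in the post

    ToyFieldInfo(String name) {
        this.name = name;
    }
}

class ToyFieldType {
    final String indexOptions;
    final boolean omitNorms;

    ToyFieldType(String indexOptions, boolean omitNorms) {
        this.indexOptions = indexOptions;
        this.omitNorms = omitNorms;
    }
}
```

In this model the first variant leaves omitNorms at its default even though the field type asked for norms to be omitted, which is the effect I believe I am seeing.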

The issue is compounded when using nested types. Internal Elasticsearch fields such as _field_names are repeated multiple times in the top-level document with omitNorms set to true, but I think that, because of this issue in Lucene 6.6.1, the flag is being overridden and norms are being computed and written to disk unintentionally.
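
A rough back-of-envelope sketch of why unintended norms could get expensive with nested types: Elasticsearch indexes each nested object as its own hidden Lucene document next to the root document, so the Lucene document count grows with the number of nested objects. The estimate below additionally assumes dense norms storage, that is, a fixed number of bytes per Lucene document for each field that ends up with norms. That storage model, the class name NestedNormsEstimate, and all the numbers are assumptions for illustration, not measurements from my index.

```java
// Back-of-envelope model; every number passed in is a hypothetical input.
class NestedNormsEstimate {

    // Each nested object becomes its own hidden Lucene document,
    // in addition to the one root document per top-level document.
    static long luceneDocs(long topLevelDocs, long nestedObjectsPerDoc) {
        return topLevelDocs * (nestedObjectsPerDoc + 1);
    }

    // Assumption: dense norms, i.e. bytesPerNorm bytes for every Lucene document
    // for each field that has norms, whether or not the document has that field.
    static long denseNormBytes(long luceneDocs, int fieldsWithNorms, int bytesPerNorm) {
        return luceneDocs * fieldsWithNorms * bytesPerNorm;
    }

    public static void main(String[] args) {
        long flat = luceneDocs(1_000_000L, 0);    // no nested objects
        long nested = luceneDocs(1_000_000L, 20); // 20 nested objects per document
        // One intended norms field in the flat case; one extra unintended field when nested.
        System.out.println("flat:   " + denseNormBytes(flat, 1, 1) + " bytes");
        System.out.println("nested: " + denseNormBytes(nested, 2, 1) + " bytes");
    }
}
```

Under these assumptions both the document count and the number of fields carrying norms multiply the total, which would be consistent with a gap of an order of magnitude or more.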

Can someone from the Elastic community confirm this? It is severely impacting our platform.
