Elastic does not index some of the data

I have set up indexing for three types of data: items, single pages, and files. All of them are represented as objects with properties. Here are some examples of the properties:
[screenshot of example properties]

For some reason, if I include the file content property (a very long string) in the index, neither single pages nor files get indexed. If this property is skipped, everything works as expected. Indexing is done by sending requests to Elasticsearch from an Omeka website (PHP).
Any advice on this would be helpful.

Welcome!

Could you provide a full recreation script, as described in "About the Elasticsearch category"? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste into the Kibana Dev Console and run to reproduce your use case. It will help readers understand, reproduce and, if needed, fix your problem. It will also most likely get you a faster answer.

This is the request for items:

POST _bulk
{"index":{"_index":"orly-test","_type":"doc","_id":"item_39"}}
{"resulttype":"Item","model":"Item","modelid":39,"featured":false,"public":true,"created":"2018-06-22T16:36:13+00:00","updated":"2020-04-07T16:04:44+00:00","title":"1\u0414\u043e\u043a\u043b\u0430\u0434\u043d\u044b\u0435 \u0437\u0430\u043f\u0438\u0441\u043a\u0438 \u043e\u0431 \u0430\u0433\u0435\u043d\u0442\u0443\u0440\u043d\u043e-\u043e\u043f\u0435\u0440\u0430\u0442\u0438\u0432\u043d\u043e\u0439 \u0440\u0430\u0431\u043e\u0442\u0435 \u043e\u0440\u0433\u0430\u043d\u043e\u0432 \u041d\u041a\u0412\u0414: \u043e\u043f\u0435\u0440\u0430\u0442\u0438\u0432\u043d\u044b\u0435 \u0434\u0430\u043d\u043d\u044b\u0435 \u043e \u0442\u0440\u043e\u0446\u043a\u0438\u0441\u0442\u0430\u0445 \u0438 \u043f\u0440\u0430\u0432\u044b\u0445 \u0441\u0438\u043e\u043d\u0438\u0441\u0442\u0430\u0445","collection":"1\u0421\u0411\u0423, \u041a\u0438\u0435\u0432, \u0424. 16: \u0421\u0435\u043a\u0440\u0435\u0442\u0430\u0440\u0438\u0430\u0442 \u0413\u041f\u0423-\u041a\u0413\u0411 \u0423\u0421\u0421\u0420","itemtype":"Text","element":{"identifier":"sbu_kv_16_34_30-1935","source":"1\u041e\u0413\u0410 \u0421\u0411\u0423, \u041a\u0438\u0435\u0432, \u0444.16, \u0434.34  ","ispartof":"1\u041e\u0442\u0440\u0430\u0441\u043b\u0435\u0432\u043e\u0439 \u0433\u043e\u0441\u0443\u0434\u0430\u0440\u0441\u0442\u0432\u0435\u043d\u043d\u044b\u0439 \u0430\u0440\u0445\u0438\u0432 \u0421\u043b\u0443\u0436\u0431\u044b \u0431\u0435\u0437\u043e\u043f\u0430\u0441\u043d\u043e\u0441\u0442\u0438 \u0423\u043a\u0440\u0430\u0438\u043d\u044b, \u041a\u0438\u0435\u0432, \u0424.16: \u0441\u0435\u043a\u0440\u0435\u0442\u0430\u0440\u0438\u0430\u0442 \u0413\u041f\u0423-\u041a\u0413\u0411 \u0423\u0421\u0421\u0420 ","title":"1\u0414\u043e\u043a\u043b\u0430\u0434\u043d\u044b\u0435 \u0437\u0430\u043f\u0438\u0441\u043a\u0438 \u043e\u0431 \u0430\u0433\u0435\u043d\u0442\u0443\u0440\u043d\u043e-\u043e\u043f\u0435\u0440\u0430\u0442\u0438\u0432\u043d\u043e\u0439 \u0440\u0430\u0431\u043e\u0442\u0435 \u043e\u0440\u0433\u0430\u043d\u043e\u0432 
\u041d\u041a\u0412\u0414: \u043e\u043f\u0435\u0440\u0430\u0442\u0438\u0432\u043d\u044b\u0435 \u0434\u0430\u043d\u043d\u044b\u0435 \u043e \u0442\u0440\u043e\u0446\u043a\u0438\u0441\u0442\u0430\u0445 \u0438 \u043f\u0440\u0430\u0432\u044b\u0445 \u0441\u0438\u043e\u043d\u0438\u0441\u0442\u0430\u0445","temporalcoverage":"1935 ","extent":"1491 \u043b.  (\u043f\u043e\u0434\u0431\u043e\u0440\u043a\u0430 \u201278 \u0441.)","contributor":"Benyamin Lukin"},"elements":[{"displayName":"Identifier","name":"identifier"},{"displayName":"Source","name":"source"},{"displayName":"Is Part Of","name":"ispartof"},{"displayName":"Title","name":"title"},{"displayName":"Temporal Coverage","name":"temporalcoverage"},{"displayName":"Extent","name":"extent"},{"displayName":"Contributor","name":"contributor"}],"tags":["\u0417\u0430\u043f\u043e\u0440\u043e\u0436\u044c\u0435 (\u0410\u043b\u0435\u043a\u0441\u0430\u043d\u0434\u0440\u043e\u0432\u0441\u043a)","\u041a\u0438\u0435\u0432","\u041a\u043e\u043d\u0441\u0442\u0430\u043d\u0442\u0438\u043d\u043e\u0432\u043a\u0430","\u041e\u0434\u0435\u0441\u0441\u0430","\u0420\u043e\u0432\u043d\u043e (R\u00f3wne)","\u0425\u0430\u0440\u044c\u043a\u043e\u0432","\u0427\u0435\u0440\u043d\u0438\u0433\u043e\u0432"]}

The request for files:

POST _bulk
{"index":{"_index":"orly-test","_type":"doc","_id":"file_40"}}
{"resulttype":"File","model":"File","modelid":40,"featured":false,"public":false,"created":"2018-06-22T16:36:20+00:00","updated":"2020-04-07T16:04:44+00:00","title":null,"item_id":39,"filename":"54822b6858b46c3a802a3b135ef67fc2.pdf","original_filename":"original_filename","size":17582964,"mime_type":"application\/pdf","element":{"text":"1\u0413\u0410\u041b\u0423\u0417\u0415\u0412\u0418\u0419 \u0414\u0415\u0420\u0416\u0410\u0412\u041d\u0418\u0419 \u0410\u0420\u04251\u0412\n\u0421\u041b\u0423\u0416\u0411\u0418 \u0411\u0415\u0417\u041f\u0415\u041a\u0418 \u0423\u041a\u0420\u0410\u0428\u0418\n\n\u042d \u0415!\u0425(!\u0432 \u0418\u0417 : 5 .\u00ab\u0441\u0422\u042f\u043a\n\n\u0421\u0415\u041a\u0420\u0415\u0422\u0410\u04201\u0410\u0422 \u0414\u041f\u0423 \u0423\u0421\u0420\u0420 - \u041a\u0414\u0411 \u0423\u0420\u0421\u0420\n\n\u0428\u0424\u041e\u0420\u041c\u0410\u0429\u0419\u041d1\u041f\u041e\u04121\u0414\u041e\u041c\u041b\u0415\u041d\u041d\u042f \u041f\u0420\u041e\n\u041e\u041f\u0415\u0420\u0410\u0422\u0418\u0412\u041d\u041e-\u0421\u041b1\u0414\u0427\u0423 \u0420\u041e\u0411\u041e\u0422\u0423 \u0421\u0422\u0420\u0423\u041a\u0422\u0423\u0420\u041d\u0418\u0425\n\u0428\u0414\u0420\u041e\u0417\u04141\u041b1\u0412 \u0423\u0414\u0411 \u041d\u041a\u0412\u0421 \u0423\u0420\u0421\u0420, \u041f\u0415\u0420\u041b\u042e\u0421\u0422\u0420\u0410\u04261\u042e\n\u041a\u041e\u0420\u0415\u0421\u041f\u041e\u041d\u0414\u0415\u041d\u0426\u041f \u0422\u0410 \u0412\u0418\u042f\u0412\u041b\u0415\u041d\u041d\u042f"},"elements":[{"displayName":"Text","name":"text"}]}

The text for files is very large, so I have truncated it heavily for this example. In the real requests, the maximum bulk size is 500.
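Since the bulk API reports failures per document rather than failing the whole batch, it may be worth checking the response for the documents that silently disappear. Below is a minimal sketch, assuming the response body has already been decoded with `json_decode($body, true)`. `collectBulkFailures()` is a hypothetical helper name, and the sample response is hand-written for illustration, not real output from the cluster.

```php
<?php
// Sketch only: collect per-document failures from a decoded _bulk response.
// collectBulkFailures() is a hypothetical helper, not part of any client library.
function collectBulkFailures(array $response): array
{
    $failures = [];
    // "errors" is true when at least one action in the batch failed.
    if (empty($response['errors'])) {
        return $failures;
    }
    foreach ($response['items'] ?? [] as $item) {
        // Each item is keyed by its action type (index, create, update, delete).
        foreach ($item as $action => $result) {
            if (isset($result['error'])) {
                $failures[] = [
                    'id'     => $result['_id'] ?? null,
                    'status' => $result['status'] ?? null,
                    'type'   => $result['error']['type'] ?? null,
                    'reason' => $result['error']['reason'] ?? null,
                ];
            }
        }
    }
    return $failures;
}

// Hand-written sample response (assumption, for illustration only).
$sample = [
    'errors' => true,
    'items'  => [
        ['index' => ['_id' => 'item_39', 'status' => 201]],
        ['index' => [
            '_id'    => 'file_40',
            'status' => 400,
            'error'  => ['type' => 'example_exception', 'reason' => 'example reason'],
        ]],
    ],
];

foreach (collectBulkFailures($sample) as $failure) {
    echo sprintf('%s failed (%s): %s', $failure['id'], $failure['type'], $failure['reason']), PHP_EOL;
}
```

If the whole request is rejected instead (for example, because the body is too large), the response will not contain an `items` array at all, so logging the HTTP status code alongside this check is probably useful too.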

Which field is the one you called the "content"?
What is the mapping?

Could you simplify the example a bit, remove everything that is not needed, and reproduce the exact problem with a minimal example?

Are you using dynamic mappings? What is the maximum length of the content field?

The file content is element.text.
This is a code snippet where the mapping is defined:

$mappings = [
    'doc' => [
        'properties' => [
            // Common Mappings
            'resulttype'  => ['type' => 'keyword'],
            'title'       => ['type' => 'text'],
            'description' => ['type' => 'text'],
            'text'        => ['type' => 'text'],
            'model'       => ['type' => 'keyword'],
            'modelid'     => ['type' => 'integer'],
            'featured'    => ['type' => 'boolean'],
            'public'      => ['type' => 'boolean'],
            'created'     => ['type' => 'date'],
            'updated'     => ['type' => 'date'],
            'tags'        => ['type' => 'keyword'],
            'slug'        => ['type' => 'keyword'],
            'url'         => ['type' => 'keyword'],
            // Item-Specific
            'collection' => ['type' => 'text'],
            'itemtype'   => ['type' => 'keyword'],
            'elements'   => ['type' => 'keyword', 'index' => false],
            'element'    => ['type' => 'object'],
            // Exhibit-Specific
            'credits' => ['type' => 'text'],
            'exhibit' => ['type' => 'text'],
            'blocks'  => [
                'type' => 'nested',
                'properties' => [
                    'text'        => ['type' => 'text'],
                    'attachments' => ['type' => 'text']
                ]
            ],
            // Neatline-Specific
            'neatline'        => ['type' => 'text'],
            'neatlineRecords' => ['type' => 'keyword', 'index' => false],
            // PdfText-Specific
            'item_id'           => ['type' => 'integer'],
            'filename'          => ['type' => 'text'],
            'original_filename' => ['type' => 'text'],
            'size'              => ['type' => 'integer'],
            'mime_type'         => ['type' => 'text'],
        ]
    ]
];

I don't know how to simplify the example, to be honest. I am sure that the problem is with the element.text field, because everything seems to work if I skip it. Locally I have succeeded in indexing everything, including this field, but I have much less data there than in production.

I have added the mappings in another response. The limit was not set, but I could find the longest value we have if that is what you are asking.
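To answer the question about the maximum length, something like the following could be run over the documents before they are sent. It is only a sketch: `longestElementText()` is a hypothetical helper, and the two sample documents are made up. It reports both the raw UTF-8 size and the size after `json_encode()`, because the `\uXXXX` escaping seen in the requests above makes Cyrillic text considerably larger on the wire.

```php
<?php
// Sketch only: find the document with the longest element.text value.
// longestElementText() is a hypothetical helper; $docs maps document IDs
// to the arrays that would be sent as bulk sources.
function longestElementText(array $docs): array
{
    $longest = ['id' => null, 'raw_bytes' => 0, 'json_bytes' => 0];
    foreach ($docs as $id => $doc) {
        $value = $doc['element']['text'] ?? null;
        if (!is_string($value)) {
            continue; // no file content on this document
        }
        $rawBytes = strlen($value); // strlen() counts bytes, not characters
        if ($rawBytes > $longest['raw_bytes']) {
            $longest = [
                'id'         => $id,
                'raw_bytes'  => $rawBytes,
                // Size of the value as it appears in the JSON request body.
                'json_bytes' => strlen(json_encode($value)),
            ];
        }
    }
    return $longest;
}

// Made-up sample data (assumption, for illustration only).
$docs = [
    'file_40' => ['element' => ['text' => "\u{041A}\u{0438}\u{0457}\u{0432}"]],
    'file_41' => ['element' => ['text' => 'abc']],
    'item_39' => ['element' => ['identifier' => 'sbu_kv_16_34_30-1935']],
];

print_r(longestElementText($docs));
```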

If the field is mapped as text and not keyword, the length should not matter.

I have looked up the mappings for the index in Kibana:

{
  "mappings": {
    "doc": {
      "properties": {
        "collection": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "created": {
          "type": "date"
        },
        "element": {
          "properties": {
            "contributor": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "extent": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "identifier": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "ispartof": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "source": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "temporalcoverage": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "text": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "title": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "elements": {
          "properties": {
            "displayName": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "featured": {
          "type": "boolean"
        },
        "filename": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "item_id": {
          "type": "long"
        },
        "itemtype": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "mime_type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "model": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "modelid": {
          "type": "long"
        },
        "original_filename": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "public": {
          "type": "boolean"
        },
        "resulttype": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "size": {
          "type": "long"
        },
        "tags": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "updated": {
          "type": "date"
        }
      }
    }
  }
}

Here element.text is mapped as text.

Yes, but you have a subfield that is mapped as keyword, which has length limitations. I suggest you remove the keyword subfield through an index template and try again.
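The mapping shown in Kibana has the shape that dynamic mapping produces for unmapped strings (text plus a keyword subfield with `ignore_above: 256`), which suggests the explicit PHP mapping was not applied to `element`. A minimal sketch of an explicit mapping for `element.text`, written in the same array style as the `$mappings` snippet earlier in the thread, might look like this. `buildElementMapping()` is a hypothetical helper name, and only the `text` property is covered here.

```php
<?php
// Sketch only: explicit mapping for the "element" object so that
// element.text is plain text without a keyword subfield.
// buildElementMapping() is a hypothetical helper, not part of Omeka.
function buildElementMapping(): array
{
    return [
        'type'       => 'object',
        'properties' => [
            // Explicitly mapped fields are not given the dynamic
            // text + keyword multi-field.
            'text' => ['type' => 'text'],
        ],
    ];
}

$mappings = [
    'doc' => [
        'properties' => [
            'element' => buildElementMapping(),
        ],
    ],
];

// Print the JSON that would go into the index or template definition.
echo json_encode($mappings, JSON_PRETTY_PRINT), PHP_EOL;
```

The mapping of an existing field cannot be changed in place, so this would need to be applied when the index is created (directly or through a template), followed by reindexing the data.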

I have manually specified the mapping for element, but it did not help.
UPDATE: It seems that the keyword subfield still exists for the element.text property.
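One way to confirm this programmatically is to walk the `properties` section of the mapping and look for a `fields.keyword` entry on the field in question. The sketch below assumes the mapping has been decoded into a PHP array. `findFieldMapping()` and `hasKeywordSubfield()` are hypothetical helpers, and the sample is a small fragment copied from the mapping posted above.

```php
<?php
// Sketch only: check whether a field in a decoded mapping still carries a
// keyword subfield. Both functions are hypothetical helpers.

// Resolve a dotted path such as "element.text" inside a "properties" array.
function findFieldMapping(array $properties, string $path): ?array
{
    $node = null;
    foreach (explode('.', $path) as $segment) {
        if (!isset($properties[$segment])) {
            return null; // the field is not mapped
        }
        $node = $properties[$segment];
        $properties = $node['properties'] ?? [];
    }
    return $node;
}

function hasKeywordSubfield(array $properties, string $path): bool
{
    $field = findFieldMapping($properties, $path);
    return ($field['fields']['keyword']['type'] ?? null) === 'keyword';
}

// Fragment of the mapping shown earlier in the thread.
$properties = [
    'created' => ['type' => 'date'],
    'element' => [
        'properties' => [
            'text' => [
                'type'   => 'text',
                'fields' => [
                    'keyword' => ['type' => 'keyword', 'ignore_above' => 256],
                ],
            ],
        ],
    ],
];

var_dump(hasKeywordSubfield($properties, 'element.text'));
```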
