Get text field length with a script

Hi,

I'm trying to get the length of text fields to calculate my response size. doc['fieldname.subfield'].value only works for numeric fields. Since Elasticsearch 5.x we can use "params._source.fieldname.subfield", but this is very slow. To avoid errors at runtime, I'm using doc.containsKey('fieldname.subfield'). But doc.containsKey returns true and I still get a NullPointerException; doc.containsKey is not working for nested fields.

My query:

GET /device/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "FIELD": {
      "script": "if (doc.containsKey('detail.name') ) { params._source.detail.name.toString().length(); }"
    }
  }
}

To work around this I used a try/catch block, and that solved my problem. But I want to understand why doc.containsKey is not working.
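In the meantime, a null-safe version that avoids the try/catch is to check each level of _source in the script, since doc.containsKey (as far as I understand) only checks whether the field exists in the index mapping, not whether the current document has a value for it. An untested sketch against the same index:

```json
GET /device/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "FIELD": {
      "script": "params._source.detail != null && params._source.detail.name != null ? params._source.detail.name.toString().length() : 0"
    }
  }
}
```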

Look at this:

GET /device/_search
{
  "from": 990,
  "size": 1,
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "FIELD": {
      "script": "try { if (doc.containsKey('detail.name') ) { params._source.detail.name; } } catch(Exception e){'condition : ' + doc.containsKey('detail.name')}"
    }
  },
  "_source": {
    "includes": [
      "detail"
    ]
  }
}

Response:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1538917,
    "max_score": 1,
    "hits": [
      {
        "_index": "device",
        "_type": "data",
        "_id": "AV9tHLfs53nDf34AS",
        "_score": 1,
        "_source": {
          "detail": null
        },
        "fields": {
          "FIELD": [
            "condition : true"
          ]
        }
      }
    ]
  }
}

My condition returns true, but the detail field is null.

Can't you compute that value at index time? I mean that would be much much more efficient.

Yes, I can compute the content length, but I need the raw data of the custom fields to build a CSV file. If the queried data range is too big, my file storage database throws an error. So I want to compute the result length in advance to know the result size and return a user-friendly response. My problem is that the user can select custom fields.

Can't you truncate the content on your side after you do the extraction?

Extraction takes a long time, and I'm using pagination for the extract, which is a very costly computation. I want to know the result size without extracting, so that I can disallow the request. Back to my first question: why does doc.containsKey('field.subfield') return true for a nested field when 'field' is null in the current document?

I don't know. I'm not using Painless and scripting, as I'm trying to avoid that as much as I can.

Extraction takes a long time.

Maybe share what exactly you are doing?

And I'm using pagination for extract.

How? Like with the from and size parameters?

This is very costly compute.

On which end? Your end or elasticsearch end?

What does a document look like?

I want to build a CSV file of documents that includes custom fields.

I have 1,500,000 documents, and getting them from Elasticsearch in one query with size=1,500,000 takes a long time. Sometimes the server sends a timeout response.

So we must use pagination with the from and size parameters and collect the data from all pages.
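The from/size pagination described above looks roughly like this (the page size is illustrative; the next page bumps from by the page size, and note that from + size is capped by the index.max_result_window setting, which defaults to 10,000):

```json
GET /device/_search
{
  "from": 0,
  "size": 10000,
  "query": { "match_all": {} }
}
```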

The documents contain sensor data, and we want to build a CSV file of that data. Sensor data contains complex nested JSON. When a user selects a custom field, I make a pre-request from the service to learn the output size. If the output is greater than my file DB limit, the service disallows the export process.

Getting them from Elasticsearch in one query with size=1,500,000 takes a long time.

You should never ever do that.

Use the scroll API to extract your data in smaller, consistent pages, maybe 10,000 at a time. Sort by _uid if you want the fastest response.
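A sketch of that scroll flow (the keep-alive and page size are illustrative; the official docs also suggest sorting by _doc as the most efficient order for scrolling):

```json
POST /device/_search?scroll=1m
{
  "size": 10000,
  "sort": ["_doc"],
  "query": { "match_all": {} }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}
```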


Thanks @dadoonet,

I already run the query with size 10000 and paginate with from. Is the scroll API faster than that?

Try it. You will probably notice some differences.
At least it will be stable IMO.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.