.raw fields not showing up for measurements field in local Elasticsearch index


(Prerana Teligi Harapanahalli Math) #1

Hi, I have created a local Elasticsearch index and I am using the following mapping:

"docs": {
  "properties": {
    "dates": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "long"}}},
    "entities": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}},
    "locations": {"type": "nested", "properties": {
      "admin1Code": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}},
      "admin2Code": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}},
      "countryCode": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}},
      "location": {"type": "geo_point", "lat_lon": true, "geohash": true},
      "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}},
      "rawText": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}},
      "count": {"type": "long"}
    }},
    "id": {"type": "string"},
    "geo": {"type": "geo_point", "lat_lon": true, "geohash": true},
    "mime-type": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}},
    "places": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}},
    "people": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}},
    "organizations": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}},
    "money": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}},
    "percentages": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}},
    "time": {"type": "nested", "properties": {"count": {"type": "long"}, "name": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}},
    "measurements": {"type": "nested", "properties": {"value": {"type": "long"}, "unit": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed", "null_value": "NULL"}}}}}
  }
}

and this is my local indexed data:
data.json

I can see valid measurements fields in my data. But when I aggregate on them using the .raw sub-field, empty buckets are returned. When I drop .raw and aggregate on the field directly, some buckets are returned. This doesn't make sense to me: I would expect the .raw aggregation to return buckets for the valid measurements as well. Can somebody please help? Thank you!


(David Pilato) #2

Please format your code, logs or configuration files using </> icon as explained in this guide and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

This is the icon to use if you are not using markdown format:

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.


(Prerana Teligi Harapanahalli Math) #3

Hi @dadoonet, thank you for your suggestions. I formatted my mapping file properly, so it is now easier to find the measurements field. This is the curl request I used to get data from my measurements field:

curl -X GET "localhost:9200/insight-generator/docs/_search" -H 'Content-Type: application/json' -d'
{"aggs":{"entities":{"nested":{"path":"measurements"},"aggs":{"entity_name":{"terms":{"field":"measurements.normalizedUnit-name.raw","size":1000}}}}}}'

Please let me know if you have a solution to this issue. I am new to Elasticsearch and I don't have in-depth knowledge about mappings.
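
One thing that can be checked without a running cluster is whether the aggregated field is declared in the explicit mapping at all. The sketch below is only an illustration: `isFieldMapped` is a hypothetical helper, and the `properties` object is a trimmed copy of the `measurements` entry from the mapping in post #1. Whether a dynamically mapped field gets a `.raw` sub-field depends on the Elasticsearch version and on any index templates, so this only covers fields declared explicitly.

```javascript
// Hypothetical helper: returns true if a dotted field path (optionally ending
// in a multi-field such as ".raw") is declared in a mapping's "properties".
function isFieldMapped(properties, path) {
  let node = { properties };
  for (const part of path.split(".")) {
    const next =
      (node.properties && node.properties[part]) ||
      (node.fields && node.fields[part]);
    if (!next) return false;
    node = next;
  }
  return true;
}

// Trimmed copy of the "measurements" entry from the mapping in post #1.
const properties = {
  measurements: {
    type: "nested",
    properties: {
      value: { type: "long" },
      unit: {
        type: "string",
        fields: {
          raw: { type: "string", index: "not_analyzed", null_value: "NULL" },
        },
      },
    },
  },
};

console.log(isFieldMapped(properties, "measurements.unit.raw")); // true
console.log(isFieldMapped(properties, "measurements.normalizedUnit-name.raw")); // false
```

With the posted mapping, only `value` and `unit` are declared under `measurements`, so the field used in the request above is not part of the explicit mapping.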


(David Pilato) #4

As I said:


(Prerana Teligi Harapanahalli Math) #5

Actually, the entire application is very big and has lots of components. So I can provide the JS script I am using to access the measurements field of the local Elasticsearch index.

    // Aggregate measurements by unit name, with stats on the matching quantity.
    // type 'raw' selects the raw fields; any other value selects the normalized ones.
    Document.aggregateByMeasurements = function(filters, type) {
      var mType = (type === 'raw') ? "rawUnit" : "normalizedUnit";
      var qty   = (type === 'raw') ? "parsedValue" : "normalizedQuantity";

      return Document.query(filters, {
        "entities": {
          "nested": {
            "path": "measurements"
          },
          "aggs": {
            "entity_name": {
              "terms": {
                "field": "measurements." + mType + "-name.raw",
                "size": 5000
              },
              "aggs": {
                "entity_stats": {
                  "stats": {
                    "field": "measurements." + qty
                  }
                }
              }
            }
          }
        }
      });
    };

It tries to extract the normalizedUnit-name and normalizedQuantity fields from measurements and creates buckets.

This is the measurements data:

"measurements": [{"rawUnit-name": "July", "parsedValue": 11, "rawValue": "11", "rawUnit": {"offsetStart": 49, "offsetEnd": 53, "name": "July"}, "offsetEnd": 48, "rawUnit-offsetEnd": 53, "offsetStart": 46, "rawUnit-offsetStart": 49}, {"rawUnit-name": "p", "parsedValue": 3, "rawValue": "3:00", "rawUnit": {"offsetStart": 64, "offsetEnd": 65, "name": "p"}, "offsetEnd": 63, "rawUnit-offsetEnd": 65, "offsetStart": 59, "rawUnit-offsetStart": 64}, {"rawUnit-name": "p", "parsedValue": 0, "rawValue": "13 July until 5:00", "rawUnit": {"offsetStart": 107, "offsetEnd": 108, "name": "p"}, "offsetEnd": 106, "rawUnit-offsetEnd": 108, "offsetStart": 88, "rawUnit-offsetStart": 107}, {"offsetStart": 408, "parsedValue": 1, "rawValue": "1 303.492", "offsetEnd": 417}, {"rawUnit-name": "°", "parsedValue": 148, "rawValue": "148", "rawUnit": {"offsetStart": 1142, "offsetEnd": 1143, "name": "°"}, "offsetEnd": 1142, "rawUnit-offsetEnd": 1143, "offsetStart": 1139, "rawUnit-offsetStart": 1142}, {"offsetStart": 1154, "parsedValue": 22.764, "rawValue": "22.764", "offsetEnd": 1160}, {"rawUnit-name": "m", "parsedValue": 992.385, "normalizedQuantity": 992.385, "rawValue": "992.385", "rawUnit": {"offsetStart": 1175, "offsetEnd": 1176, "name": "m"}, "normalizedUnit-system": "SI base", "offsetEnd": 1174, "rawUnit-offsetEnd": 1176, "offsetStart": 1167, "normalizedUnit-type": "length", "rawUnit-offsetStart": 1175, "normalizedUnit-name": "m", "normalizedUnit": {"type": "length", "name": "m", "system": "SI base"}, "type": "length"}, {"offsetStart": 1274, "parsedValue": 1, "rawValue": "one", "offsetEnd": 1277}, {"offsetStart": 1318, "parsedValue": 4, "rawValue": "four", "offsetEnd": 1322}, {"offsetStart": 1604, "parsedValue": 10.7265, "rawValue": "10.7265", "offsetEnd": 1611}, {"rawUnit-name": "v", "parsedValue": 804.595, "rawValue": "804.595", "rawUnit": {"offsetStart": 2094, "offsetEnd": 2095, "name": "v"}, "offsetEnd": 2094, "rawUnit-offsetEnd": 2095, "offsetStart": 2087, "rawUnit-offsetStart": 2094}, 
{"offsetStart": 3008, "parsedValue": 1, "rawValue": "one", "offsetEnd": 3011}, {"offsetStart": 3579, "parsedValue": 20, "rawValue": "20", "offsetEnd": 3581}, {"offsetStart": 3660, "parsedValue": 10, "rawValue": "10", "offsetEnd": 3662}, {"rawUnit-name": "mm", "parsedValue": 0, "normalizedQuantity": 0, "rawValue": "100 by 100", "rawUnit": {"offsetStart": 3752, "offsetEnd": 3754, "name": "mm"}, "normalizedUnit-system": "SI base", "offsetEnd": 3751, "rawUnit-offsetEnd": 3754, "offsetStart": 3741, "normalizedUnit-type": "length", "rawUnit-offsetStart": 3752, "normalizedUnit-name": "m", "normalizedUnit": {"type": "length", "name": "m", "system": "SI base"}, "type": "length"}, {"offsetStart": 3793, "parsedValue": 9, "rawValue": "nine", "offsetEnd": 3797}, {"offsetStart": 3916, "parsedValue": 45, "rawValue": "45/315", "offsetEnd": 3922}, {"rawUnit-name": "minutes", "parsedValue": 16, "normalizedQuantity": 960, "rawValue": "16", "rawUnit": {"offsetStart": 3944, "offsetEnd": 3951, "name": "minutes"}, "normalizedUnit-system": "SI base", "offsetEnd": 3943, "rawUnit-offsetEnd": 3951, "offsetStart": 3941, "normalizedUnit-type": "time", "rawUnit-offsetStart": 3944, "normalizedUnit-name": "s", "normalizedUnit": {"type": "time", "name": "s", "system": "SI base"}, "type": "time"}, {"rawUnit-name": "MB", "parsedValue": 60, "rawValue": "60", "rawUnit": {"offsetStart": 4020, "offsetEnd": 4022, "name": "MB"}, "offsetEnd": 4019, "rawUnit-offsetEnd": 4022, "offsetStart": 4017, "rawUnit-offsetStart": 4020}, {"offsetStart": 4729, "parsedValue": 9, "rawValue": "nine", "offsetEnd": 4733}, {"rawUnit-name": "°", "parsedValue": 90, "rawValue": "90", "rawUnit": {"offsetStart": 5180, "offsetEnd": 5181, "name": "°"}, "offsetEnd": 5180, "rawUnit-offsetEnd": 5181, "offsetStart": 5178, "rawUnit-offsetStart": 5180}, {"rawUnit-name": "°", "parsedValue": 90, "rawValue": "90", "rawUnit": {"offsetStart": 5216, "offsetEnd": 5217, "name": "°"}, "offsetEnd": 5216, "rawUnit-offsetEnd": 5217, "offsetStart": 
5214, "rawUnit-offsetStart": 5216}, {"offsetStart": 6184, "parsedValue": 1, "rawValue": "one", "offsetEnd": 6187}, {"offsetStart": 6228, "parsedValue": 4, "rawValue": "four", "offsetEnd": 6232}, {"offsetStart": 6251, "parsedValue": 2, "rawValue": "Two", "offsetEnd": 6254}]
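
Many of the entries above carry no unit name at all, so a terms aggregation on a unit-name field can only ever bucket a subset of them. A minimal sketch of that tally follows; `tallyByKey` is a hypothetical helper, and `sample` holds four entries from the data above, trimmed to the relevant keys.

```javascript
// Hypothetical helper: counts how often each value of `key` occurs and how
// many entries lack the key altogether.
function tallyByKey(entries, key) {
  const counts = {};
  let missing = 0;
  for (const entry of entries) {
    if (Object.prototype.hasOwnProperty.call(entry, key)) {
      counts[entry[key]] = (counts[entry[key]] || 0) + 1;
    } else {
      missing += 1;
    }
  }
  return { counts, missing };
}

// Four entries taken from the data above, trimmed to the relevant keys.
const sample = [
  { "rawUnit-name": "July", parsedValue: 11 },
  { parsedValue: 1, rawValue: "1 303.492" },
  { "rawUnit-name": "m", parsedValue: 992.385, normalizedQuantity: 992.385, "normalizedUnit-name": "m" },
  { "rawUnit-name": "minutes", parsedValue: 16, normalizedQuantity: 960, "normalizedUnit-name": "s" },
];

console.log(tallyByKey(sample, "rawUnit-name"));
console.log(tallyByKey(sample, "normalizedUnit-name"));
```

In this sample, three of the four entries have a `rawUnit-name` but only two have a `normalizedUnit-name`, which is the kind of gap to expect between the number of measurements and the number of bucketed values.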

(David Pilato) #6

I can't reproduce or understand the problem.

Just recreate a simpler sample, as I said.
Unless someone else understands and has an idea.


(Prerana Teligi Harapanahalli Math) #7

I will send you the screenshots and a link to the code. It will take me some time to do so. Please bear with me.

Thanks,
Prerana


(Prerana Teligi Harapanahalli Math) #8

@dadoonet

This is the link to app: http://polar.usc.edu/html/polar-deep-insights/#/config

This is the link to Github code: https://github.com/USCDataScience/polar-deep-insights/tree/master/insight-visualizer

When I download and save the ontologies, click on TREC-DD-PDF, and use that Elasticsearch index, I can see the measurements as shown below:

The problem:

When I create a local Elasticsearch index by indexing the same data as TREC-DD-PDF and use that index as shown below:

I am not able to see the measurements:

All the other data is visible.
I figured this has something to do with the Elasticsearch .raw field, so I removed it in the code. After that I can see some measurements for the local ES index, but not all.

Logically speaking, it should work both with and without the .raw field, right?
I am not sure why this is happening. Can you please help me with this?


(David Pilato) #9

No. I'm afraid I can't. I don't know how to reproduce your problem.
Again, try to translate that to a simple recreation script like:

DELETE index
PUT index/_doc/1
{
  "foo": "bar"
}
GET index/_search
{
  "query": {
    "match": {
      "foo": "bar"
    }
  }
}

What you can do on your side, though, is compare both mappings, maybe...
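
A minimal sketch of such a mapping comparison is below, assuming both mappings have already been fetched as JSON and reduced to their `properties` objects. `listFieldPaths` and `missingPaths` are hypothetical helpers. The `local` object mirrors the `measurements` entry of the posted mapping, while `reference` is a made-up stand-in for the working index, whose real mapping does not appear in this thread.

```javascript
// Hypothetical helper: lists every field path in a mapping's "properties",
// including multi-fields declared under "fields" (for example "name.raw").
function listFieldPaths(properties, prefix = "") {
  const paths = [];
  for (const [name, def] of Object.entries(properties || {})) {
    const path = prefix + name;
    paths.push(path);
    paths.push(...listFieldPaths(def.properties, path + "."));
    paths.push(...listFieldPaths(def.fields, path + "."));
  }
  return paths;
}

// Returns the paths present in `reference` but absent from `local`.
function missingPaths(reference, local) {
  const have = new Set(listFieldPaths(local));
  return listFieldPaths(reference).filter((p) => !have.has(p));
}

// Illustrative inputs only: `local` mirrors the posted mapping, while
// `reference` is an ASSUMED shape for the mapping of the working index.
const rawString = {
  type: "string",
  fields: { raw: { type: "string", index: "not_analyzed" } },
};
const local = {
  measurements: {
    type: "nested",
    properties: { value: { type: "long" }, unit: rawString },
  },
};
const reference = {
  measurements: {
    type: "nested",
    properties: {
      "normalizedUnit-name": rawString,
      normalizedQuantity: { type: "double" },
    },
  },
};

console.log(missingPaths(reference, local));
```

Any path reported here exists in the reference mapping but not in the local one, which makes it a candidate explanation for an aggregation that works on one index and returns empty buckets on the other.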


(Prerana Teligi Harapanahalli Math) #10

OK. Thank you so much for the response. I think you are right. I wasn't able to see this before. I will translate it to a simple script and compare both mappings on my side first, and get back to you if I have any questions.

