Import CSV data to Elasticsearch but _source missing

Kibana version : 7.13

Elasticsearch version : 7.13

APM Server version : 7.13

APM Agent language and version : java 1.8

Fresh install or upgraded from other version? fresh install

Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed index pattern, generated custom templates, changed agent configuration etc. :-no

While importing a CSV file to Elasticsearch using the Data Visualizer, I see that the imported data in Discover gets added at the root. When examining the JSON data structure, it looks like this:

{
  "_index": "1",
  "_type": "_doc",
  "_id": "Opanx3oB_6DvhH3RZE2J",
  "_version": 1,
  "_score": 0,
  "fields": {
    "Brand Name": [
      "{\"regions\": [{\"region\": 1},{\"region\": 2}]}"
    ],
    "Brand": [
      "Optical"
    ],
    "Description": [
      "efghkkdksadfkdfj"
    ],
    "EndTime": [
      "2021-01-01T01:00:00.000Z"
    ],
    "App Name": [
      "eye"
    ],
    "StartTime": [
      "2020-12-31T00:00:00.000Z"
    ],
    "Fiscal Week": [
      "FW01"
    ],
    "Summary": [
      "abc"
    ]
  }
}

All the data gets added within a fields section. How can I introduce _source here, so that I can go on to create ingest pipelines? I believe _source is needed for the ingest node?

How did you examine the JSON structure? It looks like you've done a search using the fields option.

It looks like you called your index 1, so what do you get if you just do the simplest possible search against that index? Like this:

GET /1/_search

You could do that in the Kibana Dev Console. Or, if you use curl, you'll want to add ?pretty to make the output easier to read.
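For example, with curl that would look something like this (a sketch assuming an unsecured cluster on the default local port; adjust the host and add credentials if your setup differs):

```shell
curl -s 'http://localhost:9200/1/_search?pretty'
```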

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 25,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "Opanx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : {
          "Description" : "efghkkdksadfkdfj",
          "EndTime" : "Jan 1 2021 01:00:00",
          "App Name" : "ddd",
          "StartTime" : "Dec 31 2020 00:00:00",
          "Brand" : "Optical",
          "Brand Name" : """{"regions": [{"region": 1},{"region": 2}]}""",
          "Fiscal Week" : "FW01",
          "Summary" : "abc"
        }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "O5anx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "PJanx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "PZanx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "Ppanx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "P5anx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "QJanx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "QZanx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "Qpanx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "1",
        "_type" : "_doc",
        "_id" : "Q5anx3oB_6DvhH3RZE2J",
        "_score" : 1.0,
        "_source" : { }
      }
    ]
  }
}

Thanks @droberts195. Could you also assist with how I can tell Elasticsearch that the Brand Name column should be treated as JSON?

What is the format in Excel for that JSON? I will use a CSV and a JSON processor in the ingest pipeline.

It looks like one of your CSV fields contains a JSON document. Since we import the file as simple CSV this is just showing up as a string.

File upload does not automatically detect complex formats where a field of one format contains a multi-field data structure in some other format.

However, you can edit the ingest pipeline that file upload chooses before you import the file. Go to the advanced tab of the page immediately before you start the import, then add a JSON processor at the end of the ingest pipeline that file upload has chosen. The instructions and an example are in JSON processor | Elasticsearch Guide [7.13] | Elastic.
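As a minimal sketch, the extra processor could look like this (the target_field name here is my own illustrative choice; "Brand Name" matches the column shown earlier):

```json
{
  "json": {
    "field": "Brand Name",
    "target_field": "Brand Name Parsed"
  }
}
```

This parses the string in Brand Name and stores the resulting object in Brand Name Parsed, leaving the original string field untouched.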

Thanks @droberts195. I have tried the JSON processor, but it is not picking up the content as JSON. Can you give a sample of the format that I should be putting in Excel for the JSON? I am not sure of the quotes and syntax. I have enclosed the whole string in quotes, and the keys and values in quotes too.

Is this format correct to add in the CSV?

"{"foo": 2000}" Do we need to escape the inner quotes with a backslash?

In Excel your field should look like this: {"foo":2000}

When Excel saves that as CSV it might add outer quotes and escape the inner quotes like this: "{""foo"":2000}"

(It might not quote the value in the CSV file - it's not necessary if it doesn't contain embedded commas.)

When this is converted into a field in _source you want it to look like "Brand" : "{\"foo\":2000}"

(You don't have to add the backslashes - the JSON printer will add them because the raw field value contains quotes.)

Then the JSON processor should be able to handle it.
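To see the quote-doubling in action, here is a small Python sketch (the field values are made up for illustration; it uses only the standard csv module) that writes a row containing an embedded JSON string and shows that it round-trips:

```python
import csv
import io
import json

# A row where one column holds a JSON document as a string
row = ["Optical", '{"foo":2000}', "FW01"]

# Write it as CSV: the csv module quotes the JSON field and
# doubles the inner quotes, producing "{""foo"":2000}"
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue()
print(line.strip())  # Optical,"{""foo"":2000}",FW01

# Reading it back undoes the escaping, so the field can be
# parsed as JSON again
parsed_row = next(csv.reader(io.StringIO(line)))
print(json.loads(parsed_row[1]))  # {'foo': 2000}
```

Excel follows the same quoting convention when it saves a sheet as CSV, which is why the raw file can look more heavily quoted than the value you typed into the cell.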


Thanks @droberts195

I tried using your format in excel

I have the below error in import

0: failed to parse field [BrandName] of type [keyword] in document with id 'T524yXoB_6DvhH3RTXIN'. Preview of field's value: '{foo=2000}'

{"message":"Optical,eyeNET,FW01,Dec 31 2020 00:00:00,Jan 1 2021 01:00:00,abc,efghkkdksadfkdfj,,"{""foo"":2000}""}

What could be wrong ?

Maybe it's a bug.

The internal message being used to send data from Kibana to Elasticsearch in JSON format looks like it is not properly escaped.

{"message":"Optical,eyeNET,FW01,Dec 31 2020 00:00:00,Jan 1 2021 01:00:00,abc,efghkkdksadfkdfj,,"{""foo"":2000}""}

should be:

{"message":"Optical,eyeNET,FW01,Dec 31 2020 00:00:00,Jan 1 2021 01:00:00,abc,efghkkdksadfkdfj,,\"{\"\"foo\"\":2000}\""}

Thanks @droberts195. So what would be a probable fix for this issue so that I can proceed? I am sorry, but I am just trying to fix this particular scenario.