JSON data won't upload

Hi,

First-time user here, so apologies if I'm not doing things correctly. I am trying to upload a JSON file that consists of an array of JSON elements, the longest being under 3,100 characters. For whatever reason, the Data Visualizer gets stuck on "Analyzing Data". It's pretty frustrating because there isn't much analysis needed: I just want each field to be named after the JSON key and to contain the data at that key.

My file has 1,000 JSON elements. I tried shortening it to 10 elements and got the following error:

[illegal_argument_exception] Merging lines into messages resulted in an unacceptably long message. Merged message would have [5] lines and [11845] characters (limit [10000]). If you have messages this big please increase the value of [line_merge_size_limit]. Otherwise it probably means the timestamp has been incorrectly detected, so try overriding that.

When I upload the same 10 elements after inserting a "timestamp" key (ISO timestamp) into each element, Kibana again gets stuck Analyzing Data.

When I put a timestamp before each row in the file, making it a text file and not a JSON file, I get the following:

[timeout_exception] Aborting structure analysis during [full message Grok pattern field extraction] as it has taken longer than the timeout of [25s], with { suppressed={ 0={ type="exception" & reason="Explanation so far:\n[Using character encoding [UTF-8], which matched the input with [15%] confidence - first [8kB] of input was pure ASCII]\n[Not NDJSON because there was a parsing exception: [Unexpected character ('-' (code 45)): Expected space separating root-level values at [Source: \"2020-11-19T11:57:17.870658 {\"_scan_result_info\": {\"id\": 10, \"finishTime\": \"14...\"; line: 1, column: 6]]]\n[Not XML because there was a parsing exception: [ParseError at [row,col]:[1,1] Message: Content is not allowed in prolog.]]\n[Not CSV because a line has an unescaped quote that is not at the beginning or end of a field: [2020-11-19T11:57:17.870658 {\"_scan_result_info\...

This is interesting because I didn't know the operation could time out. Other times when I try to upload data, the Data Visualizer just says "Analyzing data" forever (overnight at least). It would be helpful if it timed out in those cases too.

Here is a sample JSON element from the list:

{
  "_scan_result_info": {
    "id": 10,
    "finishTime": "1468118911",
    "importFinish": "1468118916",
    "importStart": "1468118915",
    "startTime": "1468118585",
    "createdTime": "1468118581",
    "name": "scan_name"
  },
  "_is_scan_result_empty": "0",
  "cvssVector": "AV:L/AC:L/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND",
  "netbiosName": "WORKGROUP\\WIN-1EA12VALDRF",
  "lastSeen": "1462363110",
  "pluginInfo": "85330 (445/6) MS15-085: Vulnerability in Mount Manager Could Allow Elevation of Privilege (3082487)",
  "acceptRisk": "0",
  "port": "445",
  "synopsis": "The remote Windows host is affected by an elevation of privilege vulnerability.",
  "pluginName": "MS15-085: Vulnerability in Mount Manager Could Allow Elevation of Privilege (3082487)",
  "description": "The remote Windows host is affected by an elevation of privilege vulnerability in the Mount Manager component due to improper processing of symbolic links. A local attacker can exploit this vulnerability by inserting a malicious USB device into a user's system, allowing the writing of a malicious binary to disk and the execution of arbitrary code.",
  "checkType": "local",
  "bid": "76222",
  "riskFactor": "High",
  "family": {
    "name": "Windows : Microsoft Bulletins",
    "id": "10",
    "type": "active"
  },
  "stigSeverity": "I",
  "cpe": "cpe:/o:microsoft:windows",
  "pluginText": "<plugin_output>\n\n  - C:\\Windows\\system32\\ntdll.dll has not been patched.\n    Remote version : 6.1.7601.17514\n    Should be      : 6.1.7601.18933\n\n</plugin_output>",
  "exploitEase": "Exploits are available",
  "patchPubDate": "1439308800",
  "exploitAvailable": "Yes",
  "exploitFrameworks": "",
  "vulnPubDate": "1439308800",
  "macAddress": "##mac_address##",
  "solution": "Microsoft has released a set of patches for Windows Vista, 2008, 7, 2008 R2, 8, RT, 2012, 8.1, RT 8.1, 2012 R2, and 10.",
  "ip": "##ip##",
  "firstSeen": "1462363110",
  "version": "Revision: 1.3",
  "cve": "XSA-26",
  "baseScore": "7.2",
  "temporalScore": "6.0",
  "protocol": "TCP",
  "severity": {
    "name": "High",
    "id": "3",
    "description": "High Severity"
  },
  "pluginID": "85330",
  "seeAlso": [
    "##url##"
  ],
  "hasBeenMitigated": "0",
  "xref": [
    "CERT #252743",
    "CERT #577193",
    "CWE #200",
    "OSVDB #50036"
  ],
  "repository": {
    "name": "Individual Scan",
    "id": -1,
    "description": ""
  },
  "pluginModDate": "1439740800",
  "dnsName": "test_dns_name",
  "pluginPubDate": "1439308800",
  "recastRisk": "0"
}

Any guidance?

You didn't fully describe the structure of your file. I think this endpoint would parse your JSON correctly if it were newline-delimited JSON (one object per line), but not if it is a single JSON array with no newlines. Docs are here: https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-find-file-structure.html
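In case it helps, here is a minimal sketch of converting a JSON array into newline-delimited JSON using only the Python standard library. The function name `json_array_to_ndjson` is my own and not part of any Elastic tooling, and it assumes the whole file fits in memory and that the top-level value is an array.

```python
import json


def json_array_to_ndjson(text: str) -> str:
    """Convert a JSON array into NDJSON: one compact JSON value per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without `indent` never emits raw newlines; newlines inside
    # string values are written as the two-character escape \n, so each
    # record stays on exactly one physical line.
    lines = [json.dumps(record, separators=(",", ":")) for record in records]
    # End with a trailing newline so the last record is terminated too.
    return "\n".join(lines) + "\n"
```

You would read the original file into a string, pass it through this function, and write the result to a new file before uploading it.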

Hi Wylie,

Thank you for pointing out that the original post did not include the full file structure. In each of the files above, the file as a whole contained a newline-separated JSON array, so one JSON element on each line.

My data was actually not in an array originally. Each line was just its own JSON blob, as you mentioned. That file also gets stuck on "Analyzing data".

When I add a timestamp field to each JSON blob, it still gets stuck on "Analyzing data".

If the JSON is already in its final format, you should index it with the bulk API instead of going through the Data Visualizer: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
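As a rough sketch of what that looks like, the bulk API takes a newline-delimited body in which each document is preceded by an action line, and the body must end with a newline. The helper below only builds that body. The function name `build_bulk_body` and the index name `scan-results` are placeholders I made up, so substitute your own.

```python
import json
from typing import Iterable


def build_bulk_body(docs: Iterable[dict], index: str) -> str:
    """Build a _bulk request body: an action line, then the document source."""
    lines = []
    for doc in docs:
        # Action line: index this document into `index` with an auto-generated ID.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, serialized onto a single line.
        lines.append(json.dumps(doc))
    # The bulk API requires the final line to be newline-terminated as well.
    return "\n".join(lines) + "\n"


# Hypothetical usage with two trimmed-down documents.
body = build_bulk_body([{"port": "445"}, {"port": "80"}], "scan-results")
```

The resulting string would then be sent as a POST to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header. For a file with 1,000 elements a single request is probably fine, but larger files are usually split into several batches.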