Problems uploading XML and JSON files in Data Visualizer

Hi everyone.
I am Alvaro, a telecommunications student who is working a little bit with Kibana in the cloud to learn more about Elasticsearch.
Now I am trying to upload two different files (one is XML and the other is JSON) to Kibana using the Data Visualizer, so I can work with them and compare them.
The files contain song documents with some fields such as the name, the artist, etc. They come from MusicBrainz and Discogs, and since they are bigger than 100 MB I have split them into smaller files.
When I tried to upload them, the Data Visualizer tells me that it doesn't recognize any timestamp. And when I put a timestamp like this (1/31/2019 1:40PM) at the top of the file, the following message appears:
File could not be read

[illegal_argument_exception] Merging lines into messages resulted in an unacceptably long message. Merged message would have [10] lines and [12133] characters (limit [10000]). If you have messages this big please increase the value of [line_merge_size_limit]. Otherwise it probably means the timestamp has been incorrectly detected, so try overriding that.
But I don't know how to increase that value from the Data Visualizer...
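From the docs, it looks like the Data Visualizer is backed by Elasticsearch's find_file_structure API, which accepts line_merge_size_limit as a query parameter, so maybe something like this would work (just a sketch: the host, credentials and file name are placeholders for my setup), but I don't see a way to set it from the Data Visualizer UI:

```python
import requests

# Sketch: call the find_file_structure API directly with a higher
# line_merge_size_limit than the default of 10000 characters.
# Host and file name are placeholders for my setup.
with open("recordingMusicbrainz.json", "rb") as f:
    resp = requests.post(
        "http://localhost:9200/_ml/find_file_structure",
        params={"line_merge_size_limit": 20000},
        headers={"Content-Type": "application/json"},
        data=f,
    )
print(resp.json())
```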
Does anyone know what the easiest way to upload these two files to Elastic would be? I thought the fastest way was using the Data Visualizer, but if there is another way, I would like to know it too.
I attach sample documents from the two files to offer more info:
MusicBrainz:
{"id": "10b9c34b-821d-4cc8-9587-5c7d0cb68865", "tags": , "isrcs": , "title": "Suck", "video": false, "length": null, "rating": {"value": null, "votes-count": 0}, "aliases": , "relations": [{"end": null, "type": "performance", "work": {"id": "20309de2-2847-3753-8a7a-d0e335f57176", "type": null, "iswcs": , "title": "Suck", "aliases": , "type-id": null, "language": null, "languages": , "annotation": null, "attributes": , "disambiguation": ""}, "begin": null, "ended": false, "type-id": "a3005666-a872-32c3-ad06-98af558e99b0", "direction": "forward", "attributes": , "target-type": "work", "source-credit": "", "target-credit": "", "attribute-values": {}}], "annotation": null, "artist-credit": [{"name": "Pigface", "artist": {"id": "11137c88-a9a2-4ffa-a97d-fb058c6d6ce2", "name": "Pigface", "sort-name": "Pigface", "disambiguation": ""}, "joinphrase": ""}], "disambiguation": ""}
Discogs:
<master>
  <main_release>155102</main_release>
  <artists><artist><id>212070</id><name>Samuel L Session</name><anv>Samuel L</anv></artist></artists>
  <genres><genre>Electronic</genre></genres>
  <styles><style>Techno</style></styles>
  <year>2001</year>
  <title>New Soil</title>
  <data_quality>Correct</data_quality>
  <videos>
    <video><title>Samuel L - Velvet</title><description>Samuel L - Velvet</description></video>
    <video><title>Samuel L - Danses D'Afrique</title><description>Samuel L - Danses D'Afrique</description></video>
    <video><title>Samuel L - Body N' Soul</title><description>Samuel L - Body N' Soul</description></video>
    <video><title>Samuel L - Into The Groove</title><description>Samuel L - Into The Groove</description></video>
    <video><title>Samuel L - Soul Syndrome</title><description>Samuel L - Soul Syndrome</description></video>
    <video><title>Samuel L - Lush</title><description>Samuel L - Lush</description></video>
    <video><title>Samuel L - Velvet ( Direct Me )</title><description>Samuel L - Velvet ( Direct Me )</description></video>
  </videos>
</master>

Thanks for your time,
Alvaro

Hello @alvarolopez

First, I'd like to verify that the upload of data was successful - is this true? Do you see your data in Discover?

Which visualization tool are you using? The fact that it's asking for a timestamp implies that it's a time-based visualization, in which case I'm curious whether your data contains any time information. From what I see, it doesn't. You could potentially use the @timestamp field, which would show when the document was created, but that doesn't seem useful for you.
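If you did end up wanting a time field, one option would be an ingest pipeline that stamps each document at index time. Just a sketch (the host and pipeline name are placeholders), not something you need for your use case:

```python
import requests

# Sketch: an ingest pipeline whose set processor copies the ingest
# time into @timestamp for every document indexed through it.
pipeline = {
    "description": "Stamp documents with their ingest time",
    "processors": [
        {"set": {"field": "@timestamp", "value": "{{_ingest.timestamp}}"}}
    ],
}
resp = requests.put(
    "http://localhost:9200/_ingest/pipeline/add-timestamp",
    json=pipeline,
)
resp.raise_for_status()
# Index documents with ?pipeline=add-timestamp to apply it.
```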

Hello @mattkime
First of all thanks for your answer!
What I tried to do is upload two different files to Kibana using the Data Visualizer tool.
I haven't been able to upload any information, so I can't see any documents in Discover; the problem appears before the data is loaded. My data doesn't contain any time information, just information about the name, the artist, the id.... but it has no time field.
I just want to upload the two files (which are 103 MB and 1.5 GB, but I think I can split them) to Elastic and run searches on the documents in them.

@alvarolopez Let's focus on getting the Data Visualizer tool working. Can you post a https://gist.github.com/ with a few records that you're trying to upload? I'd like to reproduce the problem on my end so I can find a way around the timestamp issue.

Of course!
I have posted part of the two files here:

Thanks a lot @mattkime

The recordingMusicbrainz.json sample worked just fine for me. Does it work for you?

Looks like I overlooked the obvious - it doesn't support XML import. I think it would be best to transform the XML to JSON.
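Something along these lines might work as a starting point. This is only a sketch: I'm assuming the dump is a stream of <master> elements like your excerpt, the element names are taken from that excerpt, and the file names are placeholders:

```python
import json
import xml.etree.ElementTree as ET

def xml_to_ndjson(xml_path, ndjson_path):
    """Stream a large XML dump and write one JSON document per line."""
    with open(ndjson_path, "w", encoding="utf-8") as out:
        # iterparse avoids loading the whole multi-GB file into memory.
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "master":
                doc = {
                    "main_release": elem.findtext("main_release"),
                    "title": elem.findtext("title"),
                    "year": elem.findtext("year"),
                    "genres": [g.text for g in elem.iter("genre")],
                    "styles": [s.text for s in elem.iter("style")],
                    "artists": [a.findtext("name") for a in elem.iter("artist")],
                }
                out.write(json.dumps(doc, ensure_ascii=False) + "\n")
                elem.clear()  # free the element we just serialized

xml_to_ndjson("discogs_masters.xml", "discogs_masters.ndjson")
```

The resulting NDJSON file should then upload the same way your MusicBrainz sample did.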

recordingMusicbrainz.json is a 55 kB part of the original document, which is 77 MB in size.
I couldn't load recordingMusicbrainz.json from my desktop; the following error appears:
File could not be read

[timeout_exception] Aborting structure analysis during [timestamp format determination] as it has taken longer than the timeout of [25s], with { suppressed={ 0={ type="exception" & reason="Explanation so far:\n[Using character encoding [UTF-8], which matched the input with [100%] confidence]\n[Not NDJSON because an empty object was parsed: ]\n[Not XML because there was a parsing exception: [ParseError at [row,col]:[1,1] Message: Content is not allowed in prolog.]]\n[Not CSV because a line has an unescaped quote that is not at the beginning or end of a field: [{"id": "10b9c34b-821d-4cc8-9587-5c7d0cb68865", "tags": , "isrcs": , "title": "Suck", "video": false, "length": null, "rating": {"value": null, "votes-count": 0}, "aliases": , "relations": [{"end": null, "type": "performance", "work": {"id": "20309de2-2847-3753-8a7a-d0e335f57176", "type": null, "iswcs": , "title": "Suck", "aliases": , "type-id": null, "language": null, "languages": , "annotation": null, "attributes": , "disambiguation": ""}, "begin": null, "ended": false, "type-id": "a3005666-a872-32c3-ad06-98af558e99b0", "direction": "forward", "attributes": , "target-type": "work", "source-credit": "", "target-credit": "", "attribute-values": {}}], "annotation": null, "artist-credit": [{"name": "Pigface", "artist": {"id": "11137c88-a9a2-4ffa-a97d-fb058c6d6ce2", "name": "Pigface", "sort-name": "Pigface", "disambiguation": ""}, "joinphrase": ""}], "disambiguation": ""}]]\n[Not TSV because a line has an unescaped quote that is not at the beginning or end of a field: [{"id": "10b9c34b-821d-4cc8-9587-5c7d0cb68865", "tags": , "isrcs": , "title": "Suck", "video": false, "length": null, "rating": {"value": null, "votes-count": 0}, "aliases": , "relations": [{"end": null, "type": "performance", "work": {"id": "20309de2-2847-3753-8a7a-d0e335f57176", "type": null, "iswcs": , "title": "Suck", "aliases": , "type-id": null, "language": null, "languages": , \

However, I have now tried to load the file from the gist and it worked fine. I don't know why this happens, because the file is the same, but what I think I am going to do is split the file into different parts and import those parts into the same index. Do you know roughly what the maximum size of a gist is? I tried to save one of 30,000 lines (documents) and I couldn't save that gist.
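In case it helps anyone else, this is the kind of split I have in mind (a sketch: it assumes the file is newline-delimited JSON so every line is a complete document, and the size cap is just a guess to stay under the upload limit):

```python
# Sketch: split a big NDJSON file into parts of roughly 90 MB each,
# never cutting a document in half. Path and cap are placeholders.
MAX_BYTES = 90 * 1024 * 1024

def split_ndjson(path, max_bytes=MAX_BYTES):
    part, written, out = 0, 0, None
    with open(path, "r", encoding="utf-8") as src:
        for line in src:
            if out is None or written >= max_bytes:
                if out is not None:
                    out.close()
                part += 1
                out = open(f"{path}.part{part}", "w", encoding="utf-8")
                written = 0
            out.write(line)
            written += len(line.encode("utf-8"))
    if out is not None:
        out.close()

split_ndjson("recordingMusicbrainz.json")
```

All the parts can then be imported into the same index.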
Thanks a lot again!!
