Problem with JSON file import into Elasticsearch

I exported data from MongoDB and generated a JSON file.

Sample JSON file data (one record):

{
  "_id": {
    "$oid": "5d84438f4514cb1f8a"
  },
  "slug": "test-home-technology",
  "title": "test home technology",
  "subtitle": "test home tech has a way",
  "content": {
    "text": "test"
  },
  "language": "en",
  "rootId": "110.6",
  "region": [
    "GLOBAL"
  ],
  "image": {
    "uri": "test/test.jpg",
    "alt": "test alt",
    "class": ""
  },
  "duration": 152,
  "fullReport": {
    "caption": "View full report",
    "uri": "",
    "class": ""
  },
  "numViews": 0
}

I used the command below to import the JSON file into Elasticsearch:

curl -H 'Content-Type: application/x-ndjson' -s -XPOST 'localhost:9200/sample/test/_bulk?pretty' --data-binary @test.json

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Action/metadata line [1] contains an unknown parameter [$oid]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Action/metadata line [1] contains an unknown parameter [$oid]"
  },
  "status" : 400
}

Do I need to create a mapping, schema, or index before importing the JSON file? Please suggest the steps to import a JSON file / Mongo collection into ES.

You cannot post arbitrary JSON to the _bulk endpoint. Each document must be preceded by an action line.
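To illustrate, here is a minimal Python sketch (standard library only; `to_bulk_ndjson` is a hypothetical helper name, not part of any Elasticsearch client) that pairs each document with an empty `index` action line and writes compact, newline-terminated NDJSON, assuming the index name is supplied in the URL:

```python
import json
from typing import Iterable


def to_bulk_ndjson(docs: Iterable[dict]) -> str:
    """Build a _bulk body: for each document, an action line followed by the document on one line."""
    lines = []
    for doc in docs:
        # Empty action metadata: the index (and type, if any) come from the request URL.
        lines.append(json.dumps({"index": {}}))
        # The document itself, serialized compactly on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False, separators=(",", ":")))
    if not lines:
        return ""
    # The bulk API requires every line, including the last one, to end with "\n".
    return "\n".join(lines) + "\n"


body = to_bulk_ndjson([{"slug": "test-home-technology", "numViews": 0}])
print(body, end="")
```

Writing `body` to a file and posting it with `--data-binary` sends one action/source pair per document. A document that still contains Mongo's `_id` field will be rejected by Elasticsearch, so that field has to be removed or moved first.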

That said, this is an Elasticsearch question, not a Logstash question, and you might get a better answer if you moved it.

Which is the correct interface for importing JSON data?

@siva4
You can pass JSON data to bulk, but the bulk request needs to be formatted correctly. See the link in Badger's post.

Each exported JSON document should be on a single line (remove the pretty-printing), and every line in the file, including the last one, must end with \n.
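If the export is pretty-printed (as in the sample at the top), a sketch like the one below can reflow it into one compact line per document. It assumes the file holds either concatenated JSON objects or a single JSON array (for example from `mongoexport --jsonArray`); `compact_lines` is a hypothetical name. Without `--pretty`, mongoexport already writes one document per line, so this is only needed for pretty-printed output.

```python
import json
from typing import List


def compact_lines(text: str) -> List[str]:
    """Turn concatenated (possibly pretty-printed) JSON objects, or one JSON array, into single-line JSON strings."""
    decoder = json.JSONDecoder()
    lines: List[str] = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it between values ourselves.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        # A --jsonArray export is one big array: emit one line per element.
        items = value if isinstance(value, list) else [value]
        for item in items:
            lines.append(json.dumps(item, ensure_ascii=False, separators=(",", ":")))
    return lines
```

This reads the whole export into memory, which is fine for moderately sized files; for very large exports a streaming approach would be preferable.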

Since you are including the index name and doc type in the URL, test.json should look something like this:

{ "index" : {}}
{"_id":{"$oid":"5d84438f4514cb1f8a"},"slug":"test-home-technology","title":"test home technology","subtitle":"test home tech has a way","content":{"text":"test"},"language":"en","rootId":"110.6","region":["GLOBAL"],"image":{"uri":"test/test.jpg","alt":"test alt","class":""},"duration":152,"fullReport":{"caption":"View full report","uri":"","class":""},"numViews":0}

The bulk API is for ingesting multiple documents in a single request. Each document needs a similar pair of lines: the action line followed by the document itself.

Also, the general practice is to use _doc as the doc type instead of test. With ES 7 you can omit the type from the URL entirely:
curl -H 'Content-Type: application/x-ndjson' -s -XPOST 'localhost:9200/sample/_bulk?pretty' --data-binary @test.json
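For reference, the same request can be prepared from Python with only the standard library. This is a sketch assuming an unauthenticated cluster at localhost:9200 and the ES 7 typeless URL; `build_bulk_request` is a hypothetical helper, and `?pretty` is left off:

```python
import urllib.request


def build_bulk_request(base_url: str, index: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /<index>/_bulk with an NDJSON body."""
    url = f"{base_url.rstrip('/')}/{index}/_bulk"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


req = build_bulk_request("http://localhost:9200", "sample", '{"index": {}}\n{"numViews":0}\n')
```

`urllib.request.urlopen(req)` would send it. A malformed body is rejected as a whole with HTTP 400 (raised as `urllib.error.HTTPError`), whereas per-document failures come back with status 200 and `"errors": true`, so the response items still need checking.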

@Vinayak_Sapre

I tried the suggestion above and received an error: "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."

{ "index" : {}}
{ "_id" : { "$oid" : "5d84438f4514cb1f8af87260" }, "slug" : "smart-home-technology-is-still-fairly-niche", "title" : "Smart home technology is still fairly niche", "subtitle" : "Smart home tech has a ways to go if it wants to unseat the humble smartphone", "content" : { "text" : "For all of the hype and media coverage, smart home technology has yet to make a serious dent in the industry, with just 12% of internet users saying they own a smart home device.\r\n\r\n**Smart home product owners**\r\nHowever, a look at younger demographics reveals a different picture – more than a third of 16-24s have a smart home product.\r\n\r\nAdoption rates also vary by region – they’re highest in North America (17%) and lowest in MEA regions (6%), for example. In fact, it’s estimated that more than 63 million homes in North America will be smart by 2022, which equates to 44% of all homes in the region.\r\n\r\nSmart speakers are the most commonly owned smart home devices among smart home product owners. They’re heralded as the future of everything, from search to shopping.\r\n\r\n> Consumers can’t see a compelling enough reason to fork out the expense for these devices when mobiles can easily match their functionality.\r\n\r\nWhat’s interesting about this is that older people are more likely to own a smart speaker than their younger counterparts, with two-thirds of 55-64 year-old smart home product owners having one.\r\n\r\nAnd while many would assume it’s because older age groups are more affluent, this trend is prevalent even when we look at high-income earners in the younger set.\r\n\r\nSmart utility products are the second-most prominent devices among smart home product owners, and once again, there’s a strong uptake with older demographics. 
42% of 55-64-year-olds own one, compared to 35% of 16-24 year olds.\r\n\r\nWith the popularity of smart speakers and the frequent discount promotions offered by the likes of Amazon and Google, smart home products shouldn’t struggle to break past the 15% adoption mark.\r\n\r\n**Factors influencing the growth of smart home products**\r\nThe challenge in further growth for smart home products lies not just in convincing consumers that they can add value into everyday lives but, more importantly, in showing that their value proposition can’t be fulfilled by a smartphone. \r\n\r\nIt’s for this reason that smartwatches and smart wristbands have failed to push past the 15% mark – consumers can’t see a compelling enough reason to fork out the expense for these devices when mobiles can easily match their functionality.\r\n\r\nTablets have also encountered this problem, but their role as a household device, coupled with their popularity among older demographics and families, can be attributed to 37% of digital consumers saying they own one. \r\n\r\nThe similar demographics and household nature of smart home product usage do suggest that the potential market for these devices could be at least as big as that of the tablet market. \r\n\r\nWhat’s important to bear in mind here, however, is that the smart home product market is still very much in its infancy, which actually makes the 12% adoption rate quite impressive. \r\n\r\nOnce 5G moves beyond the early adopter phase and more household appliances and automotive manufacturers begin rolling out smart home-enabled products, the perceived value and usefulness of these devices could grow exponentially as ecosystems of smart products begin to grow in each household." 
}, "language" : "en", "rootId" : "110.6", "region" : [ "GLOBAL" ], "image" : { "uri" : "artifacts/110-6_article_smart-home-technology-is-still-fairly-niche_artifact.jpg", "alt" : "An individual looking at their smart watch and laptop with key information about their home security and temperature", "class" : "" }, "duration" : 152, "fullReport" : { "caption" : "View full report", "uri" : "", "class" : "" }, "disclaimer" : "Copyright © Trendstream Limited 2019.\r\n\r\nThis report was designed, produced and, where relevant, translated by PwC, using data, charts and commentary supplied in English by Trendstream Limited. \r\n\r\nAll rights, including copyright, in the content of GlobalWebIndex (GWI) publications, are owned and controlled by Trendstream Limited. In accessing such content, you agree that you will not reproduce or share the content outside of PwC without the prior written permission of Trendstream Limited. Trendstream Limited uses its reasonable endeavors to ensure the accuracy of all data in GWI publications. 
However, in accessing the content of GWI publications, you agree that you are responsible for your use of such data and Trendstream Limited shall have no liability to you for any loss, damage, cost or expense whether direct, indirect consequential or otherwise, incurred by, or arising by reason of, your use of the data and whether caused by reason of any error, omission or misrepresentation in the data or otherwise", "publishedAt" : { "$date" : "2019-09-22T00:00:00.000+0000" }, "isActive" : true, "createdAt" : { "$date" : "2020-01-17T20:31:11.215+0000" }, "updatedAt" : { "$date" : "2020-01-20T19:46:57.263+0000" }, "sponsorId" : { "$oid" : "5d7c717447b8175530aa6b5e" }, "typeId" : { "$oid" : "5afd01d6f1aa65e0f5221612" }, "formatId" : { "$oid" : "5afd01d6f1aa65e0f522160c" }, "difficultyId" : { "$oid" : "5b070c2ad76854e178932677" }, "tags" : [ "smart homes", "behavior", "seeking insight", "smart homes" ], "topicIds" : [ { "$oid" : "5b083c3d40b797aef1bf57fc" } ], "id" : { "$oid" : "5d84438f4514cb1f8af87260" }, "numViews" : 0 }
{ "index" : {}}
{ "_id" : { "$oid" : "5b46c15ed91f1741408229c5" }, "slug" : "will-articial-intelligence-win-the-caption-contest", "title" : "Will artificial intelligence win the caption contest?", "subtitle" : "Neural nets have mastered the ability to label, now they’re learning to tell stories", "content" : { "text" : "When social-media users upload photographs and caption them, they don’t just label their contents. They tell a story, which gives the photos context and additional emotional meaning.\r\n\r\nA paper published by Microsoft Research describes an image captioning system that mimics humans’ unique style of visual storytelling. Companies like Microsoft, Google, and Facebook have spent years teaching computers to label the contents of images, but this new research takes it a step further by teaching a neural-network-based system to infer a story from several images. Someday it could be used to automatically generate descriptions for sets of images, or to bring humanlike language to other applications for artificial intelligence.\r\n\r\n\r\n\r\n> Storytelling is an important part of being human.\r\n\r\n\r\n\r\n\"Rather than giving bland or vanilla descriptions of what’s happening in the images, we put those into a larger narrative context,\" says Frank Ferraro, a Johns Hopkins University PhD student who coauthored the paper. \"You can start making likely inferences of what might be happening.\"\r\n\r\n\r\n\r\nConsider an album of pictures depicting a group of friends celebrating a birthday at a bar. Some of the early pictures show people ordering beer and drinking it, while a later photo shows someone asleep on a couch.\r\n\r\n\r\n\r\n\"A captioning system might just say, ‘A person lying on a couch,’\" Ferraro says. \"But a storytelling system might be able to say, ‘Well, given that I think these people were out partying or out eating and drinking, then this person may be drunk.’\"\r\n\r\n\r\n\r\nOne example listed in the paper includes a series of five images. 
They show a family gathered around a table, a plate of shellfish, a dog, and images from the beach. The neural network described them with a story reading, \"The family got together for a cookout. They had a lot of delicious food. The dog was happy to be there. They had a great time on the beach. They even had a swim in the water.\"\r\n\r\n\r\n\r\nAn approach similar to those used to label the contents of single photos produced stories that were too generic. To counter this, the team developed a way for the network to choose words that were likely to be visually salient. They also required that the system not repeat words.\r\n\r\n\r\n\r\nStorytelling is an important part of being human, says Stanford Vision Lab director Fei-Fei Li, who did not contribute to the research. Technology that can imitate humans’ techniques for documenting stories needs to be able to cross-reference objects and characters seen in multiple pictures and infer relationships between people, objects, and places.\r\n\r\n\r\n\r\n\"The published paper is just the beginning toward this kind of technology,\" Li says. \"But it is a good step forward to start tackling such an ambitious project. 
I look forward to more follow-up work from these authors and others.\"" }, "language" : "en", "rootId" : "49.5", "region" : [ "GLOBAL" ], "image" : { "uri" : "artifacts/049-5-will-ai-win-the-caption-contest_artifact.jpg", "alt" : "", "class" : "" }, "duration" : 190, "fullReport" : { "caption" : "View full report", "uri" : "", "class" : "" }, "disclaimer" : "© 2018 Originally published on Technologyreview.com", "publishedAt" : { "$date" : "2018-08-23T00:00:00.000+0000" }, "isActive" : true, "createdAt" : { "$date" : "2020-01-17T20:31:11.219+0000" }, "updatedAt" : { "$date" : "2020-01-20T19:46:57.268+0000" }, "sponsorId" : { "$oid" : "5b2f021d272b221a08530275" }, "typeId" : { "$oid" : "5afd01d6f1aa65e0f5221612" }, "formatId" : { "$oid" : "5afd01d6f1aa65e0f522160c" }, "difficultyId" : { "$oid" : "5b070c2ad76854e178932677" }, "tags" : [ "storytelling", "mindset", "curiosity", "storytelling" ], "topicIds" : [ { "$oid" : "5b083c3d40b797aef1bf57f9" } ], "id" : { "$oid" : "5b46c15ed91f1741408229c5" }, "numViews" : 7, "numLikes" : 1 }

Error message:

curl -H 'Content-Type: application/x-ndjson' -s -XPOST 'localhost:9200/edge/artifact/_bulk?pretty' --data-binary @artifact.json
{
  "took" : 0,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "edge",
        "_type" : "artifact",
        "_id" : "s2fzSXMBhrrgf1ITDTxC",
        "status" : 400,
        "error" : {
          "type" : "mapper_parsing_exception",
          "reason" : "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
        }
      }
    },
    {
      "index" : {
        "_index" : "edge",
        "_type" : "artifact",
        "_id" : "tGfzSXMBhrrgf1ITDTxC",
        "status" : 400,
        "error" : {
          "type" : "mapper_parsing_exception",
          "reason" : "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
        }
      }
    }
  ]
}

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html is the documentation for this API endpoint.

@siva4
Sorry, I didn't notice the _id field. There are three special metadata fields: _index (the index name), _type (the document type), and _id (the document ID). These must be specified on the action line, not inside the document.

If you want to keep the Mongo ObjectId as the ES document ID (unique identifier), format it like this:

{ "index" : {"_id" :  "5d84438f4514cb1f8af87260" }}
{ "slug" : "smart-home-technology-is-still-fairly-niche", "title" : "Smart home technology is still fairly niche", "subtitle" : "Smart home tech has a ways to go if it wants to unseat the humble smartphone", "content" : { "text" : "For all of the hype and media coverage, smart home technology has yet to make a serious dent in the industry, with just 12% of internet users saying they own a smart home device.\r\n\r\n**Smart home product owners**\r\nHowever, a look at younger demographics reveals a different picture – more than a third of 16-24s have a smart home product.\r\n\r\nAdoption rates also vary by region – they’re highest in North America (17%) and lowest in MEA regions (6%), for example. In fact, it’s estimated that more than 63 million homes in North America will be smart by 2022, which equates to 44% of all homes in the region.\r\n\r\nSmart speakers are the most commonly owned smart home devices among smart home product owners. They’re heralded as the future of everything, from search to shopping.\r\n\r\n> Consumers can’t see a compelling enough reason to fork out the expense for these devices when mobiles can easily match their functionality.\r\n\r\nWhat’s interesting about this is that older people are more likely to own a smart speaker than their younger counterparts, with two-thirds of 55-64 year-old smart home product owners having one.\r\n\r\nAnd while many would assume it’s because older age groups are more affluent, this trend is prevalent even when we look at high-income earners in the younger set.\r\n\r\nSmart utility products are the second-most prominent devices among smart home product owners, and once again, there’s a strong uptake with older demographics. 
42% of 55-64-year-olds own one, compared to 35% of 16-24 year olds.\r\n\r\nWith the popularity of smart speakers and the frequent discount promotions offered by the likes of Amazon and Google, smart home products shouldn’t struggle to break past the 15% adoption mark.\r\n\r\n**Factors influencing the growth of smart home products**\r\nThe challenge in further growth for smart home products lies not just in convincing consumers that they can add value into everyday lives but, more importantly, in showing that their value proposition can’t be fulfilled by a smartphone. \r\n\r\nIt’s for this reason that smartwatches and smart wristbands have failed to push past the 15% mark – consumers can’t see a compelling enough reason to fork out the expense for these devices when mobiles can easily match their functionality.\r\n\r\nTablets have also encountered this problem, but their role as a household device, coupled with their popularity among older demographics and families, can be attributed to 37% of digital consumers saying they own one. \r\n\r\nThe similar demographics and household nature of smart home product usage do suggest that the potential market for these devices could be at least as big as that of the tablet market. \r\n\r\nWhat’s important to bear in mind here, however, is that the smart home product market is still very much in its infancy, which actually makes the 12% adoption rate quite impressive. \r\n\r\nOnce 5G moves beyond the early adopter phase and more household appliances and automotive manufacturers begin rolling out smart home-enabled products, the perceived value and usefulness of these devices could grow exponentially as ecosystems of smart products begin to grow in each household." 
}, "language" : "en", "rootId" : "110.6", "region" : [ "GLOBAL" ], "image" : { "uri" : "artifacts/110-6_article_smart-home-technology-is-still-fairly-niche_artifact.jpg", "alt" : "An individual looking at their smart watch and laptop with key information about their home security and temperature", "class" : "" }, "duration" : 152, "fullReport" : { "caption" : "View full report", "uri" : "", "class" : "" }, "disclaimer" : "Copyright © Trendstream Limited 2019.\r\n\r\nThis report was designed, produced and, where relevant, translated by PwC, using data, charts and commentary supplied in English by Trendstream Limited. \r\n\r\nAll rights, including copyright, in the content of GlobalWebIndex (GWI) publications, are owned and controlled by Trendstream Limited. In accessing such content, you agree that you will not reproduce or share the content outside of PwC without the prior written permission of Trendstream Limited. Trendstream Limited uses its reasonable endeavors to ensure the accuracy of all data in GWI publications. 
However, in accessing the content of GWI publications, you agree that you are responsible for your use of such data and Trendstream Limited shall have no liability to you for any loss, damage, cost or expense whether direct, indirect consequential or otherwise, incurred by, or arising by reason of, your use of the data and whether caused by reason of any error, omission or misrepresentation in the data or otherwise", "publishedAt" : { "$date" : "2019-09-22T00:00:00.000+0000" }, "isActive" : true, "createdAt" : { "$date" : "2020-01-17T20:31:11.215+0000" }, "updatedAt" : { "$date" : "2020-01-20T19:46:57.263+0000" }, "sponsorId" : { "$oid" : "5d7c717447b8175530aa6b5e" }, "typeId" : { "$oid" : "5afd01d6f1aa65e0f5221612" }, "formatId" : { "$oid" : "5afd01d6f1aa65e0f522160c" }, "difficultyId" : { "$oid" : "5b070c2ad76854e178932677" }, "tags" : [ "smart homes", "behavior", "seeking insight", "smart homes" ], "topicIds" : [ { "$oid" : "5b083c3d40b797aef1bf57fc" } ], "id" : { "$oid" : "5d84438f4514cb1f8af87260" }, "numViews" : 0 }

If you want ES to generate the document ID for you, leave _id out of the action line and remove the _id field from the document (or rename it to something else). I noticed there is also an id field with the identical value.

{ "index" : {}}
{ "slug" : "smart-home-technology-is-still-fairly-niche", "title" : "Smart home technology is still fairly niche", "subtitle" : "Smart home tech has a ways to go if it wants to unseat the humble smartphone", "content" : { "text" : "For all of the hype and media coverage, smart home technology has yet to make a serious dent in the industry, with just 12% of internet users saying they own a smart home device.\r\n\r\n**Smart home product owners**\r\nHowever, a look at younger demographics reveals a different picture – more than a third of 16-24s have a smart home product.\r\n\r\nAdoption rates also vary by region – they’re highest in North America (17%) and lowest in MEA regions (6%), for example. In fact, it’s estimated that more than 63 million homes in North America will be smart by 2022, which equates to 44% of all homes in the region.\r\n\r\nSmart speakers are the most commonly owned smart home devices among smart home product owners. They’re heralded as the future of everything, from search to shopping.\r\n\r\n> Consumers can’t see a compelling enough reason to fork out the expense for these devices when mobiles can easily match their functionality.\r\n\r\nWhat’s interesting about this is that older people are more likely to own a smart speaker than their younger counterparts, with two-thirds of 55-64 year-old smart home product owners having one.\r\n\r\nAnd while many would assume it’s because older age groups are more affluent, this trend is prevalent even when we look at high-income earners in the younger set.\r\n\r\nSmart utility products are the second-most prominent devices among smart home product owners, and once again, there’s a strong uptake with older demographics. 
42% of 55-64-year-olds own one, compared to 35% of 16-24 year olds.\r\n\r\nWith the popularity of smart speakers and the frequent discount promotions offered by the likes of Amazon and Google, smart home products shouldn’t struggle to break past the 15% adoption mark.\r\n\r\n**Factors influencing the growth of smart home products**\r\nThe challenge in further growth for smart home products lies not just in convincing consumers that they can add value into everyday lives but, more importantly, in showing that their value proposition can’t be fulfilled by a smartphone. \r\n\r\nIt’s for this reason that smartwatches and smart wristbands have failed to push past the 15% mark – consumers can’t see a compelling enough reason to fork out the expense for these devices when mobiles can easily match their functionality.\r\n\r\nTablets have also encountered this problem, but their role as a household device, coupled with their popularity among older demographics and families, can be attributed to 37% of digital consumers saying they own one. \r\n\r\nThe similar demographics and household nature of smart home product usage do suggest that the potential market for these devices could be at least as big as that of the tablet market. \r\n\r\nWhat’s important to bear in mind here, however, is that the smart home product market is still very much in its infancy, which actually makes the 12% adoption rate quite impressive. \r\n\r\nOnce 5G moves beyond the early adopter phase and more household appliances and automotive manufacturers begin rolling out smart home-enabled products, the perceived value and usefulness of these devices could grow exponentially as ecosystems of smart products begin to grow in each household." 
}, "language" : "en", "rootId" : "110.6", "region" : [ "GLOBAL" ], "image" : { "uri" : "artifacts/110-6_article_smart-home-technology-is-still-fairly-niche_artifact.jpg", "alt" : "An individual looking at their smart watch and laptop with key information about their home security and temperature", "class" : "" }, "duration" : 152, "fullReport" : { "caption" : "View full report", "uri" : "", "class" : "" }, "disclaimer" : "Copyright © Trendstream Limited 2019.\r\n\r\nThis report was designed, produced and, where relevant, translated by PwC, using data, charts and commentary supplied in English by Trendstream Limited. \r\n\r\nAll rights, including copyright, in the content of GlobalWebIndex (GWI) publications, are owned and controlled by Trendstream Limited. In accessing such content, you agree that you will not reproduce or share the content outside of PwC without the prior written permission of Trendstream Limited. Trendstream Limited uses its reasonable endeavors to ensure the accuracy of all data in GWI publications. 
However, in accessing the content of GWI publications, you agree that you are responsible for your use of such data and Trendstream Limited shall have no liability to you for any loss, damage, cost or expense whether direct, indirect consequential or otherwise, incurred by, or arising by reason of, your use of the data and whether caused by reason of any error, omission or misrepresentation in the data or otherwise", "publishedAt" : { "$date" : "2019-09-22T00:00:00.000+0000" }, "isActive" : true, "createdAt" : { "$date" : "2020-01-17T20:31:11.215+0000" }, "updatedAt" : { "$date" : "2020-01-20T19:46:57.263+0000" }, "sponsorId" : { "$oid" : "5d7c717447b8175530aa6b5e" }, "typeId" : { "$oid" : "5afd01d6f1aa65e0f5221612" }, "formatId" : { "$oid" : "5afd01d6f1aa65e0f522160c" }, "difficultyId" : { "$oid" : "5b070c2ad76854e178932677" }, "tags" : [ "smart homes", "behavior", "seeking insight", "smart homes" ], "topicIds" : [ { "$oid" : "5b083c3d40b797aef1bf57fc" } ], "id" : { "$oid" : "5d84438f4514cb1f8af87260" }, "numViews" : 0 }
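The two variants above (keep the Mongo ObjectId as the document ID, or let ES generate one) can be produced mechanically. A minimal sketch, assuming documents in MongoDB extended JSON where `_id` looks like `{"$oid": "..."}`; `mongo_doc_to_bulk_pair` and `keep_mongo_id` are hypothetical names:

```python
import json
from typing import Tuple


def mongo_doc_to_bulk_pair(doc: dict, keep_mongo_id: bool = True) -> Tuple[str, str]:
    """Return (action_line, source_line) for one exported Mongo document."""
    source = dict(doc)  # shallow copy, so the caller's dict is not modified
    # _id is an Elasticsearch metadata field, so it must not remain in the document body.
    raw_id = source.pop("_id", None)
    meta = {}
    if keep_mongo_id and raw_id is not None:
        # Extended JSON wraps ObjectIds as {"$oid": "..."}; a KeyError here means some other wrapper.
        meta["_id"] = raw_id["$oid"] if isinstance(raw_id, dict) else str(raw_id)
    action = json.dumps({"index": meta})
    return action, json.dumps(source, ensure_ascii=False, separators=(",", ":"))


action, source = mongo_doc_to_bulk_pair({"_id": {"$oid": "5d84438f4514cb1f8af87260"}, "slug": "x"})
print(action)
print(source)
```

Other `$oid` wrappers, such as the duplicate `id` field, are left untouched and will be mapped as objects with a `$oid` sub-field.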

@Vinayak_Sapre

Thanks. I ran into a couple of issues during bulk loading: some of the docs failed to load because of data type issues. Is it possible to have different types of data in the same field? How can this be handled?

  1. "reason" : "object mapping for [testId] tried to parse field [testId] as object, but found a concrete value"

  2. "reason" : "failed to parse field [content.transcript.duration] of type [text] in document with id '5c8a5bcf40cf244f0aaa6688'. Preview of field's value: '{$date=1899-12-30T22:39:00.000+0000}'",

  3. "reason" : "object mapping for [topicIds] tried to parse field [null] as object, but found a concrete value"
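With errors like these, only some items fail while the rest of the request succeeds, so it helps to pull the failures out of the _bulk response. A sketch, assuming the response body has already been parsed into a dict (e.g. with `json.load`); `bulk_failures` is a hypothetical name:

```python
from typing import List, Tuple


def bulk_failures(response: dict) -> List[Tuple[str, str, str]]:
    """Return (doc_id, error_type, reason) for each failed item in a parsed _bulk response."""
    failures: List[Tuple[str, str, str]] = []
    if not response.get("errors"):
        return failures  # the top-level flag is false only when every item succeeded
    for item in response.get("items", []):
        # Each item is keyed by its action name, e.g. "index", "create", "update" or "delete".
        for result in item.values():
            error = result.get("error")
            if error:
                failures.append((result.get("_id", ""), error.get("type", ""), error.get("reason", "")))
    return failures
```

Grouping the returned reasons by field name is a quick way to see which fields need a mapping decision.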

Only if the value can be coerced. For example, you can pass a numeric value to a keyword or text field in the index, but not the other way around: a string such as "abc" cannot be indexed into a numeric field.

If testId in one document looks like "testId" : {...}, or you have a "testId.foo" field, you cannot pass "testId" : "abc" in subsequent documents. The former says testId is an object and the latter says it is a string. Object and keyword/text types are incompatible with each other.
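One way to spot these conflicts before indexing is to scan the export and record the JSON kind seen at every dotted field path. A sketch (the name `find_type_conflicts` is hypothetical); arrays are flattened because Elasticsearch has no separate array type. Not every reported pair is a hard conflict (a number can be coerced into a keyword or text field), but object vs. concrete value always is:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def find_type_conflicts(docs: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each dotted field path seen with more than one JSON kind to the set of kinds."""
    kinds: Dict[str, Set[str]] = defaultdict(set)

    def walk(path: str, value) -> None:
        if isinstance(value, list):
            # Arrays have no type of their own; each element counts as a value of the field.
            for item in value:
                walk(path, item)
        elif isinstance(value, dict):
            if path:  # skip the document root itself
                kinds[path].add("object")
            for key, child in value.items():
                walk(f"{path}.{key}" if path else key, child)
        elif isinstance(value, bool):  # test bool before int: bool subclasses int
            kinds[path].add("boolean")
        elif isinstance(value, (int, float)):
            kinds[path].add("number")
        elif isinstance(value, str):
            kinds[path].add("string")
        # None is skipped: null values do not influence the mapping.

    for doc in docs:
        walk("", doc)
    return {path: found for path, found in kinds.items() if len(found) > 1}
```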

You need to determine the correct type for each field and define a mapping. For data that doesn't match, use an ingest processor or Logstash to clean it, or store it in another field of a different type.
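Defining the mapping means creating the index with explicit field types before the first bulk load, since an existing field's type cannot be changed in place. A sketch that prepares the request with the Python standard library, assuming ES 7 typeless mappings; the helper name and the field types chosen below are hypothetical examples, not recommendations for this data:

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index: str, properties: dict) -> urllib.request.Request:
    """Prepare (but do not send) PUT /<index> with an explicit mapping for the given fields."""
    body = {"mappings": {"properties": properties}}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical type choices for a few fields from the thread; decide the real ones from your data.
req = build_create_index_request(
    "http://localhost:9200",
    "edge",
    {
        "slug": {"type": "keyword"},
        "title": {"type": "text"},
        "language": {"type": "keyword"},
        "duration": {"type": "integer"},
        "isActive": {"type": "boolean"},
        "numViews": {"type": "integer"},
    },
)
```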

Elasticsearch is not a schemaless data store, so each field needs a single mapping. Your data has different types for the same field, which results in mapping conflicts. This is something you need to resolve before indexing the data.
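Since much of the type mixing here comes from MongoDB extended JSON (`{"$oid": ...}`, `{"$date": ...}`) appearing alongside plain values, one option is to unwrap those wrappers before building the bulk file. This is a Python pre-processing step swapped in for illustration, not an ingest processor or Logstash filter; `unwrap_extended_json` is a hypothetical name and it only handles the `$oid`/`$date` shapes seen in this thread:

```python
from typing import Any

# Single-key wrappers used by MongoDB extended JSON that this sketch unwraps.
WRAPPER_KEYS = ("$oid", "$date")


def unwrap_extended_json(value: Any) -> Any:
    """Recursively replace {"$oid": x} and {"$date": x} with x; leave everything else as-is."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in WRAPPER_KEYS:
                return unwrap_extended_json(inner)
        return {k: unwrap_extended_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_extended_json(v) for v in value]
    return value
```

This only makes ObjectIds and dates plain strings; it does not decide whether a value such as a `$date` in `content.transcript.duration` is correct, which still needs a per-field decision.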
