JSON string to Nested type

Hi,

I am using an ingest pipeline to map FSCrawler fields to Elasticsearch index fields.

I have a JSON string like "Tags": [{"Tag":"Ford"}, {"Tag":"BMW"}, {"Tag":"Fiat"}].

How can I map this data to the nested-type field in my index using an ingest pipeline?

These are my index settings:

PUT my_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "Tags": {
        "type": "nested",
        "properties": {
          "Tag": { "type": "keyword" }
        }
      }
    }
  }
}

I'm not sure I'm understanding the question and how this relates to FSCrawler.

Could you explain with a full example of a document before being processed by an ingest pipeline and what the document should look like after it has been processed by the pipeline?

Sure, I can explain more.

I am using FSCrawler to crawl the content and custom metadata of PDF files into an existing Elasticsearch index for fast search. The PDF files have many metadata fields, so I use an ingest pipeline with rename and remove processors, which works fine.

The only remaining issue is a multi-value field in my Elasticsearch index that is mapped as a nested type, i.e.:

"Tags": {
  "type": "nested",
  "properties": {
    "Tag": { "type": "keyword" }
  }
}

Ingest pipeline:

PUT _ingest/pipeline/my_mapping
{
  "processors": [
    {
      "rename": {
        "field": "meta.raw.Tags",
        "target_field": "Tags",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": ["meta"]
      }
    }
  ]
}

FSCrawler-generated data:

"_source" : {
  "content" : "my test content",
  "meta" : {
    "date" : "2019-12-12T10:09:13.000+0000",
    "format" : "application/pdf; version=1.4",
    "created" : "2017-01-12T10:03:50.000+0000",
    "raw" : {
      "date" : "2019-12-12T15:09:13Z",
      "pdf:PDFVersion" : "1.4",
      "access_permission:can_print_degraded" : "true",
      "pdfa:PDFVersion" : "A-1a",
      "dc:format" : "application/pdf; version=1.4",
      "access_permission:fill_in_form" : "true",
      "pdf:encrypted" : "false",
      "modified" : "2019-12-12T15:09:13Z",
      "Status" : "1",
      "SeqNo" : "2",
      "created" : "2017-01-12T15:03:50Z",
      "access_permission:extract_for_accessibility" : "true",
      "Creation-Date" : "2017-01-12T15:03:50Z",
      "25107-0208-ComponentReleaseCertificate-20090207-5304-201912110909541826",
      "pdfaid:part" : "1",
      "OCR" : "1",
      "Tags" : "\"Tags\": [{\"Tag\":\"Ford\"}, {\"Tag\":\"BMW\"}, {\"Tag\":\"Fiat\"}]"
    }
  }
}

After running FSCrawler, an error occurred:

ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=object mapping for [Tags] tried to parse field [Tags] as object, but found a concrete value]]

Please advise how to solve this problem.
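
The error is consistent with the sample above: Tags is mapped as nested, so Elasticsearch expects an object or an array of objects, but after the rename the field still holds a plain string. Below is a minimal Python sketch of the conversion that would have to happen before indexing. It assumes the raw value always has the form shown in the sample (a "Tags": prefix followed by a JSON array). The function name parse_tags is hypothetical, and the sketch only illustrates the target shape; it is not an FSCrawler or Elasticsearch API.

```python
import json


def parse_tags(raw: str) -> list:
    """Turn the FSCrawler string value into a list of {"Tag": ...} objects."""
    # The raw value is a bare key/value pair, not a complete JSON document,
    # so wrap it in braces to make it a valid JSON object before parsing.
    wrapped = "{" + raw + "}"
    return json.loads(wrapped)["Tags"]


# Value copied from meta.raw.Tags in the sample document (assumed format).
raw_value = '"Tags": [{"Tag":"Ford"}, {"Tag":"BMW"}, {"Tag":"Fiat"}]'
tags = parse_tags(raw_value)
print(tags)
```

The resulting list of objects is the shape a nested field accepts. Inside an ingest pipeline, a comparable effect might be achievable by stripping the prefix with a gsub processor and then parsing the remainder with a json processor, though that combination is untested here.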

Please don't post unformatted code, logs, or configuration as it's very hard to read.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

I have updated the formatting; please check now.

What is the mapping?

It is actually an ingest pipeline that inserts FSCrawler data into the fields of an existing Elasticsearch index.

Can you please share the current mapping?

PUT _ingest/pipeline/my_mapping
{
  "processors": [
    {
      "rename": {
        "field": "meta.raw.Tags",
        "target_field": "Tags",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": ["meta"]
      }
    }
  ]
}

I am stuck on this issue; please help.

You did not provide the mapping. This is something you can get with:

GET your_index_name/_mapping

{
  "indexamac" : {
    "mappings" : {
      "dynamic" : "false",
      "properties" : {
        "Tags" : {
          "type" : "nested",
          "properties" : {
            "Tag" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}

Why are you using nested here instead of a simple array of tags?
Also I don't believe it's the complete mapping.

As per the following Elasticsearch reference:

https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html

I am using the nested type because the Tags field is multi-valued and I want to query each value independently.

Do you have an example of a document and a query which won't work with an object and requires a nested field instead?

I have an idea about it, but I'd like to make sure your use case actually needs it.
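
For context on the nested-versus-object question: with a plain object mapping, Elasticsearch flattens an array of objects into multi-value fields, which loses the association between different fields of the same object. A rough Python sketch of that flattening follows, using a hypothetical helper flatten_objects. It illustrates the documented behavior and is not Elasticsearch code.

```python
from collections import defaultdict


def flatten_objects(field: str, objects: list) -> dict:
    """Collapse an array of objects into multi-value fields keyed by dotted path."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            # Every object's value for the same key lands in one shared list.
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


tags = [{"Tag": "Ford"}, {"Tag": "BMW"}, {"Tag": "Fiat"}]
print(flatten_objects("Tags", tags))
# {'Tags.Tag': ['Ford', 'BMW', 'Fiat']}
```

Since each object here carries only a single Tag property, there is no second field whose pairing could be lost, so the flattened form still appears to hold everything needed to match each tag value on its own.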
