JSON string to Nested type

arham · January 7, 2020, 2:30pm

Hi,

I am using an ingest pipeline to map FSCrawler fields to Elasticsearch index fields.

I have a JSON string like "Tags": [{"Tag":"Ford"}, {"Tag":"BMW"}, {"Tag":"Fiat"}].

How can I map this data to a nested-type field in my index using an ingest pipeline?

This is my index definition:

PUT my_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "Tags": {
        "type": "nested",
        "properties": {
          "Tag": { "type": "keyword" }
        }
      }
    }
  }
}

dadoonet · January 7, 2020, 2:42pm

I'm not sure I'm understanding the question and how this relates to FSCrawler.

Could you explain with a full example of a document before being processed by an ingest pipeline and what the document should look like after it has been processed by the pipeline?

arham · January 8, 2020, 1:12pm

Sure, I can explain more.

I am using FSCrawler to crawl the content and custom metadata of PDF files into an existing Elasticsearch index for fast search. The PDF files have many metadata fields, so I used an ingest pipeline with rename and remove processors, and that worked fine.

The only remaining issue is that my Elasticsearch index has a multi-value field of nested type, i.e.

"Tags": {
  "type": "nested",
  "properties": {
    "Tag": { "type": "keyword" }
  }
}

Ingest pipeline mapping:

PUT _ingest/pipeline/my_mapping
{
  "processors": [
    {
      "rename": {
        "field": "meta.raw.Tags",
        "target_field": "Tags",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": ["meta"]
      }
    }
  ]
}

Data generated by FSCrawler:

 "_source" : {
          "content" : "my test content",
          "meta" : {
            "date" : "2019-12-12T10:09:13.000+0000",
            "format" : "application/pdf; version=1.4",
            "created" : "2017-01-12T10:03:50.000+0000",
            "raw" : {
              "date" : "2019-12-12T15:09:13Z",
              "pdf:PDFVersion" : "1.4",
              "access_permission:can_print_degraded" : "true",
              "pdfa:PDFVersion" : "A-1a",
              "dc:format" : "application/pdf; version=1.4",
              "access_permission:fill_in_form" : "true",
              "pdf:encrypted" : "false",
              "modified" : "2019-12-12T15:09:13Z",
              "Status" : "1",
              "SeqNo" : "2",
              "created" : "2017-01-12T15:03:50Z",
              "access_permission:extract_for_accessibility" : "true",
              "Creation-Date" : "2017-01-12T15:03:50Z",
              "25107-0208-ComponentReleaseCertificate-20090207-5304-201912110909541826",
              "pdfaid:part" : "1",
              "OCR" : "1",
              "Tags" : "\"Tags\": [{\"Tag\":\"Ford\"}, {\"Tag\":\"BMW\"}, {\"Tag\":\"Fiat\"}]"
            }
          }
        }
      }

After running FSCrawler, an error occurred:

ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=object mapping for [Tags] tried to parse field [Tags] as object, but found a concrete value]]

Please guide me on how to solve this problem.

dadoonet · January 8, 2020, 1:30pm

Please don't post unformatted code, logs, or configuration as it's very hard to read.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

arham · January 8, 2020, 1:37pm

I have updated the formatting, please check now.

dadoonet · January 8, 2020, 3:53pm

What is the mapping?

arham · January 8, 2020, 3:57pm

It is actually the ingest pipeline that inserts FSCrawler data into the fields of an existing Elasticsearch index.

dadoonet · January 8, 2020, 4:08pm

Can you please share the current mapping?

arham · January 9, 2020, 5:40am

PUT _ingest/pipeline/my_mapping
{
  "processors": [
    {
      "rename": {
        "field": "meta.raw.Tags",
        "target_field": "Tags",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": ["meta"]
      }
    }
  ]
}

arham · January 10, 2020, 7:03am

I am stuck on this issue, please help.

dadoonet · January 10, 2020, 7:38am

You did not provide the mapping. This is something you can get with:

GET your_index_name/_mapping

arham · January 15, 2020, 12:08pm

{
  "indexamac" : {
    "mappings" : {
      "dynamic" : "false",
      "properties" : {
        "Tags" : {
          "type" : "nested",
          "properties" : {
            "Tag" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}

dadoonet · January 15, 2020, 12:26pm

Why are you using nested here instead of a simple array of tags?
Also I don't believe it's the complete mapping.
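
For illustration, a "simple array of tags" could look like the sketch below. The index name my_tags_index and the document ID are made up for this example; nothing here comes from the actual mapping in this thread.

```console
# Hypothetical index: Tags is a plain keyword field, which can hold one value or an array
PUT my_tags_index
{
  "mappings": {
    "properties": {
      "Tags": { "type": "keyword" }
    }
  }
}

# Example document with three tags stored as a simple array
PUT my_tags_index/_doc/1
{
  "Tags": ["Ford", "BMW", "Fiat"]
}

# A term query matches any document whose Tags array contains the value
GET my_tags_index/_search
{
  "query": {
    "term": { "Tags": "BMW" }
  }
}
```

With this shape each tag is still searchable on its own, without a nested mapping.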

arham · January 15, 2020, 1:20pm

As per the following Elasticsearch reference:

https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html

I am using the nested type because the Tags field is multi-valued and I want to query each value independently.
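
For context, a nested field is queried through the nested query with a path. A minimal sketch against the my_index mapping from the first post, assuming the documents already contain Tags as an array of objects (the value BMW is only an example):

```console
# Find documents where at least one nested Tags object has Tag = "BMW"
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "Tags",
      "query": {
        "term": { "Tags.Tag": "BMW" }
      }
    }
  }
}
```

The nested type mainly pays off when each object has several fields that must match together. With a single Tag field per object, this returns the same documents as a term query on a plain keyword array would.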

dadoonet · January 15, 2020, 1:44pm

Do you have an example of a document and a query which won't work with an object and requires a nested instead?

I have an idea about it but I'd like to make sure if your use case actually needs it.
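
The thread ends without a confirmed answer, so the following is only a sketch of one possible approach, not something tested by the posters. The mapper_parsing_exception above happens because meta.raw.Tags is a plain string, while the nested mapping expects an array of objects. Since the string is a JSON fragment without the surrounding braces, a script processor could wrap it in { } and a json processor could then parse it. The temporary field name tags_json is an assumption made for this example.

```console
PUT _ingest/pipeline/my_mapping
{
  "processors": [
    {
      "script": {
        "if": "ctx.meta?.raw?.Tags != null",
        "source": "ctx.tags_json = '{' + ctx.meta.raw.Tags + '}'"
      }
    },
    {
      "json": {
        "if": "ctx.tags_json != null",
        "field": "tags_json"
      }
    },
    {
      "rename": {
        "field": "tags_json.Tags",
        "target_field": "Tags",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": ["meta", "tags_json"],
        "ignore_missing": true
      }
    }
  ]
}

# Dry run with a trimmed copy of the FSCrawler document, without indexing anything
POST _ingest/pipeline/my_mapping/_simulate
{
  "docs": [
    {
      "_source": {
        "content": "my test content",
        "meta": {
          "raw": {
            "Tags": "\"Tags\": [{\"Tag\":\"Ford\"}, {\"Tag\":\"BMW\"}, {\"Tag\":\"Fiat\"}]"
          }
        }
      }
    }
  ]
}
```

If the simulated document comes back with Tags as an array of objects, it should fit the nested mapping. If the real Tags strings are formatted differently from the sample above, the script step would need adjusting.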

system · February 12, 2020, 1:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
