ES ignores mapping when using Python DSL bulk

druzmieres · February 15, 2018, 6:16pm

I'm trying to upload a CSV file to an Elasticsearch index. Let's say the file looks like this (no headers, just data):

bird,10,dog
cat,20,giraffe

This is the code I have:

from elasticsearch_dsl import DocType, Integer, Keyword
from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk
import csv

connections.create_connection(hosts=["localhost"])

class Mapping(DocType):    
    animal1 = Keyword()
    number = Integer()
    animal2 = Keyword()

    class Meta:
        index = "index-name"
        doc_type = "doc-type" 

Mapping.init()
with open("/path/to/file", "r", encoding="latin-1") as f:
    reader = csv.DictReader(f)
    bulk(
        connections.get_connection(),
        (Mapping(**row).to_dict(True) for row in reader)
    )

The problem is that Elasticsearch seems to be ignoring the mapping and using the first line of the file as headers (and creating a mapping based on that).

Edit: it actually uses both my mapping and the first line of the file. The mapping it generates is:

{
  "index-name": {
    "mappings": {
      "doc-type": {
        "properties": {
          "10": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "dog": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "animal1": {
            "type": "keyword"
          },
          "animal2": {
            "type": "keyword"
          },
          "bird": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "number": {
            "type": "integer"
          }
        }
      }
    }
  }
}

If I only create the index without uploading data, the mapping seems fine:

{
  "index-name": {
    "mappings": {
      "doc-type": {
        "properties": {
          "animal1": {
            "type": "keyword"
          },
          "animal2": {
            "type": "keyword"
          },
          "number": {
            "type": "integer"
          }
        }
      }
    }
  }
}

How can I make ES use the given mapping and just that?

druzmieres · February 19, 2018, 4:04pm

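The problem wasn't bulk. When no fieldnames argument is passed, csv.DictReader takes the first line of the file as the header and uses its values as the keys for every subsequent row. That is why "bird", "10" and "dog" ended up as fields in the mapping. So if you're going to use DictReader, either the file needs a header row or you have to pass fieldnames explicitly.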
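For anyone who finds this later, here is a minimal sketch of the difference, using the two sample rows from the question. It reads from an in-memory string instead of a file so it runs without Elasticsearch; the FIELDNAMES list is my own name for the column order and is assumed to match the fields declared in the Mapping class.

```python
import csv
import io

# Same sample data as in the question: no header row.
data = "bird,10,dog\ncat,20,giraffe\n"

# Without fieldnames, DictReader consumes the first row as the header,
# so only one data row is left and its keys are "bird", "10", "dog".
rows_without = list(csv.DictReader(io.StringIO(data)))

# With explicit fieldnames, every row in the file is treated as data.
# FIELDNAMES is an illustrative name; the order must match the CSV columns.
FIELDNAMES = ["animal1", "number", "animal2"]
rows_with = list(csv.DictReader(io.StringIO(data), fieldnames=FIELDNAMES))

print(rows_without)
print(rows_with)
```

In the code from the question, this would mean replacing csv.DictReader(f) with csv.DictReader(f, fieldnames=[...]) and leaving the bulk call unchanged. Note that DictReader yields every value as a string, so "number" arrives as "10" rather than 10.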
