Loading CSV to elasticsearch index with mapping using Python API

druzmieres · February 15, 2018, 5:05pm

Using the Elasticsearch Python API, I want to create an index with a mapping so that, when I upload a CSV file, the documents are indexed according to that mapping. This is what I have (I removed some fields to keep the mapping short):

import argparse, elasticsearch, json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import csv

mapping = '''{
"mappings": {
  "type": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "authEndStopCode": {
        "type": "keyword"
      },
      "expandedTripNumber": {
        "type": "integer"
      },
      "operator": {
        "type": "integer"
      },
      "path": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "startStopName": {
        "type": "keyword"
      },
      "userStartStopCode": {
        "type": "keyword"
      }
    }
  }
}
}'''
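Because the mapping is passed as a raw string and the index is created with ignore=400, a malformed mapping may fail quietly: Elasticsearch reports mapping and JSON parse errors with status 400, the same status as "index already exists", and ignore=400 suppresses both. A minimal sketch, assuming Python 3.7+, that parses the string locally with json.loads before sending it and lists the declared fields. The helper name mapping_fields and the trimmed copy of the mapping are illustrative; the type name "type" is taken from the mapping above.

```python
import json

# A trimmed copy of the mapping string from the post (only a few fields kept).
mapping = '''{
"mappings": {
  "type": {
    "properties": {
      "@timestamp": {"type": "date"},
      "authEndStopCode": {"type": "keyword"},
      "expandedTripNumber": {"type": "integer"},
      "operator": {"type": "integer"}
    }
  }
}
}'''


def mapping_fields(mapping_json, type_name="type"):
    """Return the property names of the mapping, in the order they are declared.

    json.loads raises json.JSONDecodeError (a subclass of ValueError)
    if the string is not valid JSON.
    """
    parsed = json.loads(mapping_json)
    properties = parsed["mappings"][type_name]["properties"]
    # Dicts keep insertion order, so the list follows the order in the string.
    return list(properties)


print(mapping_fields(mapping))
```

Note that the list reflects the order of the mapping text only; the mapping itself says nothing about which CSV column holds which field.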

I'm creating the index this way:

es = Elasticsearch()  # the 6.x client connects to localhost:9200 by default
es.indices.create(index=INDEX_NAME, ignore=400, body=mapping)

This is what I do to upload the data:

with open(args.file, "r", encoding="latin-1", newline="") as f:
    # one dict per CSV row; bulk() indexes each dict as a document
    reader = csv.DictReader(f)
    bulk(es, reader, index=INDEX_NAME, doc_type=TYPE)

Where INDEX_NAME and TYPE are strings I already defined.

The CSV file contains only data (one document per line) and has no header row, but Elasticsearch seems to be using the first line as the headers. I don't want this; I want it to use the mapping I already added to the index.
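The behaviour described above can be reproduced without Elasticsearch at all. A minimal sketch with a made-up, headerless two-row CSV held in memory (io.StringIO stands in for the real file): DictReader, created without fieldnames, turns the first data row into the keys, so one document disappears and the remaining one gets field names taken from data values.

```python
import csv
import io

# Hypothetical headerless data: two rows, three columns, no header line.
data = io.StringIO("A1,17,3\nB2,18,4\n")

reader = csv.DictReader(data)
rows = list(reader)

# The first data row was consumed as the header...
print(reader.fieldnames)  # ['A1', '17', '3']
# ...and only the second row is left, keyed by the first row's values.
print(len(rows))  # 1
```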

Hope someone can help. Thank you.

druzmieres · February 19, 2018, 4:09pm

The problem wasn't bulk. When it is created without a fieldnames argument, csv.DictReader reads the first line of the file and uses it as the keys for every subsequent row. So if you're going to use DictReader that way, the file needs a header row.
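If adding a header row to the file isn't an option, the column names can be passed to DictReader through its fieldnames parameter, in which case the first line is treated as data. A minimal sketch, assuming the column order below (FIELDNAMES and read_headerless_csv are hypothetical names; the order must match the actual CSV columns, since the Elasticsearch mapping does not record column positions):

```python
import csv
import io

# Hypothetical column order; it must match the order of columns in the file.
FIELDNAMES = ["startStopName", "expandedTripNumber", "operator"]


def read_headerless_csv(f, fieldnames):
    """Yield one dict per CSV line, keyed by the given column names."""
    # With fieldnames supplied, DictReader does not consume a header line.
    reader = csv.DictReader(f, fieldnames=fieldnames)
    for row in reader:
        yield row


# io.StringIO stands in for the real file; both lines become documents.
data = io.StringIO("Central,101,7\nHarbour,102,8\n")
docs = list(read_headerless_csv(data, FIELDNAMES))
print(docs[0])
```

In the original upload code, the generator would take the place of the bare reader, e.g. bulk(es, read_headerless_csv(f, FIELDNAMES), index=INDEX_NAME, doc_type=TYPE). The values are still strings such as "101"; Elasticsearch's default numeric coercion should convert them for fields mapped as integer.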

