
Using the elasticsearch Python API I want to create an elasticsearch index with a mapping so that when I upload a CSV file the documents are uploaded according to this mapping.

import argparse, elasticsearch, json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import csv

I have this (I removed some fields so the mapping doesn't look that long):

mapping = '''{
"mappings": {
  "type": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "authEndStopCode": {
        "type": "keyword"
      },
      "expandedTripNumber": {
        "type": "integer"
      },
      "operator": {
        "type": "integer"
      },
      "path": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "startStopName": {
        "type": "keyword"
      },
      "userStartStopCode": {
        "type": "keyword"
      }
    }
  }
}
}'''

I'm creating the index this way:

es.indices.create(index=INDEX_NAME, ignore=400, body=mapping)

This is what I do to upload the data:

with open(args.file, "r", encoding="latin-1") as f:
    reader = csv.DictReader(f)
    bulk(es, reader, index=INDEX_NAME, doc_type=TYPE)

Where INDEX_NAME and TYPE are strings I already defined.

The CSV file is just data (it should become one document per line) and has no header row, but the upload seems to use the first line as the field names. I don't want this; I want the fields to follow the mapping I already added to the index.

Hope someone can help. Thank you.

2 Answers


The problem isn't bulk. csv.DictReader always consumes the first line of the file as the header for subsequent rows. So if you're going to use DictReader, the file needs a header (or you must pass the column names explicitly via the fieldnames argument).
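A minimal sketch of the fieldnames approach, using a few column names assumed from the mapping in the question (adjust to your CSV's actual column order):

```python
import csv
import io

# Headerless CSV data (sample values for illustration only).
data = "2021-01-01T00:00:00,1,STOP_A\n2021-01-01T00:05:00,1,STOP_B\n"

# Passing fieldnames tells DictReader the column names up front,
# so the first line is treated as data rather than as a header.
fieldnames = ["@timestamp", "@version", "authEndStopCode"]
reader = csv.DictReader(io.StringIO(data), fieldnames=fieldnames)

rows = list(reader)
print(rows[0]["authEndStopCode"])  # STOP_A
```

The resulting dicts can be fed straight to bulk as in the question, and the field names will then line up with the mapping.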


3 Comments

It's not only a problem with csv.DictReader: any dictionary or JSON object needs keys as well as values, so if you use these structures you have to specify the keys for the values.
so how do you fix it? Can you specify correct code?
@Enthusiast, you can add a header to the file with from subprocess import call and then call(["sed", "-i", "1i " + header, "path/to/file"]), where header is the header line you want to add.
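Alternatively, a pure-Python sketch that prepends the header without shelling out to sed (the path, header, and sample row below are hypothetical; adjust to your CSV):

```python
# Create a sample headerless CSV (hypothetical content, for demonstration).
path = "data.csv"
with open(path, "w", encoding="latin-1") as f:
    f.write("2021-01-01T00:00:00,1,STOP_A\n")

# Read the existing contents, then rewrite the file with the header first.
header = "@timestamp,@version,authEndStopCode"
with open(path, "r", encoding="latin-1") as f:
    original = f.read()
with open(path, "w", encoding="latin-1") as f:
    f.write(header + "\n" + original)
```

After this, the file opens cleanly with a plain csv.DictReader, since the first line now carries the field names.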

I'm the author of moshe/elasticsearch_loader.
I wrote ESL for this exact problem.
You can install it with pip:

pip install elasticsearch-loader

And then you will be able to load csv files into elasticsearch while supplying your custom mapping by issuing:

elasticsearch_loader  --index-settings-file mappings.json \
     --index incidents --type incident csv file1.csv

