CSV data to Numpy structured array?

Question

Name Class Species
a     1      3
b     2      4
c     3      2
a     1      3
b     2      1
c     3      2

The data above comes from a CSV file. I need to convert it to a structured array using numpy, with the header row of the CSV becoming the column labels of the array.

I need to print the mean occurrences of each name in each class (the mean of each species for class 1, class 2, and class 3).

I used numpy.genfromtxt().

What do you mean by you got stuck? What is it doing? Please also post your code, so we can see what you have tried.
– DMe, Commented Mar 1, 2018 at 1:09
What's the delimiter? How many columns? On some lines there is a space after the dashes (--- 1) and on others there is none (---2). That could give any reader problems. genfromtxt accepts column widths as the delimiter parameter.
– hpaulj, Commented Mar 1, 2018 at 3:04
With proper delimiters and headers, genfromtxt easily creates a structured array from a csv file.
– hpaulj, Commented Mar 1, 2018 at 3:16
@hpaulj The data is from a csv file. It has columns (names, class, numbers).
– Cullen DuYaw, Commented Mar 1, 2018 at 3:43

jpp · Accepted Answer · 2018-03-01 01:10:55Z

This is one way to create a numpy structured array from a csv file:

import pandas as pd

arr = pd.read_csv('file.csv').to_records(index=False)

# rec.array([('a', 1, 3), ('b', 2, 4), ('c', 3, 2), ('a', 1, 3), ('b', 2, 1),
#            ('c', 3, 2)], 
#           dtype=[('Name', 'O'), ('Class', '<i8'), ('Numbers', '<i8')])

You can then work with numpy or (easier) pandas to perform your calculations.
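
As a minimal sketch of the pandas route, the per-class mean could be computed with a groupby. This reads the question as asking for the mean of the third column within each class, which is only one interpretation. The in-memory CSV stands in for 'file.csv', and the column name "Numbers" follows the output shown above (the question's table labels it "Species"); both are assumptions.

```python
import io

import pandas as pd

# Assumed file contents: the question's sample written as comma-separated text.
csv_text = """Name,Class,Numbers
a,1,3
b,2,4
c,3,2
a,1,3
b,2,1
c,3,2
"""

# Read the CSV; the header row becomes the column labels.
df = pd.read_csv(io.StringIO(csv_text))

# Structured (record) array, as in the answer above.
arr = df.to_records(index=False)

# Mean of the Numbers column within each Class.
class_means = df.groupby("Class")["Numbers"].mean()
print(class_means)
```

Grouping on the DataFrame before converting avoids writing the masking logic by hand; the record array is only needed if later code requires a numpy structure.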

hpaulj · Accepted Answer · 2018-03-01 03:30:24Z

Using latest numpy (1.14) on Py3.

Your sample, cleaned up:

In [92]: import numpy as np
In [93]: txt = """Name --- Class --- Numbers
    ...: a    ---------- 1    -------- 3
    ...: b    ---------- 2    -------- 4
    ...: c    ---------- 3    -------- 2
    ...: a    ---------- 1    -------- 3
    ...: b    ---------- 2     ------- 1
    ...: c    ---------- 3   --------- 2"""
In [94]: data = np.genfromtxt(txt.splitlines(), dtype=None, names=True, encoding=None)
In [95]: data
Out[95]: 
array([('a', '----------', 1, '--------', 3),
       ('b', '----------', 2, '--------', 4),
       ('c', '----------', 3, '--------', 2),
       ('a', '----------', 1, '--------', 3),
       ('b', '----------', 2, '-------', 1),
       ('c', '----------', 3, '---------', 2)],
      dtype=[('Name', '<U1'), ('f0', '<U10'), ('Class', '<i8'), ('f1', '<U9'), ('Numbers', '<i8')])

Or skipping the dashed columns:

In [96]: data = np.genfromtxt(txt.splitlines(), dtype=None, names=True, encoding=None, usecols=[0,2,4])
In [97]: data
Out[97]: 
array([('a', 1, 3), 
       ('b', 2, 4), 
       ('c', 3, 2), 
       ('a', 1, 3), 
       ('b', 2, 1),
       ('c', 3, 2)],
      dtype=[('Name', '<U1'), ('Class', '<i8'), ('Numbers', '<i8')])
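
With a structured array in hand, the per-class mean asked about in the question might be computed with boolean masks on the fields. This is a sketch under assumptions: the input is a genuinely comma-separated version of the sample (hence delimiter=","), and the question is read as wanting the mean of Numbers for each Class.

```python
import numpy as np

# Assumed input: the sample rows as comma-separated lines, header first.
lines = [
    "Name,Class,Numbers",
    "a,1,3",
    "b,2,4",
    "c,3,2",
    "a,1,3",
    "b,2,1",
    "c,3,2",
]

# names=True takes field names from the header; dtype=None infers each column's type.
data = np.genfromtxt(lines, delimiter=",", dtype=None, names=True, encoding=None)

# For each distinct class, select its rows with a mask and average the Numbers field.
means = {
    int(c): float(data["Numbers"][data["Class"] == c].mean())
    for c in np.unique(data["Class"])
}
print(means)
```

Fields are accessed by name (data["Class"], data["Numbers"]), which is what the header-derived labels make possible.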
