Name Class Species
a     1      3
b     2      4
c     3      2
a     1      3
b     2      1
c     3      2

The data above comes from a CSV file. I need to convert it to a structured array using numpy, with the CSV header row becoming the field names (column labels) of the array.

I then need to print the mean number of occurrences of each name in each class (the mean of each species for class 1, class 2, and class 3).

I used numpy.genfromtxt().

  • What do you mean by you got stuck? What is it doing? Please also post your code, so we can see what you have tried. Commented Mar 1, 2018 at 1:09
  • What's the delimiter? How many columns? Some lines have a space between the dashes and the number (--- 1) and others do not (---2). That could give any reader problems. genfromtxt also accepts field widths (an integer or a sequence of integers) as the delimiter parameter. Commented Mar 1, 2018 at 3:04
  • With proper delimiters and headers, genfromtxt easily creates a structured array from a csv file. Commented Mar 1, 2018 at 3:16
  • @hpaulj The data is from a csv file. It has three columns (names, class, numbers). Commented Mar 1, 2018 at 3:43

2 Answers

This is one way to create a numpy structured array from a csv file:

import pandas as pd

arr = pd.read_csv('file.csv').to_records(index=False)

# rec.array([('a', 1, 3), ('b', 2, 4), ('c', 3, 2), ('a', 1, 3), ('b', 2, 1),
#            ('c', 3, 2)], 
#           dtype=[('Name', 'O'), ('Class', '<i8'), ('Numbers', '<i8')])

You can then work with numpy or (easier) pandas to perform your calculations.
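
As a rough sketch of the pandas route, the per-class mean could be computed with a groupby before converting to a record array. The inline CSV text below is a stand-in for file.csv and assumes a comma-delimited file with the header Name,Class,Numbers; the interpretation of "mean per class" as the mean of the Numbers column within each Class is also an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in for file.csv: assumed comma-delimited with a header row.
csv_text = """Name,Class,Numbers
a,1,3
b,2,4
c,3,2
a,1,3
b,2,1
c,3,2
"""

df = pd.read_csv(io.StringIO(csv_text))

# Mean of the Numbers column within each Class (assumed to be what is wanted).
class_means = df.groupby("Class")["Numbers"].mean()
print(class_means)

# Structured (record) array whose field names come from the CSV header.
arr = df.to_records(index=False)
print(arr.dtype.names)  # ('Name', 'Class', 'Numbers')
```

With the sample data this gives 3.0 for class 1, 2.5 for class 2, and 2.0 for class 3.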


Using the latest numpy (1.14 at the time of writing) on Python 3.

Your sample, cleaned up (after import numpy as np):

In [93]: txt = """Name --- Class --- Numbers
    ...: a    ---------- 1    -------- 3
    ...: b    ---------- 2    -------- 4
    ...: c    ---------- 3    -------- 2
    ...: a    ---------- 1    -------- 3
    ...: b    ---------- 2     ------- 1
    ...: c    ---------- 3   --------- 2"""
In [94]: data = np.genfromtxt(txt.splitlines(), dtype=None, names=True, encoding=None)
In [95]: data
Out[95]: 
array([('a', '----------', 1, '--------', 3),
       ('b', '----------', 2, '--------', 4),
       ('c', '----------', 3, '--------', 2),
       ('a', '----------', 1, '--------', 3),
       ('b', '----------', 2, '-------', 1),
       ('c', '----------', 3, '---------', 2)],
      dtype=[('Name', '<U1'), ('f0', '<U10'), ('Class', '<i8'), ('f1', '<U9'), ('Numbers', '<i8')])

Or skipping the dashed columns:

In [96]: data = np.genfromtxt(txt.splitlines(), dtype=None, names=True, encoding=None, usecols=[0,2,4])
In [97]: data
Out[97]: 
array([('a', 1, 3), 
       ('b', 2, 4), 
       ('c', 3, 2), 
       ('a', 1, 3), 
       ('b', 2, 1),
       ('c', 3, 2)],
      dtype=[('Name', '<U1'), ('Class', '<i8'), ('Numbers', '<i8')])
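
If the real file is comma-delimited, as the question suggests, a minimal numpy-only sketch might look like the following. The sample text, the delimiter=',' choice, and the reading of "mean per class" as the mean of Numbers within each Class are all assumptions, not something confirmed by the question.

```python
import numpy as np

# Hypothetical comma-delimited version of the sample (assumed file layout).
txt = """Name,Class,Numbers
a,1,3
b,2,4
c,3,2
a,1,3
b,2,1
c,3,2"""

# names=True takes the field names from the header row;
# dtype=None lets genfromtxt infer a type for each column.
data = np.genfromtxt(txt.splitlines(), delimiter=",", dtype=None,
                     names=True, encoding=None)

# Mean of Numbers for each distinct Class, using a boolean mask per class.
means = {}
for cls in np.unique(data["Class"]):
    mask = data["Class"] == cls
    means[int(cls)] = float(data["Numbers"][mask].mean())

print(means)  # {1: 3.0, 2: 2.5, 3: 2.0}
```

For a file on disk, the first argument would be the file name instead of txt.splitlines().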
