2

Apologies for asking a question that has been asked a hundred times before. I'm new to Python, and none of the solutions I've found seem to solve my problem.

I have a nested list from a CSV file called diabetes.csv. I read in the file and split each line on commas like this:

for line in open("diabetes.csv"):
    lst=line.strip().split(",")
    print(lst)

which prints out the following

['10', '101', '86', '37', '0', '45.6', '1.136', '38', '1']
['2', '108', '62', '32', '56', '25.2', '0.128', '21', '0']
['3', '122', '78', '0', '0', '23', '0.254', '40', '0']

Now my problem is

  1. I need to make a separate list containing only the third element of each list (lst[2])
  2. I need to convert it into floats instead of strings.

I'm using Python 3.6 and I'm pulling my hair out here.

2 Comments

  • Use a CSV reader such as pandas or the standard library csv module, and try to avoid reinventing the wheel. Commented Sep 9, 2017 at 19:44
  • You can declare a list before the loop, col3 = [], and call col3.append(float(lst[2])) on every iteration. Although I agree using the standard csv module may be better, a custom implementation does provide more flexibility. Commented Sep 9, 2017 at 19:46
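
The append approach from the second comment could look roughly like the sketch below. It assumes the three rows shown in the question and uses io.StringIO as a stand-in for open("diabetes.csv") so that it runs without the file; the name col3 comes from the comment, while sample is only an illustrative name.

```python
import io

# Stand-in for open("diabetes.csv"); these are the three rows from the question.
sample = io.StringIO(
    "10,101,86,37,0,45.6,1.136,38,1\n"
    "2,108,62,32,56,25.2,0.128,21,0\n"
    "3,122,78,0,0,23,0.254,40,0\n"
)

col3 = []  # created once, before the loop, so it survives every iteration
for line in sample:
    lst = line.strip().split(",")
    col3.append(float(lst[2]))  # third field, converted from str to float

print(col3)  # [86.0, 62.0, 78.0]
```

With the real file, replacing sample by a file object opened in a with block should give the same result for these rows.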

4 Answers

3

Suppose you have a list of lists of strings:

LoL=[
   ['10', '101', '86', '37', '0', '45.6', '1.136', '38', '1'],
   ['2', '108', '62', '32', '56', '25.2', '0.128', '21', '0'],
   ['3', '122', '78', '0', '0', '23', '0.254', '40', '0'],
]

You can get the nth element of each sublist like so:

>>> [float(sl[2]) for sl in LoL]
[86.0, 62.0, 78.0]

If you have a csv file, use the csv module to do exactly the same thing:

(at the command prompt):

$ cat file.csv
10,101,86,37,0,45.6,1.136,38,1
2,108,62,32,56,25.2,0.128,21,0
3,122,78,0,0,23,0.254,40,0

Python:

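import csv
# newline='' is what the csv module documentation recommends for files passed to csv.reader
with open('file.csv', newline='') as f:
    items = [float(row[2]) for row in csv.reader(f)]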

>>> items
[86.0, 62.0, 78.0]
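
If you need more than one column, the same comprehension can be wrapped in a small function. This is only a sketch: column_as_floats is a hypothetical helper name, not part of the csv module, and io.StringIO stands in for the opened file so the example runs on its own.

```python
import csv
import io


def column_as_floats(lines, index):
    """Return column `index` (0-based) of CSV input as a list of floats.

    `lines` is any iterable of text lines, such as an open file.
    Blank lines, which csv.reader yields as empty rows, are skipped.
    """
    return [float(row[index]) for row in csv.reader(lines) if row]


# Stand-in for open('file.csv', newline=''); same rows as file.csv above.
sample = io.StringIO(
    "10,101,86,37,0,45.6,1.136,38,1\n"
    "2,108,62,32,56,25.2,0.128,21,0\n"
    "3,122,78,0,0,23,0.254,40,0\n"
)

third = column_as_floats(sample, 2)
sample.seek(0)  # rewind before reading the data a second time
sixth = column_as_floats(sample, 5)

print(third)  # [86.0, 62.0, 78.0]
print(sixth)  # [45.6, 25.2, 23.0]
```

A row that is too short or holds a non-numeric value would still raise IndexError or ValueError here, which is probably what you want while exploring the data.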

So -- bottom line:

  1. Please use csv or pandas instead of .split(',') so that quoted CSV fields and other peculiarities of the format are handled properly;
  2. Use a with context manager so the file is automatically closed at the end of the block;
  3. A csv file is very similar to a list of lists and can usually be handled the same way.
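
To see why point 1 matters, here is a small sketch comparing the two approaches on a quoted field. The line below is made up for illustration and does not come from diabetes.csv.

```python
import csv

# Hypothetical line: the first field contains a comma inside double quotes.
line = '"Smith, John",101,86'

naive = line.split(",")            # splits on every comma, quoted or not
parsed = next(csv.reader([line]))  # csv.reader accepts any iterable of lines

print(naive)   # ['"Smith', ' John"', '101', '86']
print(parsed)  # ['Smith, John', '101', '86']

# The "third element" differs between the two results.
print(naive[2], parsed[2])  # 101 86
```

With .split(','), index 2 silently points at the wrong field, so float() would succeed on the wrong value instead of failing.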

0

You can use the pandas module, which is pretty standard in data science:

import pandas as pd

df = pd.read_csv("diabetes.csv", header=None, index_col=None)
# Select the third column by position and convert it to float.
# The name col3 avoids shadowing the built-in list.
col3 = df.iloc[:, 2].astype(float)

EDIT
Note that the result here is a pandas Series, not a list. Call .tolist() on it if you need a plain Python list.


0

A very simple and naive one-liner:

result = [float(line.strip().split(",")[2]) for line in open("diabetes.csv")]

3 Comments

Naive is the key word. Interpreting CSV data via split is a bad practice. Use the csv library.
No arguing there. Naive approaches can be very useful when learning though, it's good to not be buried in a pile of libraries when all you want to understand is a for loop.
True. It's also helpful to teach a pattern that works in the general case so once learned it works forever. I think the "batteries included" lesson is especially important for Python.
-1

Here's what you can do:

my_list = []

# Open in text mode: with 'rb' each line is bytes, and line.split(',') raises TypeError in Python 3
with open("diabetes.csv") as csvfile:
    for line in csvfile:
        lst = line.strip().split(',')
        my_list.append(float(lst[2]))

2 Comments

Your approach is naive and dangerous. Use the csv library. Interpreting the CSV lines and fields by yourself in this simplistic way will not deal with quotes and other subtleties of CSV syntax.
@ChrisJohnson Is this better?
