2

I just started using Python and am trying to convert some of my R code into Python. The task is relatively simple: I have many csv files, each with a variable name (in this case cell lines) and values (IC50s). I need to pull out all variables, and their values, that are shared among all files. Some of these files share the same variables but are formatted differently. For example, in some files a variable is just "Cell_line" and in others it is "MEL:Cell_line". So, first things first: to make a direct string comparison I need to format them the same way, and hence I am trying to use str.split() to do so. There is probably a much better way to do this, but for now I am using the following code:

import csv
import os
# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv" 
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        name_str=row[0]
        splt=name_str.split(':')
        n_name=splt[1]
        alldata[n_name]=row

name_str.split(':') returns a list of length 2. Since the portion I want is after the ":", I want the second element, which should be indexed as splt[1], since splt[0] is the first in Python. However, when I run the code I get this error message: "IndexError: list index out of range". I'm trying to get the second element of a list of length 2, so I have no idea why it is out of range. Any help or suggestions would be appreciated.

2
  • Are you sure at max there would only be one : in it? Commented Aug 7, 2015 at 1:17
  • It should work for "MEL:Cell_line"; it will fail on "Cell_line", as it will only have splt[0]. You can use splt[-1] to always get the last element, however many there are. Commented Aug 7, 2015 at 1:19

2 Answers

3

I am pretty sure that there are some rows where name_str does not contain a ":". From your own example, if name_str is Cell_line, splt has only one element and splt[1] fails.

If you are sure that there is at most one ":" in name_str, or if there can be several and you want the part after the last one, use splt[-1] instead of splt[1]. The index -1 takes the last element of the list. Because str.split(':') always returns at least one element, splt[-1] never raises IndexError here.
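A minimal sketch of how splt[-1] behaves on the two name formats from the question. The helper name last_field is hypothetical and only used for illustration:

```python
def last_field(name_str, sep=":"):
    """Return the text after the last separator, or the whole string if there is none."""
    # str.split always returns at least one element, so [-1] is always valid.
    return name_str.split(sep)[-1]

print(last_field("MEL:Cell_line"))  # name with a prefix
print(last_field("Cell_line"))      # name without a prefix
```

Both calls give the same key, so names from differently formatted files can be compared directly.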


2

The simple answer is that sometimes the data doesn't follow the specification you assumed when you wrote this code (i.e. that there is a colon and two fields).

The easiest way to deal with this is to add an if block, if len(splt)==2:, and do the subsequent lines within that block.

Optionally, add an else: and print the lines that are not to spec, or save them somewhere, so you can diagnose them.

Like this:

import csv
import os
# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv" 
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        name_str=row[0]
        splt=name_str.split(':')
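        if len(splt)==2:
            # Expected "PREFIX:name" format: key the row by the part after the colon
            n_name=splt[1]
            alldata[n_name]=row
        else:
            # Report names that do not match so they can be diagnosed
            print("invalid name: "+name_str)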

Alternatively, you can use try/except, which in this case is a bit more robust: a single exception handler catches an IndexError raised anywhere in the block, whether from row[0] or from splt[1], and we don't have to require that splitting on ":" yields exactly 2 fields.

In addition, we can explicitly check that there actually is a ":" before splitting, and assign the name accordingly.

import csv
import os
# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv" 
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        try:
            name_str=row[0]
            if ':' in name_str:
                splt=name_str.split(':')
                n_name=splt[1]
            else:
                n_name = name_str
            alldata[n_name]=row
        except IndexError:
            # Raised by row[0] when the row is empty
            print("bad row:"+str(row))

1 Comment

So, thanks. I made a stupid mistake, and with your help I realized that the first row was the column headers and thus did not have the ":" format. I assumed that because it already worked in R everything was OK, but I forgot that in R you can specify column headers when loading, so they are not treated as variables.
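For reference, one way to keep the header row out of the dictionary is to consume it with next() before looping. This is only a sketch: the in-memory sample below stands in for NCI60_Bleomycin.csv, and its column names and values are made up, since the real file contents are not shown in the question. The function name load_rows is hypothetical as well.

```python
import csv
import io

# Hypothetical sample data; the real file's columns and values are not shown.
sample = io.StringIO("Cell_line_name,IC50\nMEL:LOX_IMVI,1.2\nBR:MCF7,3.4\n")

def load_rows(csvfile):
    """Read a csv file object, returning (header, rows keyed by cleaned name)."""
    reader = csv.reader(csvfile, delimiter=",")
    header = next(reader, None)  # consume the header row; None if the file is empty
    alldata = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        n_name = row[0].split(":")[-1]  # text after the last colon, or the whole name
        alldata[n_name] = row
    return header, alldata

header, alldata = load_rows(sample)
print(header)
print(sorted(alldata))
```

With a real file, the same function can be called as load_rows(csvfile) inside the with open(file_name) as csvfile: block.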
