0

I am using Python 2.7.3 btw

Hi all,

I have a small problem: the line marked with a comment below keeps raising an error. (Sorry, I'm fairly new to Python.)

Here is my code so far:

with open('parsedChr','w') as fout, open('heartLungClassU.gtf','r') as filein:

    average = 0
    difference = 0
    position = ''
    bp = 0

    for line in filein:
        # <-- the next line is the one that raises the error
        chrom,cuff,exon,start,end,dot,sign,dots,gene,tranid,exonid,rest = line.split('\t',11)
        ## notice 12 variables here so I tried to unpack with value 11

    ##more code after

I keep getting this error:

Traceback (most recent call last):
  File "parse.py", line 11, in <module>
    chrom,cuff,exon,start,end,dot,sign,dots,gene,tranid,exonid,rest = line.split('\t',11)
ValueError: need more than 9 values to unpack

I don't understand why though -- note that there are 12 variables I am splitting the line into. Why would python complain about needing more than 9 values to unpack? I've had code before where I had to split into 6 variables and so used 5 in line.split (5 cuts into 6 pieces, as I understood it), but I don't understand why similar logic doesn't work here.

EDIT: here is a portion of the file:

chr1    Cufflinks   exon    14765607    14765689    .   +   .   gene_id "XLOC_000018";  transcript_id   "TCONS_00001260";   exon_number "1";    oId "CUFF.68.1";    class_code  "u";    tss_id  "TSS40";
chr1    Cufflinks   exon    14766604    14767199    .   +   .   gene_id "XLOC_000018";  transcript_id   "TCONS_00001260";   exon_number "2";    oId "CUFF.68.1";    class_code  "u";    tss_id  "TSS40";
chr1    Cufflinks   exon    21156530    21156632    .   +   .   gene_id "XLOC_000028";  transcript_id   "TCONS_00002433";   exon_number "1";    oId "CUFF.88.1";    class_code  "u";    tss_id  "TSS69";

EDIT: Meh. Figured it out. Thanks for the help everyone.

10
  • 2
    little problem and such huge code is posted. Commented Oct 17, 2012 at 19:58
  • Can you show us the file? What does it look like? Commented Oct 17, 2012 at 19:58
  • 1
    I would guess that one of your rows is missing a couple values -- consider a,b,c = 'foo bar'.split(' ',2) Commented Oct 17, 2012 at 19:59
  • You can also remove that blah blah blah. It's not part of the code, right? Commented Oct 17, 2012 at 20:01
  • @RohitJain here is the link dl.dropbox.com/u/108419362/file.gtf Commented Oct 17, 2012 at 20:05

3 Answers

4

To find the exact line number where the error occurs, do this:

for i, line in enumerate(filein):
    try:
        chrom,cuff,exon,start,end,dot,sign,dots,gene,tranid,exonid,rest = line.split('\t',11)
    except ValueError:
        print "ValueError on line", i+1
        print "line", repr(line)
        raise

In your comment you provided a link to your text file. I can't find any line with fewer than 11 tabs:

>>> import urllib
>>> for i, line in enumerate(urllib.urlopen('http://dl.dropbox.com/u/108419362/file.gtf')):
...     if line.count('\t') < 11:
...         print i+1, repr(line)
...         break
...
>>>

Double check that you are really opening the file you think you are opening.
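If you would rather see every offending line at once instead of stopping at the first one, something along these lines might help. It is only a sketch: `find_short_lines`, `expected_fields` and the tiny `sample` list are made-up names and data for illustration, not part of the original code, and it is written to run on both Python 2.7 and Python 3.

```python
def find_short_lines(lines, expected_fields=12, sep='\t'):
    """Return (line_number, field_count, line) for each line with too few fields."""
    problems = []
    for lineno, line in enumerate(lines, 1):
        # Same split as in the question: at most expected_fields - 1 cuts.
        fields = line.rstrip('\n').split(sep, expected_fields - 1)
        if len(fields) < expected_fields:
            problems.append((lineno, len(fields), line))
    return problems


# Hypothetical sample data: the second line has a space where a tab belongs.
sample = [
    'a\tb\tc\n',
    'a b\tc\n',
]

for lineno, count, line in find_short_lines(sample, expected_fields=3):
    # repr-style %r makes tabs visible as \t and spaces as plain spaces.
    print('line %d has %d field(s): %r' % (lineno, count, line))
```

With the real file you would pass the open file object instead of `sample` and keep the default of 12 fields.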


4 Comments

@Joe. Use the second version. It will tell you the line number and print the repr of the line, which shows the tabs as \t.
Now it returns... nothing. Haha. I'm not sure what's going on, but I guess I'll try to figure it out.
@Joe: Double check that you are really opening the file you think you are opening.
No, definitely opening the right file. It worked after I used sed to replace all ' ' with \t... I guess I missed something somewhere. Thanks for the help!
2

It means that your line does not contain enough tab characters for the split() call to fill all the variables. The message says the line produced only 9 values (8 tabs), while 12 variables need at least 11 tabs.

The following code will produce the same error:

s = 'a b'
x, y, z = s.split(' ') # the result is ['a', 'b'] but we have 3 variables
                       # on the left side of the expression.
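The second argument to split() is only an upper bound on the number of cuts; it never pads the result. A small sketch with made-up strings (`full` and `short` are illustrative, not taken from the file in the question) shows both cases:

```python
# Hypothetical input lines built from numbers so the pieces are easy to count.
full = '\t'.join(str(n) for n in range(15))   # 15 fields, 14 tabs
short = '\t'.join(str(n) for n in range(9))   # 9 fields, 8 tabs

full_parts = full.split('\t', 11)
short_parts = short.split('\t', 11)

print(len(full_parts))        # 12: eleven cuts, the remainder stays in the last item
print(len(short_parts))       # 9: only eight tabs were available to cut on
print(repr(full_parts[-1]))   # the uncut remainder, tabs still inside
```

So unpacking `short_parts` into 12 names fails exactly as in the question, while `full_parts` unpacks cleanly.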

2 Comments

Thanks for the answer! I'm trying to figure out which line is causing this... even though as far as I can tell I can't find any lines that don't have 9+ tabs in them...
You're welcome! Give a try to the code with try..except as @Steven Rumbaiski explained.
2

It means the line splits into only 9 values.

Example:

>>> a,b,c='foo bar'.split()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 2 values to unpack

You can add an if condition to handle this. Since there are 12 variables on the left side, the line must split into at least 12 values:

if len(line.split('\t'))>=12:
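In the comments the asker reports that the error went away after replacing spaces with tabs, which suggests some fields were separated by spaces. One possible way to tolerate that, sketched below, is to split on any run of whitespace and skip lines that are still too short. `parse_line` and `n_fields` are hypothetical names, the sample line is adapted from the excerpt in the question, and the approach assumes none of the first 11 fields contains whitespace itself.

```python
def parse_line(line, n_fields=12):
    """Split on any whitespace; return a list of n_fields items or None."""
    # split(None, k) treats runs of spaces and tabs as a single separator.
    fields = line.strip().split(None, n_fields - 1)
    if len(fields) < n_fields:
        return None          # caller decides whether to skip or report the line
    return fields


# Adapted from the question's excerpt, with spaces and tabs mixed on purpose.
line = ('chr1 Cufflinks\texon 14765607 14765689 . + . gene_id "XLOC_000018"; '
        'transcript_id "TCONS_00001260"; exon_number "1";\n')

fields = parse_line(line)
if fields is not None:
    chrom, cuff, exon, start, end, dot, sign, dots, gene, tranid, exonid, rest = fields
    print(chrom, start, end)
```

Returning None instead of raising keeps the loop going, so a single malformed line does not abort the whole run.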

6 Comments

Almost identical to my comment above. Striking.
I understand what you guys are saying -- but I combed through the file and I can't find any lines where that is the case... weird.
@Joe Why not just post a portion of your file?
you should post the file content in the question then.
Just edited the post so it contains a small portion of the file.