
I have a rather unusual task. Here is my input:

Period End Date         12/30/    12/31/   12/29/    12/28/    12/31/2007
                         2011      2010     2009      2008

You can see that the input file is malformed:

  1. the years are on the second line
  2. but the last date is complete

So I want to extract the complete dates: 12/31/2011 12/31/2012 12/31/2009 12/31/2008 12/31/2007

Here is what I am trying to do:

 input_file = open("input", "r")
 for line in input_file:
   index = line.find("Period End Date", 0)
   if index != -1:
     line = line[index+len("Period End Date"):len(line)] 
     temp_line = " ".join(line.split())
     temp_line.split(" ")

     year_line= input_file.next()
     #remove space, split,append on temp_line[i]

But it doesn't work:

temp_line.split(" ")

returns ['1','2','/', ...] not ['12/31/', '12/30', ...]

What's wrong with that?

  • I don't see much "special" about this, that task is average at best. Commented Mar 10, 2012 at 20:53
  • where you say 12/31/2012 do you mean 12/31/2010 ? Commented Mar 10, 2012 at 20:54
  • I mean special because date is mismatched after html2text, anyway, it's a bug. It's just an example, it doesn't matter 2012 or 2010. The problem is after split(), it returns '1', '2', '/'... how to get '12/31/'? Commented Mar 10, 2012 at 20:58
  • What is the point of this: " ".join(line.split())? You split a line just to join it again. Commented Mar 10, 2012 at 20:59
  • @campos.ddc: That line alone would even make some sense (replacing any amount of whitespace with just one space), but combined with the following line it would be really pointless, but in fact the next line also doesn't work as expected. Commented Mar 10, 2012 at 21:07

3 Answers

Let's look at your code:

temp_line = " ".join(line.split())

This replaces each run of whitespace with a single space. So far, so good. Next line:

temp_line.split(" ")

Now what? Splitting it again at single space? This only reverses the join you've done before. Why didn't you just stick with line.split(), then? Also, you're not assigning the result back to temp_line, so the result is discarded, which is probably the main issue here.
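To make the point about the discarded result concrete, here is a small sketch. The sample string is the date part of the line from the question, and the name `tokens` is only illustrative:

```python
line = "   12/30/    12/31/   12/29/    12/28/    12/31/2007\n"

# Collapse whitespace, then split on a single space without keeping the result.
temp_line = " ".join(line.split())
temp_line.split(" ")   # the list is built and immediately thrown away

# temp_line is still one string, because split() never modifies it.
print(type(temp_line).__name__)

# A plain split() gives the same list in one step, as long as it is assigned.
tokens = line.split()
print(tokens == temp_line.split(" "))
print(tokens)
```

Strings are immutable, so every string method returns a new object, and that return value has to be bound to a name to be used later.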

You could use something like this instead:

 with open("input", "r") as f:
   lines = list(f)
   # Pair every line with the line that follows it.
   for date_line, year_line in zip(lines, lines[1:]):
     parts = date_line.split()
     if ' '.join(parts[0:3]) != 'Period End Date': continue

     # The next line holds the missing years, in order.
     dates, years = parts[3:], year_line.split()
     year_index = 0
     for date in dates:
       # A date ending in '/' has no year yet: append the next one.
       if not date.split('/')[-1]:
         date = date + years[year_index]
         year_index += 1
       print(date)

I am going to presume that the number of dates varies, but always consists of N day-month entries, followed by a complete day-month-year entry, followed by N year entries:

def getHeadings(s):
    head = s.split()
    num_dates = (len(head) - 4) // 2  # integer division, so the slices get ints
    return [dm+y for dm,y in zip(head[3:3+num_dates], head[4+num_dates:])] + head[3+num_dates:4+num_dates]

getHeadings("""    Period End Date 12/30/ 12/31/ 12/29/ 12/28/ 12/31/2007

                        2011      2010     2009      2008""")

returns

['12/30/2011', '12/31/2010', '12/29/2009', '12/28/2008', '12/31/2007']


It works:

>>> temp_line = " ".join(line.split())
>>> temp_line
'12/30/ 12/31/ 12/29/ 12/28/ 12/31/2007'
>>> temp_line.split(" ")
['12/30/', '12/31/', '12/29/', '12/28/', '12/31/2007']

If you iterate over temp_line itself (a string) rather than over the list returned by split(), you will get '1', '2', '/', ... and so on.
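A minimal sketch of that difference, reusing the value of temp_line shown above (the names `chars`, `parts` and `first_two` are just for illustration):

```python
temp_line = "12/30/ 12/31/ 12/29/ 12/28/ 12/31/2007"

# Looping over the string itself walks it one character at a time.
chars = [c for c in temp_line][:3]
print(chars)

# Looping over the list returned by split() walks it one token at a time.
parts = temp_line.split(" ")
first_two = [p for p in parts][:2]
print(first_two)
```

So the loop has to run over the list that split() returns, not over temp_line.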

Also, may I suggest some Pythonic adjustments to your code? Use line.split('Period End Date ')[1].strip().split(" ") instead of

line = line[index+len("Period End Date"):len(line)] 
temp_line = " ".join(line.split())
temp_line.split(" ")

Plus, a file object is an iterator in Python, so you can simply do:

with open(...) as f:
    for line in f:
        <do something with line>

The with statement handles opening and closing the file, including when an exception is raised in the inner block. The for line in f loop treats the file object f as an iterable, which automatically uses buffered I/O.
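Because the file object is an iterator, the line after a match can be pulled from it with the built-in next() inside the same loop, which covers files that also contain other kinds of lines. A rough sketch, assuming the year line directly follows the "Period End Date" line; the helper name `period_end_dates` is made up for this example, and io.StringIO stands in for the real file:

```python
import io

def period_end_dates(lines):
    """Yield complete dates from an iterable of lines (hypothetical helper)."""
    lines = iter(lines)
    for line in lines:
        if not line.lstrip().startswith("Period End Date"):
            continue  # some other kind of line: nothing extra to read
        dates = line.split()[3:]
        # Pull the following line from the same iterator; it holds the years.
        years = iter(next(lines, "").split())
        for date in dates:
            if date.endswith("/"):
                # The year is missing, so append the next one in order.
                date += next(years, "")
            yield date

sample = io.StringIO(
    "Some other line\n"
    "Period End Date         12/30/    12/31/   12/29/    12/28/    12/31/2007\n"
    "                         2011      2010     2009      2008\n"
)
result = list(period_end_dates(sample))
print(result)
```

With a real file, the call would look like `with open("input") as f: dates = list(period_end_dates(f))`. The defaults passed to next() keep the sketch from raising if the year line is missing or too short.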

9 Comments

Yes you are right, I do iterate each item and get 1 , 2, /... but how can I get '12/30/' from temp_line?
simple. try - for string in temp_line: <process string>
I think line.split('Period End Date ')[1].strip().split() would be the correct equivalent to the original code.
@Srikar: Not quite. It's gotta be split() without an argument to do the same thing as the original code.
@Srikar, line.split('Period End Date ')[1].strip().split() is right but the problem is this input file contains other format lines, I'll have to read the next line getting year info when one line matches "Period End Date" at the beginning. If not, I'll just read one line. That's why I implement my codes in the C/C++ way. Do you have any suggestions in pythonic way? thanks