Python find sub string

Question

I have a very special task to do. Here is my input:

Period End Date         12/30/    12/31/   12/29/    12/28/    12/31/2007
                         2011      2010     2009      2008

As you can see, the input file is malformed:

the years are on the second line,
but the last date is complete.

So I want to extract the correct dates: 12/31/2011 12/31/2012 12/31/2009 12/31/2008 12/31/2007

Here is what I am trying to do

 input_file = open("input", "r")
 for line in input_file:
   index = line.find("Period End Date", 0)
   if index != -1:
     line = line[index+len("Period End Date"):len(line)] 
     temp_line = " ".join(line.split())
     temp_line.split(" ")

     year_line= input_file.next()
     #remove space, split,append on temp_line[i]

But it doesn't work:

temp_line.split(" ")

returns ['1','2','/', ...] not ['12/31/', '12/30', ...]

What's wrong with that?

I don't see much "special" about this, that task is average at best.
– Niklas B., Commented Mar 10, 2012 at 20:53
I mean special because the dates are mismatched after html2text. Anyway, that's a bug. It's just an example, so it doesn't matter whether it's 2012 or 2010. The problem is that after split(), it returns '1', '2', '/'... How do I get '12/31/'?
– Harvey Dent, Commented Mar 10, 2012 at 20:58
What is the point of this? " ".join(line.split()) splits a line only to join it again.
– campos.ddc, Commented Mar 10, 2012 at 20:59
@campos.ddc: That line alone would even make some sense (it replaces any run of whitespace with a single space). Combined with the following line it would be pointless, and in fact the next line doesn't work as expected either.
– Niklas B., Commented Mar 10, 2012 at 21:07

Niklas B. · Accepted Answer · 2012-03-10 22:12:49Z

2

Let's look at your code:

temp_line = " ".join(line.split())

This replaces multiple whitespace with one single space. So far, so okay. Next line:

temp_line.split(" ")

Now what? Splitting it again at single space? This only reverses the join you've done before. Why didn't you just stick with line.split(), then? Also, you're not assigning the result back to temp_line, so the result is discarded, which is probably the main issue here.
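A minimal sketch of both points, using a sample line modeled on the question's input (the names `rest`, `fields` and `roundtrip` are only for illustration):

```python
line = "Period End Date         12/30/    12/31/   12/29/    12/28/    12/31/2007\n"
rest = line[len("Period End Date"):]

# str.split() returns a new list; the string it is called on is never modified.
temp_line = " ".join(rest.split())
temp_line.split(" ")                      # the returned list is thrown away here
still_a_string = isinstance(temp_line, str)

# Binding the result keeps it. A bare split() already collapses runs of
# whitespace, so the join/split round trip produces nothing new.
fields = rest.split()
roundtrip = temp_line.split(" ")

print(still_a_string)       # True
print(fields == roundtrip)  # True
```
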

You could use something like this instead:

 with open("input", "r") as f:
   lines = list(f)
   # Pair every line with the line that follows it.
   for date_line, year_line in zip(lines, lines[1:]):
     parts = date_line.split()
     if ' '.join(parts[0:3]) != 'Period End Date': continue

     # The year line holds only years, one per incomplete date.
     dates, years = parts[3:], year_line.split()
     year_index = 0
     for date in dates:
       # A date ending in '/' has no year yet; take the next one from the year line.
       if not date.split('/')[-1]:
         date = date + years[year_index]
         year_index += 1
       print(date)

edited Mar 10, 2012 at 22:12

answered Mar 10, 2012 at 21:13

Niklas B.


Hugh Bothwell · Answer · 2012-03-10 21:11:34Z

1

I am going to presume that the number of dates varies, but always consists of N day-month entries, followed by a complete day-month-year entry, followed by N year entries:

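def getHeadings(s):
    head = s.split()
    # Skip the 3 label words and the 1 complete date; the rest is N partial dates + N years.
    # Floor division keeps num_dates an int, so it also works as a slice index on Python 3.
    num_dates = (len(head) - 4) // 2
    return [dm+y for dm,y in zip(head[3:3+num_dates], head[4+num_dates:])] + head[3+num_dates:4+num_dates]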

getHeadings("""    Period End Date 12/30/ 12/31/ 12/29/ 12/28/ 12/31/2007

                        2011      2010     2009      2008""")

returns

['12/30/2011', '12/31/2010', '12/29/2009', '12/28/2008', '12/31/2007']

answered Mar 10, 2012 at 21:11

Hugh Bothwell


Srikar Appalaraju · Answer · 2012-03-10 21:07:28Z

0

It works:

>>> temp_line = " ".join(line.split())
>>> temp_line
'12/30/ 12/31/ 12/29/ 12/28/ 12/31/2007'
>>> temp_line.split(" ")
['12/30/', '12/31/', '12/29/', '12/28/', '12/31/2007']

If you iterate over each item in temp_line itself, which is still a string, you will get '1', '2', '/', and so on.
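To make the difference concrete, here is a short sketch (the names `chars` and `fields` are only for illustration) that iterates once over the string and once over the list returned by split():

```python
temp_line = "12/30/ 12/31/ 12/29/ 12/28/ 12/31/2007"

# Iterating over a str yields one-character strings.
chars = [item for item in temp_line]

# Iterating over the list returned by split() yields whole fields.
fields = [item for item in temp_line.split()]

print(chars[:3])   # ['1', '2', '/']
print(fields[:2])  # ['12/30/', '12/31/']
```
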

Also, may I suggest some Pythonic adjustments to your code? Use line.split('Period End Date ')[1].strip().split(" ") instead of

line = line[index+len("Period End Date"):len(line)] 
temp_line = " ".join(line.split())
temp_line.split(" ")

Plus, a file object is an iterator in Python, so you can simply do:

with open(...) as f:
    for line in f:
        <do something with line>

The with statement handles opening and closing the file, including when an exception is raised in the inner block. The for line in f loop treats the file object f as an iterable, which automatically uses buffered I/O.
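Because the file object is an iterator, the year line can be pulled with next() only when a "Period End Date" row is found, while every other line is read one at a time. The following is a rough Python 3 sketch under stated assumptions, not a definitive implementation: the function name `read_period_end_dates` is hypothetical, the year line is assumed to hold one year per incomplete date, and an `io.StringIO` with made-up sample text stands in for the real file.

```python
import io

def read_period_end_dates(lines):
    """Yield complete dates from 'Period End Date' rows in an iterable of lines."""
    lines = iter(lines)
    for line in lines:
        parts = line.split()
        if parts[:3] != ["Period", "End", "Date"]:
            continue  # some other kind of line: nothing extra to read
        # Only for a matching row, consume the following line for the years.
        years = iter(next(lines, "").split())
        for date in parts[3:]:
            if date.endswith("/"):
                # Assumption: one year on the next line per incomplete date.
                date += next(years, "")
            yield date

# Sample text standing in for the real input file (the first row is invented).
sample = io.StringIO(
    "Some other row   1   2   3\n"
    "Period End Date         12/30/    12/31/   12/29/    12/28/    12/31/2007\n"
    "                         2011      2010     2009      2008\n"
)
result = list(read_period_end_dates(sample))
print(result)
# ['12/30/2011', '12/31/2010', '12/29/2009', '12/28/2008', '12/31/2007']
```

With a real file, the same function could be called as `read_period_end_dates(f)` inside a `with open(...) as f:` block.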

edited Mar 10, 2012 at 21:07

answered Mar 10, 2012 at 21:02

Srikar Appalaraju


9 Comments

Harvey Dent Over a year ago

Yes you are right, I do iterate each item and get 1 , 2, /... but how can I get '12/30/' from temp_line?

Srikar Appalaraju Over a year ago

Simple. Try: for string in temp_line.split(): <process string>

Niklas B. Over a year ago

I think line.split('Period End Date ')[1].strip().split() would be the correct equivalent to the original code.

Niklas B. Over a year ago

@Srikar: Not quite. It's gotta be split() without an argument to do the same thing as the original code.

Harvey Dent Over a year ago

@Srikar, line.split('Period End Date ')[1].strip().split() is right, but the problem is that this input file also contains lines in other formats. I have to read the next line to get the year info only when a line starts with "Period End Date". Otherwise, I just read one line. That's why I implemented my code in the C/C++ way. Do you have any suggestions for a Pythonic way? Thanks.
