Python 2.7: search for lines matching a pattern and replace a string

Question

How can I read a file and find all lines that match a pattern starting with \d+\s, and then replace the whitespace with a comma? Some lines contain English characters, but others contain Chinese. I guess whitespace in a Chinese encoding might be different from English whitespace?

Example (text.txt)

asdfasdf
1 abcd
2 asdfajklsd
3 asdfasdf
4 ...
asdfasdf
66 ...
aasdfasdf
99 ...
100 中文
101 中文
102 asdfga
103 中文

My Test Code:

with open('text.txt', 'r') as t:
    with open('newtext.txt', 'w') as nt:
            content = t.readlines()

            for line in content:
                    okline = re.compile('^[\d+]\s')
                          if okline:
                             ntext = re.sub('\s', ',', okline)
                             nt.write(ntext)

You want to replace all spaces with commas? Why not just use str.replace?
– cs95, Commented Jul 10, 2017 at 9:34
I don't know why you check for the existence of the re.compile object; it is always truthy. Did you mean okline = re.match(r'\d+\s', line)?
– Avinash Raj, Commented Jul 10, 2017 at 9:34
@COLDSPEED Before replacing the spaces, I want to find all the lines that start with a digit. As my sample file shows, some lines do not start with a digit. Those lines should not be saved to newtext.txt.
– Enoch, Commented Jul 10, 2017 at 9:44

RomanPerekhrest · Accepted Answer · 2017-07-10 10:03:57Z

1

With single re.subn() function:

import re

with open('text.txt', 'r') as text, open('newtext.txt', 'w') as new_text:
    lines = text.read().splitlines()
    for l in lines:
        # subn returns (new_string, number_of_subs_made)
        rpl = re.subn(r'^(\d+)\s+', r'\1,', l)
        if rpl[1]:
            new_text.write(rpl[0] + '\n')

The main advantage of this is that re.subn returns a tuple (new_string, number_of_subs_made). The number_of_subs_made value is the crucial part: it is non-zero only when the line matched, so only lines where a substitution was actually made are written.
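As a small illustration of the tuple that re.subn returns, here is a sketch that wraps the same substitution in a helper. The name convert and the sample lines are made up for this example and are not part of the answer.

```python
import re

# Hypothetical sample lines, modelled on the question's text.txt
samples = ['asdfasdf', '1 abcd', '66 ...']

# Same pattern as in the answer: leading digits followed by whitespace
pattern = re.compile(r'^(\d+)\s+')

def convert(line):
    """Return the rewritten line, or None if the line does not start with digits and whitespace."""
    new_line, count = pattern.subn(r'\1,', line)
    # count is 0 when nothing matched, so the line is skipped
    return new_line if count else None

results = [convert(s) for s in samples]
```

Lines for which convert returns None are the ones that would not be written to newtext.txt.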

edited Jul 10, 2017 at 10:03

answered Jul 10, 2017 at 9:58

RomanPerekhrest


1 Comment

Enoch Over a year ago

Yes, it works! But it only works on English characters. In fact, some lines contain Chinese characters, and for those lines it fails. Sorry about that, I am a newbie at programming.
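The thread does not say why the Chinese lines fail. One possible cause, and this is only a guess, is that those lines use a non-ASCII separator such as the ideographic space (U+3000), which \s does not match when Python 2.7 applies the pattern to byte strings. A minimal sketch that decodes the file and uses Unicode-aware matching follows. It assumes the file is UTF-8 encoded; convert_file and LEADING_NUMBER are names invented for this example.

```python
import io
import re

# re.UNICODE makes \d and \s follow Unicode rules on Python 2.7;
# on Python 3 this is already the default for text patterns.
LEADING_NUMBER = re.compile(u'^(\\d+)\\s+', re.UNICODE)

def convert_file(src_path, dst_path, encoding='utf-8'):
    """Copy lines that start with a number, replacing the whitespace after it with a comma.

    The encoding default is an assumption; change it if the file uses e.g. GBK or Big5.
    Returns the number of lines written.
    """
    written = 0
    with io.open(src_path, 'r', encoding=encoding) as src, \
         io.open(dst_path, 'w', encoding=encoding) as dst:
        for line in src:
            # Strip the line ending first so it is not treated as whitespace to replace
            new_line, count = LEADING_NUMBER.subn(u'\\1,', line.rstrip(u'\r\n'))
            if count:
                dst.write(new_line + u'\n')
                written += 1
    return written
```

Calling convert_file('text.txt', 'newtext.txt') would then handle English and Chinese lines the same way, provided the encoding guess is right.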

Aleksandar Makragić · Accepted Answer · 2017-07-10 09:49:16Z

0

You could do this:

import re

# Reading lines from input file
with open('text.txt', 'r') as t:
    content = t.readlines()

# Opening file for writing
with open('newtext.txt', 'w') as nt:

    # For each line
    for line in content:

        # We search for the regular expression
        if re.search(r'^\d+\s', line):

            # Only if the pattern is found do we continue: substitute
            # whitespace with commas (the trailing newline is stripped first
            # so it is not replaced too) and write to the output file
            ntext = re.sub(r'\s', ',', line.rstrip('\r\n'))
            nt.write(ntext + '\n')

There were multiple problems with your code. For starters, \d is a character class, basically the same as [0-9], so you don't need to put it inside square brackets. Inside brackets the + is a literal plus sign, so [\d+] matches a single character that is either a digit or +. You can see a regex demo here. Also, you were checking whether the compiled object is true; since the compile operation succeeds, the compiled object is always truthy.
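Both points can be checked quickly in the interpreter. The sketch below compares the question's pattern with the corrected one; starts_with_number is a helper name invented for this example.

```python
import re

# The question's pattern: [\d+] matches ONE character, a digit or a literal '+'
question_pattern = re.compile(r'^[\d+]\s')
# The corrected pattern: one or more digits followed by whitespace
fixed_pattern = re.compile(r'^\d+\s')

# A compiled pattern object is always truthy, no matter what text you test later
pattern_is_truthy = bool(question_pattern)

def starts_with_number(pattern, line):
    """True when pattern matches at the start of line."""
    return pattern.match(line) is not None

# Single-digit lines match both patterns ...
one_digit = (starts_with_number(question_pattern, '1 abcd'),
             starts_with_number(fixed_pattern, '1 abcd'))
# ... but multi-digit lines only match the corrected pattern
two_digits = (starts_with_number(question_pattern, '66 ...'),
              starts_with_number(fixed_pattern, '66 ...'))
```

So with the original pattern, lines such as 66, 99 and 100 in the sample file would be skipped even after replacing the truthiness check with a real match.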

Also, you should avoid nested with statements. A more Pythonic way is to open the file using with, read it, and let the block close it before opening the next file.

edited Jul 10, 2017 at 9:49

answered Jul 10, 2017 at 9:42

Aleksandar Makragić


3 Comments

Enoch Over a year ago

Thank you for your answer. I don't know why it outputs a 0-byte newtext.txt.

Aleksandar Makragić Over a year ago

Np. :) Can you accept my answer? meta.stackexchange.com/questions/23138/…

Aleksandar Makragić Over a year ago

@Enoch there was a typo; did you copy content = t.readlines()?

PythonProgrammi · Accepted Answer · 2017-07-10 16:59:53Z

0

Compact code

import re

with open('esempio.txt', 'r') as original, open('newtext2.txt', 'w') as newtext:
    for l in original.read().split('\n'):
        if re.search(r"^\d+\s", l):
            newtext.write(re.sub(r'\s', ',', l) + '\n')

answered Jul 10, 2017 at 16:59

PythonProgrammi

