
How can I read a file, find all lines that start with the pattern \d+\s (a number followed by whitespace), and then replace that whitespace with a comma? Some of the lines contain English characters, but some contain Chinese. I guess the whitespace in Chinese encoding is different from English?

Example (text.txt)

asdfasdf
1 abcd
2 asdfajklsd
3 asdfasdf
4 ...
asdfasdf
66 ...
aasdfasdf
99 ...
100 中文
101 中文
102 asdfga
103 中文

My Test Code:

import re

with open('text.txt', 'r') as t:
    with open('newtext.txt', 'w') as nt:
        content = t.readlines()

        for line in content:
            okline = re.compile('^[\d+]\s')
            if okline:
                ntext = re.sub('\s', ',', okline)
                nt.write(ntext)
  • You want to replace all spaces with commas? Why not just use str.replace? Commented Jul 10, 2017 at 9:34
  • Don't know why you check for the existence of the re.compile object; it always evaluates to true. Do you mean okline = re.match(r'\d+\s', line)? Commented Jul 10, 2017 at 9:34
  • @COLDSPEED Before replacing the spaces, I want to find all the lines that start with a digit. As my sample file shows, some lines do not start with a digit. Those lines should not be saved to newtext.txt. Commented Jul 10, 2017 at 9:44

3 Answers


With a single re.subn() call:

import re

with open('text.txt', 'r') as text, open('newtext.txt', 'w') as new_text:
    lines = text.read().splitlines()
    for line in lines:
        # rpl is a tuple: (new_string, number_of_subs_made)
        rpl = re.subn(r'^(\d+)\s+', r'\1,', line)
        if rpl[1]:
            new_text.write(rpl[0] + '\n')

The main advantage of this is that re.subn returns a tuple (new_string, number_of_subs_made). Because the pattern is anchored with ^, number_of_subs_made is 1 for a line that starts with a number followed by whitespace and 0 otherwise, so it tells you directly which lines to write.
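To see what re.subn actually returns, you can run it on a matching and a non-matching line from the sample file. This is a minimal sketch; the helper name convert_line is hypothetical and not part of the answer above.

```python
import re

# Hypothetical helper: returns the converted line, or None when the line
# does not start with digits followed by whitespace.
def convert_line(line):
    # re.subn returns (new_string, number_of_substitutions)
    new_line, count = re.subn(r'^(\d+)\s+', r'\1,', line)
    return new_line if count else None

print(re.subn(r'^(\d+)\s+', r'\1,', '1 abcd'))    # ('1,abcd', 1)
print(re.subn(r'^(\d+)\s+', r'\1,', 'asdfasdf'))  # ('asdfasdf', 0)
print(convert_line('100 中文'))                    # 100,中文
```

Note that this pattern only replaces the whitespace right after the leading number; any later spaces in the line are left unchanged.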


1 Comment

Yes, it works! But it only works on English characters. Some of the lines contain Chinese characters, and for those lines it fails. Sorry about that, I am a newbie at programming.

You could do this:

import re

# Reading lines from input file
with open('text.txt', 'r') as t:
    content = t.readlines()

# Opening file for writing
with open('newtext.txt', 'w') as nt:

    # For each line
    for line in content:

        # We search for the regular expression (raw string avoids escape warnings)
        if re.search(r'^\d+\s', line):

            # Only if the line matches: strip the trailing newline so \s does
            # not turn it into a comma, substitute whitespace with commas,
            # and write the line back with its newline
            ntext = re.sub(r'\s', ',', line.rstrip('\n'))
            nt.write(ntext + '\n')

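There were multiple problems with your code. For starters, \d is already a character class (basically the same as [0-9]), so you don't need to put it inside square brackets: [\d+] matches exactly one character that is either a digit or a literal +, so your pattern misses lines such as 66 ... or 100 中文. You can see a regex demo here. Also, you were checking whether the compiled pattern object is true; since the compile operation succeeds, the object is always truthy, so that check never filters out any line.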
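Both problems can be checked interactively. This sketch (the sample strings are taken from the question's text.txt) shows that ^[\d+]\s only matches lines whose number has a single digit, and that a compiled pattern is truthy no matter which line you test, whereas re.match returns None for a non-matching line.

```python
import re

# [\d+] is a character class: it matches ONE character, a digit or '+'
print(re.match(r'^[\d+]\s', '1 abcd') is not None)  # True: single digit, then a space
print(re.match(r'^[\d+]\s', '66 abc'))              # None: '6' is followed by '6', not whitespace

# \d+ matches one or more digits
print(re.match(r'^\d+\s', '66 abc') is not None)    # True

# A compiled pattern object is always truthy, whatever line you test
pattern = re.compile(r'^\d+\s')
print(bool(pattern))                # True
print(pattern.match('asdfasdf'))    # None: use the match result, not the pattern
```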

Also, you should avoid nesting with statements; a more Pythonic way is to open the input file using with, read it, and let the with block close it before opening the output file.

3 Comments

Thank you for your answer. I don't know why it outputs a 0-byte newtext.txt.
Np. :) Can you accept my answer? meta.stackexchange.com/questions/23138/…
@Enoch there was a typo; did you copy content = t.readlines()?

Compact code

import re

with open('esempio.txt', 'r') as original, open('newtext2.txt', 'w') as newtext:
    for l in original.read().split('\n'):
        if re.search(r'^\d+\s', l):
            newtext.write(re.sub(r'\s', ',', l) + '\n')
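Regarding the Chinese lines: a minimal sketch, assuming Python 3 and a UTF-8 encoded input file (the encoding is an assumption; use whatever your file actually uses). Passing encoding explicitly to open() avoids depending on the platform's default encoding, and in Python 3 \s in a str pattern is Unicode-aware, so it also matches the full-width ideographic space U+3000 that Chinese text sometimes contains. The helper convert_file is hypothetical; it follows the re.subn approach from the first answer.

```python
import re

# Lines that start with one or more digits followed by whitespace.
# In Python 3, \s in a str pattern also matches U+3000 (IDEOGRAPHIC SPACE).
LINE_START = re.compile(r'^(\d+)\s+')

# Hypothetical helper; the default encoding='utf-8' is an assumption.
def convert_file(src, dst, encoding='utf-8'):
    with open(src, 'r', encoding=encoding) as original, \
         open(dst, 'w', encoding=encoding) as newtext:
        for line in original:
            line = line.rstrip('\n')
            # count is 1 if the line starts with a number, else 0
            new_line, count = LINE_START.subn(r'\1,', line)
            if count:
                newtext.write(new_line + '\n')

# The ideographic space is matched just like an ASCII space:
print(LINE_START.subn(r'\1,', '100\u3000中文'))  # ('100,中文', 1)
```

If the lines still fail on Python 2, the likely cause is that the file is read as bytes, where \s only matches ASCII whitespace; decoding the text first (as above) sidesteps that.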
