I came up with the code below, which finds a string in a row and copies that row to a new file. I want to replace Foo23 with something more dynamic (e.g. [0-9]), but I cannot get this to work with variables or regex. It doesn't fail, but I also get no results. Help? Thanks.

with open('C:/path/to/file/input.csv') as f:
    with open('C:/path/to/file/output.csv', "w") as f1:
        for line in f:
            if "Foo23" in line:
                f1.write(line)
  • What is this dynamic thing you're trying to write? As the code stands, you're testing if the line has "Foo23" and then printing it if it's there. All other lines will be ignored. Commented Jul 24, 2018 at 18:21
  • @malan under regex, what I would like to search is for any line containing r"[A-Za-z][A-Za-z][A-Za-z]\d\d" Commented Jul 24, 2018 at 18:26
  • "...but I cannot get this, or variables or regex, to work." Please read about creating a minimal reproducible example. Show us what you are trying to do that doesn't work, and include example input strings. Commented Jul 24, 2018 at 18:26
  • @physlexic: You don't know what module and commands to use to search for r"[A-Za-z][A-Za-z][A-Za-z]\d\d"? Try import re and then test if re.search(r"[A-Za-z][A-Za-z][A-Za-z]\d\d", line): Commented Jul 24, 2018 at 18:29
  • @malan Sorry, maybe I should have explained it better, but by "or regex" in my initial question, I meant that I did try import re, etc. Commented Jul 24, 2018 at 19:10

2 Answers

Based on your comment, you want to match lines whenever any three letters followed by two digits are present, e.g. foo12 and bar54. Use regex!

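import re

pattern = r'([a-zA-Z]{3}\d{2})\b'
with open('C:/path/to/file/input.csv') as f:
    with open('C:/path/to/file/output.csv', 'w') as f1:
        for line in f:
            # re.search returns a match object (truthy) or None
            if re.search(pattern, line):
                f1.write(line)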

This will match lines like 'some line foo12' and 'another foo54 line', but not 'a third line foo' or 'something bar123'.
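As a quick check of the claim above, here is a minimal sketch that runs the same pattern over those four sample strings. The names `has_code` and `samples` are made up for illustration. It uses `re.search`, which is enough when only a yes/no answer per line is needed.

```python
import re

# Same pattern as in the answer: three letters, then two digits
# that end at a word boundary.
pattern = r'([a-zA-Z]{3}\d{2})\b'

def has_code(line):
    """Return True if the pattern matches anywhere in the line."""
    return re.search(pattern, line) is not None

# Hypothetical sample lines taken from the sentence above.
samples = [
    'some line foo12',
    'another foo54 line',
    'a third line foo',
    'something bar123',
]

for s in samples:
    print(repr(s), has_code(s))
```

The last sample is rejected because the character after `bar12` is another digit, so there is no word boundary at that position.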

Breaking it down:

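pattern = re.compile(r'''
    (                  # start capture group, not needed here, but nice if you want the actual match back
        [a-zA-Z]{3}    # any three letters in a row, any case
        \d{2}          # any two digits
    )                  # end capture group
    \b                 # word boundary: the next character is not a letter, digit, or underscore (or the string ends)
''', re.VERBOSE)       # re.VERBOSE lets the pattern span lines and carry comments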

If all you really need is to write all of the matches in the file to f1, you can use:

matches = re.findall(pattern, f.read())  # finds all matches in f
f1.write('\n'.join(matches))  # writes each match to a new line in f1
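One detail worth knowing about `re.findall`: when the pattern contains a single capture group, it returns the text of that group for each match. A small sketch, assuming a made-up in-memory string `text` in place of `f.read()`:

```python
import re

pattern = r'([a-zA-Z]{3}\d{2})\b'

# Hypothetical file contents, standing in for f.read().
text = "id,code\n1,foo12\n2,bar\n3,baz99 and qux07\n"

# One list entry per match, in order of appearance.
matches = re.findall(pattern, text)

# Same join as in the answer: one match per output line.
output = '\n'.join(matches)
print(output)
```

Note that this produces only the matched substrings, not the rows they came from, so it suits the "just the matches" case rather than the original row-copying task.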

7 Comments

You don't need an f-string for this pattern.
\w{3} will match much more than just 3 letters
Why do you have the variable my_string? It isn't being used.
@DillonDavis from a previous version of the answer that I changed. Removed it now, thanks.
@physlexic see my update. If you just want to grab all of the matches from the file, it's even easier!

In essence, your question boils down to: "I want to determine whether a line matches pattern X, and if so, write it to the file". The best way to accomplish this is to use a regular expression. In Python, the standard regex module is re. So,

import re
matches = re.findall(r'([a-zA-Z]{3}\d{2})', line)

Combining this with file IO operations, we have:

with open('C:/path/to/file/input.csv', 'r') as f:
    data = list(f)  # read all lines into memory

# keep only the lines that contain a match
data = [x for x in data if re.findall(r'([a-zA-Z]{3}\d{2})\b', x)]
with open('C:/path/to/file/output.csv', 'w') as f1:
    for line in data:
        f1.write(line)

Notice that I split up your file IO operations to reduce nesting. I also moved the filtering outside of your IO. In general, each portion of your code should do "one thing" for ease of testing and maintenance.
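To make the "one thing" point concrete, the filtering step could live in its own function that knows nothing about files, so it can be tested with plain lists. This is only a sketch; the names `filter_lines` and `CODE_PATTERN` are invented here and are not part of the code above.

```python
import re

# Hypothetical names for illustration. Compiling once avoids
# re-parsing the pattern for every line.
CODE_PATTERN = re.compile(r'[a-zA-Z]{3}\d{2}\b')

def filter_lines(lines, pattern=CODE_PATTERN):
    """Return the lines that contain a match for the pattern."""
    return [line for line in lines if pattern.search(line)]

# The function can be exercised without touching the disk.
rows = ["1,Foo23,x\n", "2,nothing,y\n", "3,abc45\n"]
kept = filter_lines(rows)
print(kept)
```

With a real file, the same function could be called as `filter_lines(f)` inside the `with open(...)` block, since a file object iterates over its lines.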
