I want to export a list of URLs from one txt file to a new txt file. The source file looks like this:

http://pastebin.com/raw/10hvUbTi Emails: 631 Keywords: 0.0

http://pastebin.com/raw/5f0bnCq9 Emails: 61 Keywords: 0.0

I am trying to create a list that will look like this:

URL

URL

I am not getting any output in PyCharm.

Can someone help please?

import re
import urllib2
filename = 'C:\\file.txt'
pattern = ('^\S*')
with open(filename) as f:
    for line in f:
        if pattern in line:
            print line
  • Show us the example input and the expected output (real examples). Commented Jul 2, 2016 at 6:28
  • pastebin.com/raw/10hvUbTi Emails: 631 Keywords: 0.0 pastebin.com/raw/c42wEasR Emails: 283 Hashes: 142 . i got nothing in output, it doesnt Commented Jul 2, 2016 at 6:34
  • Please edit the question and add the examples there. We would need example of the input (few lines) and expected output. And also on which lines it fails. Commented Jul 2, 2016 at 6:34
  • It doesn't fail :/. I just get: "Process finished with exit code 0" Commented Jul 2, 2016 at 6:39
  • Did you post a link to a list of user accounts and their passwords here? Commented Jul 2, 2016 at 7:46

2 Answers

You could go for:

import re

rx = re.compile(r'^(?P<email>[^|\n]+)', re.MULTILINE)
with open("emails.txt") as f:
    raw_data = f.read()
    emails = [match.group('email') for match in rx.finditer(raw_data)]
    print emails

Replace emails.txt with the path to your own file.
See a demo on regex101.com.


You did not use a regular expression at all. You merely tested whether the pattern string occurs literally in the line. To use a regex, compile the pattern first:

pattern = re.compile(r'^\S*')

Notice the r before the pattern string: it marks a raw string literal, which keeps backslashes intact and is very important in regexes.
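
To see why the r prefix matters, here is a small sketch (Python 3 syntax, unlike the Python 2 print statements in the question). The sample text and variable names are made up for illustration. For \S the prefix happens not to change the string, because \S is not a string escape sequence, but for a sequence such as \b it changes the pattern entirely:

```python
import re

text = "one two"  # made-up sample text

plain = '\bone'   # in an ordinary string, '\b' is a single backspace character
raw = r'\bone'    # in a raw string the backslash survives, so the regex sees \b (word boundary)

plain_matches = re.findall(plain, text)  # looks for a literal backspace followed by "one"
raw_matches = re.findall(raw, text)      # looks for "one" at a word boundary

print(len(plain), len(raw))
print(plain_matches)
print(raw_matches)
```

Only the raw-string version finds "one" here, since the sample text contains no backspace character.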

To search for a pattern in a particular line, use

pattern.search(line)

It will return a MatchObject if a match is found, or None if nothing is found. More reference material on Python regular expressions can be found in the documentation.
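
Putting these pieces together, a minimal sketch of the whole task might look like the following. It is written in Python 3 rather than the Python 2 used in the question. The function names extract_urls and export_urls, the file paths passed to them, and the stricter pattern ^https?://\S+ are assumptions chosen for illustration, not part of the original answer:

```python
import re

# Assumed pattern: a URL at the very start of the line, up to the first whitespace.
URL_PATTERN = re.compile(r'^https?://\S+')


def extract_urls(lines):
    """Return the leading URL of every line that starts with one."""
    urls = []
    for line in lines:
        match = URL_PATTERN.search(line)
        if match:  # search() returns None when the line does not match
            urls.append(match.group(0))  # group(0) is the matched text itself
    return urls


def export_urls(src_path, dst_path):
    """Read src_path, write one URL per line to dst_path, and return the URLs."""
    with open(src_path) as src:
        urls = extract_urls(src)
    with open(dst_path, 'w') as dst:
        for url in urls:
            dst.write(url + '\n')
    return urls
```

It could be called as export_urls('C:\\file.txt', 'C:\\urls.txt'), where the second path is a hypothetical output file. Blank lines and lines that do not start with a URL are skipped.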

1 Comment

The pattern is meaningless unless captured. "zero or more non-whitespace" will match every line.
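
The point of this comment can be checked with a short sketch (Python 3; the sample lines and the stricter pattern ^https?://\S+ are assumptions for illustration). Because \S* may match zero characters, search() succeeds on every line, so testing for a match tells you nothing. The useful part is the matched text:

```python
import re

loose = re.compile(r'^\S*')             # pattern from the answer: zero or more non-whitespace
strict = re.compile(r'^https?://\S+')   # assumed stricter alternative

lines = [
    "http://pastebin.com/raw/10hvUbTi Emails: 631 Keywords: 0.0",
    "",
    "   indented text",
]

# The loose pattern "matches" all three lines, sometimes with an empty string.
loose_texts = [loose.search(line).group(0) for line in lines]

# The strict pattern matches only the line that starts with a URL.
strict_hits = [strict.search(line) is not None for line in lines]

print(loose_texts)
print(strict_hits)
```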
