Extracting specific URLs from a text file using Java

Question

I have a text document containing a bunch of URLs of the form /courses/......./.../.. and, from among these, I only want to extract the URLs of the form /courses/.../lecture-notes, meaning those that begin with /courses and end with /lecture-notes. Would anyone know of a good way to do this with regular expressions or just by string matching?

aioobe · Accepted Answer · 2012-08-11 19:48:42Z

5

Here's one alternative:

Scanner s = new Scanner(new FileReader("filename.txt"));

String str;
while (null != (str = s.findWithinHorizon("/courses/\\S*/lecture-notes", 0)))
    System.out.println(str);

Given a filename.txt with the content

Here /courses/lorem/lecture-notes and
here /courses/ipsum/dolor/lecture-notes perhaps.

the above snippet prints

/courses/lorem/lecture-notes
/courses/ipsum/dolor/lecture-notes
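For reference, here is a minimal self-contained sketch of the same idea that collects the matches into a list instead of printing them. The class name LectureNoteScanner and the method extract are made up for illustration, and the input is read from a StringReader rather than filename.txt so the example runs without a file on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the name is not from the original answer.
class LectureNoteScanner {

    // Returns every substring of the input that matches /courses/<non-whitespace>/lecture-notes.
    static List<String> extract(Readable source) {
        List<String> urls = new ArrayList<>();
        // try-with-resources closes the scanner (and the underlying source) when done.
        try (Scanner scanner = new Scanner(source)) {
            String url;
            // A horizon of 0 tells the scanner to search the whole remaining input.
            while ((url = scanner.findWithinHorizon("/courses/\\S*/lecture-notes", 0)) != null) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "Here /courses/lorem/lecture-notes and\n"
                    + "here /courses/ipsum/dolor/lecture-notes perhaps.\n";
        // Prints [/courses/lorem/lecture-notes, /courses/ipsum/dolor/lecture-notes]
        System.out.println(extract(new StringReader(text)));
    }
}
```

To read from a file as in the snippet above, pass new FileReader("filename.txt") to extract instead of the StringReader.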

edited Aug 11, 2012 at 19:48

answered Aug 11, 2012 at 19:37

WhyNotHugo · Accepted Answer · 2012-08-11 19:48:49Z

1

The following will only return the middle part (i.e. exclude /courses/ and /lecture-notes):

Pattern p = Pattern.compile("/courses/(.*)/lecture-notes");
Matcher m = p.matcher(yourString);

if (m.find())
  return m.group(1); // The "1" here means it'll return the first part of the regex between parentheses.
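If the input can contain more than one URL, a loop over find() may be closer to what the question needs. The sketch below is one way to do that. The class name CoursePathExtractor and the method middleParts are hypothetical, and the pattern deliberately swaps the greedy .* for the reluctant \S+?, because a greedy .* would span from the first /courses/ to the last /lecture-notes on the same line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
class CoursePathExtractor {

    // \S+? is reluctant and excludes whitespace, so each match stops at the
    // first "/lecture-notes" that follows and never crosses into the next URL.
    private static final Pattern LECTURE_NOTES =
            Pattern.compile("/courses/(\\S+?)/lecture-notes");

    // Returns the part between /courses/ and /lecture-notes for every match in the text.
    static List<String> middleParts(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = LECTURE_NOTES.matcher(text);
        while (m.find()) {
            parts.add(m.group(1)); // group(1) is the first parenthesised group
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Here /courses/lorem/lecture-notes and "
                    + "here /courses/ipsum/dolor/lecture-notes perhaps.";
        // Prints [lorem, ipsum/dolor]
        System.out.println(middleParts(text));
    }
}
```

Use m.group() (or m.group(0)) instead of m.group(1) to get the whole URL rather than only the middle part.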

answered Aug 11, 2012 at 19:48

Reimeus · Accepted Answer · 2012-08-11 20:55:08Z

1

Assuming that you have one URL per line, you could use:

    BufferedReader br = new BufferedReader(new FileReader("urls.txt"));
    String urlLine;
    while ((urlLine = br.readLine()) != null) {
        if (urlLine.matches("/courses/.*/lecture-notes")) {
            // use url
        }
    }
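A possible variant of the same loop, sketched with BufferedReader.lines() so the matching URLs end up in a list. The class name LectureNoteLineFilter and the method filter are made up for illustration, and trimming each line is an added assumption (that stray whitespace around a URL should be ignored), not something stated in the question.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is not from the original answer.
class LectureNoteLineFilter {

    // Keeps only the lines that, as a whole, look like /courses/.../lecture-notes.
    static List<String> filter(BufferedReader reader) {
        return reader.lines()
                .map(String::trim) // assumption: surrounding whitespace is not significant
                .filter(line -> line.matches("/courses/.*/lecture-notes"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String content = "/courses/lorem/lecture-notes\n"
                       + "/courses/ipsum/syllabus\n"
                       + "/courses/ipsum/dolor/lecture-notes\n";
        // Prints [/courses/lorem/lecture-notes, /courses/ipsum/dolor/lecture-notes]
        System.out.println(filter(new BufferedReader(new StringReader(content))));
    }
}
```

For a real file, wrap new FileReader("urls.txt") in the BufferedReader, ideally inside a try-with-resources statement so the file is closed afterwards.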

edited Aug 11, 2012 at 20:55

answered Aug 11, 2012 at 19:42

2 Comments

Reimeus Over a year ago

Nothing in the description precludes processing the urls. This check is within a loop.

aioobe Over a year ago

Unless you explain how to traverse a text token by token (or at least line by line) this answer is not complete. (Also, ^ and $ are not needed when using matches.)
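The point about ^ and $ can be illustrated with a small sketch: String.matches only succeeds when the entire string matches the pattern, whereas Matcher.find looks for the pattern anywhere in the string. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original thread.
class MatchesVersusFind {

    private static final String REGEX = "/courses/.*/lecture-notes";

    // True only if the whole line is a lecture-notes URL (implicitly anchored).
    static boolean isLectureNotesUrl(String line) {
        return line.matches(REGEX);
    }

    // True if a lecture-notes URL occurs anywhere inside the line.
    static boolean containsLectureNotesUrl(String line) {
        return Pattern.compile(REGEX).matcher(line).find();
    }

    public static void main(String[] args) {
        String bare = "/courses/lorem/lecture-notes";
        String embedded = "see /courses/lorem/lecture-notes here";

        System.out.println(isLectureNotesUrl(bare));           // true
        System.out.println(isLectureNotesUrl(embedded));       // false
        System.out.println(containsLectureNotesUrl(embedded)); // true
    }
}
```

This is why the matches-based check fits the one-URL-per-line case, while text with URLs embedded in prose needs find or a Scanner.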
