I have a text document containing a bunch of URLs of the form /courses/......./.../..
From among these URLs, I only want to extract those of the form /courses/.../lecture-notes, meaning the URLs that begin with /courses and end with /lecture-notes.
Would anyone know of a good way to do this with regular expressions or just by string matching?
3 Answers
Here's one alternative:
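// Requires java.io.FileReader and java.util.Scanner; the caller must handle FileNotFoundException
try (Scanner s = new Scanner(new FileReader("filename.txt"))) {
    String str;
    // A horizon of 0 means "search all of the remaining input"
    while (null != (str = s.findWithinHorizon("/courses/\\S*/lecture-notes", 0)))
        System.out.println(str);
}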
Given a filename.txt with the content
Here /courses/lorem/lecture-notes and
here /courses/ipsum/dolor/lecture-notes perhaps.
the above snippet prints
/courses/lorem/lecture-notes
/courses/ipsum/dolor/lecture-notes
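To try the same findWithinHorizon approach without a file on disk, here is a small sketch that runs it over an in-memory String and collects the matches into a list. The class name LectureNotesScanner and method findAll are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the name is illustrative only.
class LectureNotesScanner {
    // Same pattern as the answer: "/courses/", a run of non-whitespace, then "/lecture-notes".
    static final String PATTERN = "/courses/\\S*/lecture-notes";

    // Collects every match in the text. A horizon of 0 searches the rest of the input.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        try (Scanner s = new Scanner(text)) {
            String match;
            while ((match = s.findWithinHorizon(PATTERN, 0)) != null) {
                found.add(match);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Here /courses/lorem/lecture-notes and\n"
                + "here /courses/ipsum/dolor/lecture-notes perhaps.";
        for (String url : findAll(text)) {
            System.out.println(url);
        }
    }
}
```

Reading from a String keeps the example free of checked I/O exceptions; swapping in a FileReader, as in the answer, gives the file-based version.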
The following returns only the middle part (i.e., it excludes /courses/ and /lecture-notes):
// Requires java.util.regex.Pattern and java.util.regex.Matcher
Pattern p = Pattern.compile("/courses/(.*)/lecture-notes");
Matcher m = p.matcher(yourString);
if (m.find())
    return m.group(1); // group(1) is the part of the match inside the first pair of parentheses
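If one string can hold several URLs, a loop over find() collects every middle part. This sketch assumes URLs contain no whitespace and uses \S* instead of .* in the capture group: a greedy .* on a line such as "/courses/a/lecture-notes x /courses/b/lecture-notes" would capture everything from "a" to "b" as one match. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes URLs never contain whitespace.
class LectureNotesPaths {
    // \S* stops at whitespace, so one match cannot span two URLs on the same line.
    private static final Pattern P = Pattern.compile("/courses/(\\S*)/lecture-notes");

    // Returns the captured middle segment of every match, in order.
    static List<String> middleParts(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(middleParts("a /courses/lorem/lecture-notes b /courses/ipsum/dolor/lecture-notes"));
    }
}
```

For the sample string in main this prints [lorem, ipsum/dolor].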
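Assuming you have one URL per line, you could use:
// Requires java.io.BufferedReader and java.io.FileReader; the caller must handle IOException
try (BufferedReader br = new BufferedReader(new FileReader("urls.txt"))) {
    String urlLine;
    while ((urlLine = br.readLine()) != null) {
        // matches() succeeds only if the WHOLE line fits the pattern
        if (urlLine.matches("/courses/.*/lecture-notes")) {
            // use url
        }
    }
}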
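The question also asks about plain string matching. Under the same one-URL-per-line assumption, a sketch without regex could look like the following; the class and method names are hypothetical. The length check mirrors the regex /courses/.*/lecture-notes, which requires the prefix and suffix not to overlap, so "/courses/lecture-notes" is rejected by both.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a no-regex filter, assuming one URL per line.
class LectureNotesFilter {
    private static final String PREFIX = "/courses/";
    private static final String SUFFIX = "/lecture-notes";

    static boolean isLectureNotesUrl(String line) {
        String url = line.trim();
        // Prefix and suffix must both be present and must not share characters.
        return url.length() >= PREFIX.length() + SUFFIX.length()
                && url.startsWith(PREFIX)
                && url.endsWith(SUFFIX);
    }

    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (isLectureNotesUrl(line)) {
                kept.add(line.trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("/courses/lorem/lecture-notes", "/courses/lorem/syllabus", "/about/lecture-notes");
        System.out.println(filter(lines));
    }
}
```

startsWith and endsWith avoid compiling a pattern for every line, which String.matches does internally; for a one-off scan of a text file either choice is fine.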