When I scrape a website for article URLs, I collect all <a> tags and read their href attributes. The resulting list contains some links that are not articles but point to categories or other pages within the same domain, so I need to do the following:
Create a pattern for the URL and match each URL in the list against this pattern, so I can tell whether it is an article URL or not.
An example of the pattern:
link: "http://www.cnbc.com/2016/03/13/financial-times-china-rebuts-economy-doomsayers-on-debt-and.html"
pattern match: http://www.cnbc.com/(*)/(*)/(*)/(*).html
So the idea is to replace every variable part of the link with (*).
The question is: how do I match a link against the pattern?
Use `[^/]+` instead of `*`, and escape the dot.
The first three `(*)` sections are numbers, so you can use `[0-9]+`. The last `(*)` section is a combination of letters and symbols, so you can use `.+`.
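The advice above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `pattern_to_regex` and `is_article` are made up for illustration, and it assumes each `(*)` placeholder stands for exactly one path segment (no `/` inside it).

```python
import re

def pattern_to_regex(pattern):
    """Turn a template with (*) placeholders into a compiled regex (hypothetical helper)."""
    # Split on the placeholder, escape each literal chunk so that '.' and
    # similar characters match themselves, then join the chunks with a
    # group that matches one or more non-slash characters.
    parts = pattern.split("(*)")
    return re.compile("([^/]+)".join(re.escape(part) for part in parts))

# Generic version: every (*) becomes [^/]+
ARTICLE = pattern_to_regex("http://www.cnbc.com/(*)/(*)/(*)/(*).html")

# Stricter hand-written version: three numeric sections, then anything, then .html
STRICT = re.compile(r"http://www\.cnbc\.com/[0-9]+/[0-9]+/[0-9]+/.+\.html")

def is_article(url, regex=ARTICLE):
    """Return True if the whole URL matches the article pattern."""
    return regex.fullmatch(url) is not None

links = [
    "http://www.cnbc.com/2016/03/13/financial-times-china-rebuts-economy-doomsayers-on-debt-and.html",
    "http://www.cnbc.com/world-economy/",
]
article_links = [url for url in links if is_article(url)]
```

`fullmatch` is used so that a URL with extra text before or after the pattern is rejected. Note that the stricter version ends in `.+`, which also matches `/`, so it would accept deeper paths than the generic one; use `[^/]+` there too if that is not wanted.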