3

Assuming a single subdomain, how do I remove everything in a URL before the domain, as well as any trailing slash?

Example strings:
https://www.google.com/
http://net.tutsplus.com/about

The result I want (from my example strings) is:
google.com
tutsplus.com/about

Currently, the regex I'm using is:
^https?:\/\/

Which results in:
www.google.com/
net.tutsplus.com/about

This removes everything up to and including the slashes in the URL, but I want to remove everything up to and including the first dot (.).

My current code in Apps Script is:

var body = DocumentApp.getActiveDocument().getBody();
body.replaceText('^https?:\/\/', '');

Given that I'm using Google Apps Script, it could be an issue with how replaceText() works. Thanks in advance for the help.

11
  • I would be surprised if there is not a JavaScript library for doing this. Have you looked into this? Commented Feb 5, 2016 at 1:44
  • Try ^https?:\/\/.*?\. to match everything up to and including the first dot (.). Commented Feb 5, 2016 at 1:48
  • @sideroxylon That results in: ww.google.com/ Commented Feb 5, 2016 at 1:53
  • @CBroe that's not at all constructive, and there's no reason for the hostility. I didn't include an exhaustive list of what I've tried for fear of cluttering the question. I've tried ^https?:\/\/.*\.$ and a number of variations. Commented Feb 5, 2016 at 1:56
  • @TimBiegeleisen A plain regex should be able to get me there. I don't want to import a library into Google Apps Script, partly because it's clunky and partly because it shouldn't be necessary. Commented Feb 5, 2016 at 1:58

2 Answers

1

It looks like Google Docs' regex implementation is limited. It doesn't support capturing groups, so you will run into problems with URLs such as the following:

  • http://hoffmaninstitute.co.uk
  • http://google.com
  • http://docs.aws.amazon.com/

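Assuming the text always has the form http:// + one subdomain + domain + TLD, you can use:

  var body = DocumentApp.getActiveDocument().getBody();
  // The pattern is passed as a string, so the backslash before the dot must be doubled.
  body.replaceText('^https?://[0-9A-Za-z_]+\\.', '');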
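
A note on escaping, as a minimal sketch: replaceText() takes its pattern as a string, so a backslash has to be doubled to reach the regex engine. The snippet below uses plain JavaScript's RegExp constructor as a stand-in for replaceText() (an assumption; the Docs engine may differ in other respects) to show how a single backslash before the dot is lost. The variable names are illustrative only.

```javascript
// Plain JavaScript stand-in: new RegExp(string) receives the pattern only
// after string-literal escapes have been processed, like any string-based API.
const url = 'https://www.google.com/';

// '\.' inside a string literal is just '.', so the pattern ends in "any character".
const singleBackslash = new RegExp('^https?://.*?\.');
// '\\.' survives as '\.', which is a literal dot.
const doubleBackslash = new RegExp('^https?://.*?\\.');

const lossy = url.replace(singleBackslash, '');   // 'ww.google.com/'
const escaped = url.replace(doubleBackslash, ''); // 'google.com/'

console.log(lossy, escaped);
```

The first result is the same ww.google.com/ output reported in the comments on the question, which suggests the lost backslash is what went wrong there.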

1 Comment

Thanks. This doesn't actually work for the links in the example strings. Thanks for trying, though.
0

From Apps Script's .replaceText() docs:

Replaces all occurrences of a given text pattern with a given replacement string, using regular expressions.
A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.

It will only accept strings as arguments. Implementing my own regex search and replace is unnecessarily complex because it necessitates converting each object type to be the appropriate Apps Script object before actually issuing a replacement.

I failed to note that a subdomain should be removed only if it is www, because some link formats I hadn't anticipated need their subdomain to stay readable. For reference, here's a more thorough set of link formats:

https://www.google.com/
https://www.google.com
https://google.com/
https://google.com
http://www.google.com/
http://www.google.com
http://google.com
https://product.google.com/about/
https://product.google.com/about
https://product.google.com/
https://product.google.com
http://product.google.com/about/
http://product.google.com/about
http://product.google.com/
http://product.google.com

While the following is inefficient and verbose, it works:

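function replaceLongUrls(element) {
    // Default to the whole document body when no element is passed in.
    element = element || DocumentApp.getActiveDocument().getBody();

    // Patterns are strings, so backslashes must be doubled.
    element.replaceText('^https?://', '');  // strip the scheme
    element.replaceText('^www\\.', '');     // strip a leading "www." (literal dot)
    element.replaceText('/$', '');          // strip a trailing slash
}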
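
To sanity-check the three steps against the link formats listed above, here is a minimal plain-JavaScript sketch. It assumes String.prototype.replace behaves like replaceText() for these simple patterns, which is not guaranteed. shortenUrl and shortenUrlOnePass are hypothetical helper names, and the one-pass alternation has not been tried with replaceText() itself.

```javascript
// Hypothetical helper mirroring the three replaceText() calls on a plain string.
function shortenUrl(url) {
  return url
    .replace(/^https?:\/\//, '') // drop the scheme
    .replace(/^www\./, '')       // drop a leading "www." only
    .replace(/\/$/, '');         // drop one trailing slash
}

// Hypothetical one-pass variant: scheme plus optional "www.", or a trailing slash.
function shortenUrlOnePass(url) {
  return url.replace(/^https?:\/\/(?:www\.)?|\/$/g, '');
}

// A few of the link formats from the list above.
const samples = [
  'https://www.google.com/',
  'http://google.com',
  'https://product.google.com/about/',
  'http://product.google.com',
];

for (const sample of samples) {
  console.log(sample, '->', shortenUrl(sample), '|', shortenUrlOnePass(sample));
}
```

Both helpers keep a non-www subdomain such as product, which is the behaviour the list of formats calls for.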

Sources:
Apps Script Documentation
Google Apps Script Regex exec() returning null
replaceText() RegEx "not followed by"
