I'm trying to add Google URL Builder's functionality to my application.

https://support.google.com/analytics/answer/1033867?hl=en

Unfortunately, I can't get exactly the same results.

My code

        from re import sub

        def buildurl(self, url):

            # take out old url builder parameters
            url = sub(r'\?utm_source=.*?(&|$)utm_medium=.*?(&|$)|utm_term=.*?(&|$)|utm_content=.*?(&|$)|utm_campaign=.*?(&|$)', '', url)

            # build the new query string
            header = '?utm_source=' + self.data['source']
            header += '&utm_medium=' + self.data['medium']
            header += '&utm_campaign=' + self.data['campaign']

            # return long url
            return url + header

My code returns this: http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html#axzz3ANwb5XD?utm_source=source&utm_medium=medi&utm_campaign=testu

Google's URL Builder Returns this: http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html?utm_source=source&utm_medium=medi&utm_campaign=test#axzz3ANwb5XDu

I could move the fragment (#axzz3ANwb5XDu) to the end by hand, but is there a way to parse and reconstruct the URL in a standardized way?

3 Answers

1

You should check out the urlparse module. I have modified your code so that it removes the existing URL builder parameters but keeps any other parts of the query.

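from re import sub
from urlparse import urlparse, urlunparse  # Python 2; on Python 3 import from urllib.parse

def buildurl(self, url):

    # Parse the url first, so the regex below cannot swallow the fragment.
    o = urlparse(url)

    # take out old url builder parameters, keeping the rest of the query.
    query = sub(r'utm_source=.*?(&|$)utm_medium=.*?(&|$)|utm_term=.*?(&|$)|utm_content=.*?(&|$)|utm_campaign=.*?(&|$)', '', o.query).strip('&')

    # build url query, joining onto any parameters that were kept.
    if query:
        query += '&'
    query += 'utm_source=' + self.data['source']
    query += '&utm_medium=' + self.data['medium']
    query += '&utm_campaign=' + self.data['campaign']

    # return the url with the corrected query; urlunparse takes a single 6-tuple.
    return urlunparse((o.scheme, o.netloc, o.path, o.params, query, o.fragment))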

The fragment belongs at the end of the URL, after the query, and that is where urlunparse puts it.
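To make the component order concrete, here is a minimal sketch (Python 3 import path assumed) that parses the URL from the question and rebuilds it with a new query. The single `utm_source=source` parameter is only an illustrative value.

```python
from urllib.parse import urlparse, urlunparse

url = ("http://iipdigital.usembassy.gov/st/english/article/"
       "2014/08/20140813305633.html#axzz3ANwb5XDu")

parts = urlparse(url)
print(parts.query)     # empty string: the original URL has no query
print(parts.fragment)  # axzz3ANwb5XDu

# ParseResult is a namedtuple, so _replace returns a copy with a new query.
rebuilt = urlunparse(parts._replace(query="utm_source=source"))
print(rebuilt)
# http://iipdigital.usembassy.gov/st/english/article/2014/08/20140813305633.html?utm_source=source#axzz3ANwb5XDu
```

Because the query and fragment are separate fields, the query lands before the `#` no matter how it was built.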


1

I would go for Python's urllib; it's a built-in library.

import urllib.parse

getVars = {'var1': 'some_data', 'var2': 1337}
url = 'http://domain.com/somepage/?'

print(url + urllib.parse.urlencode(getVars))

Output (on Python 3.7+, where dicts keep insertion order):

http://domain.com/somepage/?var1=some_data&var2=1337
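One practical reason to prefer `urlencode` over string concatenation is that it escapes the values. The sketch below uses made-up campaign values (a space and an ampersand) to show the effect; they are assumptions for illustration, not values from the question.

```python
from urllib.parse import urlencode

# Hypothetical campaign values containing characters that need escaping.
params = {
    "utm_source": "news letter",
    "utm_medium": "email",
    "utm_campaign": "a&b",
}

query = urlencode(params)
print(query)  # utm_source=news+letter&utm_medium=email&utm_campaign=a%26b
```

With plain `+` concatenation the `&` inside the campaign name would be read as a parameter separator.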

0

There is a way to parse the URL; it's called urlparse:

from re import sub

try:
    from urllib.parse import urlparse, urlunparse
except ImportError:  # Python 2.x
    from urlparse import urlparse, urlunparse


def buildurl(self, url):
    scheme, netloc, path, params, query, fragment = urlparse(url)

    # take out old url builder parameters; the parsed query has no leading '?'
    query = sub(r'utm_source=.*?(&|$)utm_medium=.*?(&|$)|utm_term=.*?(&|$)|utm_content=.*?(&|$)|utm_campaign=.*?(&|$)', '', query).strip('&')

    # build url; urlunparse inserts the '?' itself
    if query:
        query += '&'
    query += 'utm_source=' + self.data['source']
    query += '&utm_medium=' + self.data['medium']
    query += '&utm_campaign=' + self.data['campaign']
    return urlunparse((scheme, netloc, path, params, query, fragment))
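As a possible alternative to the regex, the old parameters can be dropped by parsing the query into pairs. This is only a sketch: the standalone `build_url` signature and the `UTM_KEYS` name are hypothetical stand-ins for the `self.data` lookups above, and it uses `urlsplit`/`urlunsplit` (Python 3), which skip the rarely used `params` field.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names the URL builder manages (hypothetical constant name).
UTM_KEYS = {"utm_source", "utm_medium", "utm_term", "utm_content", "utm_campaign"}


def build_url(url, source, medium, campaign):
    parts = urlsplit(url)
    # Keep every existing parameter except old utm_* ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in UTM_KEYS
    ]
    kept += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    # urlencode escapes the values; urlunsplit puts the fragment last.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(build_url("http://example.com/page.html?id=7&utm_source=old#top",
                "source", "medi", "test"))
# http://example.com/page.html?id=7&utm_source=source&utm_medium=medi&utm_campaign=test#top
```

Applied to the URL from the question with the same three values, this should produce the same string that Google's URL Builder returned.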
