I have a dataset with a variable whose observations are URLs. I am trying to create another variable that lists the domain type (.com, .org, .co.uk, etc.) for each observation in the "url" variable.

I could split the "url" variable on ".":

split url, p(.)

but that would not reliably give me the domain type, because the number of pieces differs from one URL to the next.

The problem arises because the URLs vary widely in form. For example:

  • while www.google.com would be split into 3 variables, http://www.nih.nlm.gov would be split into 4
  • similarly while www.yahoo.com is split into 3, https://www.movies.yahoo.co.au would be split into 5.

How can I write the following rule in Stata to create the "domain type" variable from the "url" variable?

  • if the part after the last "." in the "url" variable has ≥ 3 characters (.com/.edu/.org/.gov or .info), then use this as the domain type

  • if the part after the last "." has < 3 characters (.uk/.au/.tv etc.) AND the part before the last "." has ≤ 2 characters (.co), then use everything after the penultimate "." as the domain type (i.e. .co.uk)

  • if the part after the last "." has < 3 characters (.us domains) AND the part before the last "." has > 2 characters, then use the part after the last "." as the domain type (e.g. freeshootinggames.us gives .us)

Also, is there another way of doing this?

I am working in Stata 13.1 on Windows 8 Pro x64.

Thanks!

1 Answer

Reversing strings is a useful trick in problems like this. Try something like this:

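* Reverse each URL so the last "."-separated piece comes first
gen rev_url = reverse(url)
split rev_url, parse(.) gen(domain_)
* Un-reverse the last (domain_1) and second-to-last (domain_2) pieces
replace domain_1 = reverse(domain_1)
replace domain_2 = reverse(domain_2)
* Two-part domain types such as co.uk: both pieces are short
replace domain_1 = domain_2 + "." + domain_1 if length(domain_2)<=2 & length(domain_1)<3
rename domain_1 domain
* domain_* no longer matches the renamed variable "domain"
drop domain_* rev_url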
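
On the "is there another way" part of the question: one possible alternative, offered only as a sketch and not as part of the original answer, is to use Stata's regular-expression functions regexm() and regexs(). The example data below are hypothetical, built from the URLs mentioned in the question, and the patterns assume the URL ends with the host name (no trailing path such as /index.html). The /// line continuation assumes the code is run from a do-file.

```stata
* Hypothetical example data taken from the URLs in the question
clear
input str40 url
"www.google.com"
"http://www.nih.nlm.gov"
"https://www.movies.yahoo.co.au"
"freeshootinggames.us"
end

* Default: whatever follows the last "."
gen domain = regexs(1) if regexm(url, "\.([^.]+)$")

* Override when the last two pieces are both 1-2 characters long, e.g. co.au
replace domain = regexs(1) + "." + regexs(2) ///
    if regexm(url, "\.([^.][^.]?)\.([^.][^.]?)$")

list url domain
```

With these four observations the rules from the question should give com, gov, co.au and us respectively. Like the reverse-and-split approach, this leaves out the leading "."; prefix it with "." + domain if the dot is wanted.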