0

I'm not sure if this is possible. I have two tables. One holds country codes:

e.g.

id | code | country

1 | .us | United States
2 | .ru | Russia

And so on (about 200+ rows)

The other is URLs:

http://example.gov.us
http://example.gov.ru/index.php
http://xyz.gov.us/test.html

And so on.

I don't know what URLs will come in, so I would have to grab each country code and somehow query the URLs for any matches against the country codes and count how many there are for each.

e.g. (?)

gov.[country code]

Ideally, the output would be grouped by country name and counted. Using the URLs above as an example, the result might be:

country | total

United States | 2
Russia  | 1

As I said, I'm not sure whether this can be done in MySQL with regex, substrings, etc. I would love to know if it can be.

2 Answers

1

You could use a query like this:

SELECT
  c.country,
  COUNT(*) AS total
FROM
  countries c INNER JOIN URLS u
  ON SUBSTRING_INDEX(SUBSTRING_INDEX(u.url, 'http://', -1), '/', 1)
     LIKE CONCAT('%', c.code)
GROUP BY
  c.country

Please see fiddle here.

Using SUBSTRING_INDEX(url, 'http://', -1) you can get the whole string after the http://

http://example.gov.ru/index.php  --->   example.gov.ru/index.php

then using SUBSTRING_INDEX(..., '/', 1) on this string you can get the part of the string before the first / or the whole string if there's no /

example.gov.ru/index.php         --->   example.gov.ru

you can then check if example.gov.ru LIKE '%.ru'
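
If you don't have a MySQL server at hand, here is a rough sketch for trying the query above from Python with the built-in sqlite3 module. SQLite has no SUBSTRING_INDEX, so substring_index and concat below are hypothetical stand-ins registered as SQL functions. The table layouts (countries, urls) and the sample rows are assumptions taken from the question, not a real schema.

```python
import sqlite3


def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX(s, delim, count)."""
    if s is None:
        return None
    if not delim or count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counted from the left.
        return delim.join(parts[:count])
    # Negative count: everything after the count-th delimiter from the right.
    return delim.join(parts[count:])


def concat(*args):
    """Stand-in for CONCAT(): NULL if any argument is NULL."""
    if any(a is None for a in args):
        return None
    return "".join(str(a) for a in args)


conn = sqlite3.connect(":memory:")
conn.create_function("SUBSTRING_INDEX", 3, substring_index)
conn.create_function("CONCAT", -1, concat)

# Assumed sample data, mirroring the question.
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, code TEXT, country TEXT);
CREATE TABLE urls (url TEXT);
INSERT INTO countries VALUES (1, '.us', 'United States'), (2, '.ru', 'Russia');
INSERT INTO urls VALUES
  ('http://example.gov.us'),
  ('http://example.gov.ru/index.php'),
  ('http://xyz.gov.us/test.html');
""")

QUERY = """
SELECT c.country, COUNT(*) AS total
FROM countries c INNER JOIN urls u
  ON SUBSTRING_INDEX(SUBSTRING_INDEX(u.url, 'http://', -1), '/', 1)
     LIKE CONCAT('%', c.code)
GROUP BY c.country
ORDER BY total DESC
"""

totals = conn.execute(QUERY).fetchall()
print(totals)  # [('United States', 2), ('Russia', 1)]
```

The ORDER BY is only there to make the printed order predictable; the join and grouping are the same as in the query above.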


1 Comment

minor problem perhaps if they have example.gov.fr.ru
0
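-- Assumes c.code is stored with its leading dot (e.g. '.us'), as in the question,
-- so the trailing "\\" escapes that dot instead of adding a second one.
select country, count(*) total
from country_codes c
join urls on urls.url RLIKE CONCAT("^http://[^/]+\\.gov\\", c.code, "($|/)")
group by country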
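
To sanity-check the regex idea outside the database, here is a small Python sketch of the same pattern. It assumes the codes are stored with their leading dot (as in the question's table). COUNTRIES, URLS and count_gov_urls are made-up names for illustration, and the extra example.gov.fr.ru URL is the case raised in the comment on the other answer.

```python
import re
from collections import Counter

# Assumed sample data; codes keep their leading dot, as in the question.
COUNTRIES = {".us": "United States", ".ru": "Russia"}
URLS = [
    "http://example.gov.us",
    "http://example.gov.ru/index.php",
    "http://xyz.gov.us/test.html",
    "http://example.gov.fr.ru",  # ".gov" is not directly followed by ".ru"
]


def count_gov_urls(urls, countries):
    """Count URLs whose host ends in .gov<code>, grouped by country name."""
    totals = Counter()
    for code, country in countries.items():
        # re.escape turns ".us" into "\.us", so the dot is matched literally.
        pattern = re.compile(r"^http://[^/]+\.gov" + re.escape(code) + r"($|/)")
        for url in urls:
            if pattern.search(url):
                totals[country] += 1
    return totals


result = count_gov_urls(URLS, COUNTRIES)
print(result)  # Counter({'United States': 2, 'Russia': 1})
```

Because the pattern requires .gov immediately before the country code, example.gov.fr.ru is not counted for Russia here, unlike with a plain suffix match.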
