
I am trying to parse the table from this website. I started with just the Username column, and with the help I got on Stack Overflow I was able to get the contents of Username with the following code:

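from bs4 import BeautifulSoup

with open("Top 50 TikTok users sorted by Followers - Socialblade TikTok Stats _ TikTok Statistics.html", "r", encoding="utf-8") as file:
    # read() returns the page as one string; str(file.readlines()) would pass the list's repr instead
    soup = BeautifulSoup(file.read(), "html.parser")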

tiktok = []
for tag in soup.select("div div:nth-of-type(n+5) > div > a"):
    tiktok.append(tag.text)
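Reading the file with file.read() matters here: str(file.readlines()) gives BeautifulSoup the repr of a Python list, so each line break becomes a literal backslash-n, and the brackets, quotes and commas of the list end up as stray text in the parsed document. A minimal comparison on a hypothetical two-line snippet (not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical two-line HTML snippet standing in for the saved page
lines = [
    '<div><a href="/tiktok/user/a">A</a></div>\n',
    '<div><a href="/tiktok/user/b">B</a></div>\n',
]

as_repr = str(lines)      # the list's repr: brackets, quotes, commas, backslash-n escapes
as_text = "".join(lines)  # the same content file.read() would return

bad = BeautifulSoup(as_repr, "html.parser")
good = BeautifulSoup(as_text, "html.parser")

# The repr version leaks list punctuation and escaped newlines into the text
print(repr(bad.get_text()))
print(repr(good.get_text()))
```

The anchors are still found in both cases, which is why the original code appeared to work, but any text extracted between tags is polluted in the repr version.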

which gives me

['addison rae',
 'Bella Poarch',
 'Zach King',
 'TikTok',
 'Spencer X',
 'Will Smith',
 'Loren Gray',
 'dixie',
 'Michael Le',
 'Jason Derulo',
 'Riyaz',
.
.
.

My ultimate goal is to build the entire table with the columns [Rank, Grade, Username, Uploads, Followers, Following, Likes].

I have read a few articles on parsing HTML tables in Python with BeautifulSoup and pandas, but those approaches didn't work because the data is not marked up as a <table> in the source. What are some alternatives for getting this into a table in Python?
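When there is no <table> element, a common alternative is to find whatever element wraps each row, read that row's cell elements in order, and hand the resulting list of lists to pandas.DataFrame. A minimal sketch of that pattern, assuming made-up markup with a "row" class in place of whatever the real page uses:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup mimicking a div-based "table": one container div per row,
# one child div per cell. This is NOT the real Socialblade HTML.
html = """
<div class="rows">
  <div class="row"><div>1st</div><div>A++</div><div><a href="/tiktok/user/x">X</a></div></div>
  <div class="row"><div>2nd</div><div>A+</div><div><a href="/tiktok/user/y">Y</a></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for row in soup.select("div.row"):
    # recursive=False keeps only the row's direct child divs, i.e. its cells
    rows.append([cell.get_text(strip=True) for cell in row.find_all("div", recursive=False)])

df = pd.DataFrame(rows, columns=["Rank", "Grade", "Username"])
print(df)
```

On a real page the only part that changes is the selector that identifies a row, which has to be worked out from the page's own markup.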

Answer

You can use this code to load the HTML from the file into soup and then parse the table into a DataFrame:

import pandas as pd
from bs4 import BeautifulSoup

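# Load the saved page; utf-8 keeps emoji and non-Latin usernames intact
with open("page.html", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

data = []
# Each row is a <div> whose inline style contains fafafa or f8f8f8
# (the page appears to alternate these two colors between rows)
for div in soup.select('div[style*="fafafa"], div[style*="f8f8f8"]'):
    # The row's direct child <div>s are its cells; keep the first eight
    data.append(
        [
            d.get_text(strip=True)
            for d in div.find_all("div", recursive=False)[:8]
        ]
    )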


df = pd.DataFrame(
    data,
    columns=[
        "Rank",
        "Grade",
        "Username",
        "Uploads",
        "Followers",
        "Following",
        "Likes",
        "Interactions",
    ],
)
print(df)
df.to_csv("data.csv", index=False)

Prints:

    Rank Grade           Username Uploads    Followers Following          Likes Interactions
0    1st   A++    charli d’amelio   1,755  113,600,000     1,210  9,200,000,000           --
1    2nd   A++        addison rae   1,411   79,900,000     2,454  5,100,000,000           --
2    3rd   A++       Bella Poarch     282   63,600,000       588  1,400,000,000           --
3    4th   A++          Zach King     277   58,800,000        41    723,400,000           --
4    5th   A++             TikTok     139   52,900,000       495    250,300,000           91
5    6th   A++          Spencer X   1,250   52,700,000     7,206  1,300,000,000           --
6    7th   A++         Will Smith      73   52,500,000        23    314,400,000           --
7    8th   A++         Loren Gray   2,805   52,100,000       221  2,800,000,000           --
8    9th   A++              dixie     120   51,200,000     1,267  2,900,000,000           --
9   10th   A++         Michael Le   1,158   47,400,000        93  1,300,000,000           --
10  11th    A+       Jason Derulo     675   44,900,000        12  1,000,000,000           --
11  12th    A+              Riyaz   2,056   44,100,000        43  2,100,000,000           --
12  13th    A+  Kimberly Loaiza ✨   1,150   41,000,000       123  2,200,000,000           --
13  14th    A+       Brent Rivera     955   37,800,000       272  1,200,000,000           --
14  15th    A+           cznburak   1,301   37,300,000         1    688,700,000           --
15  16th    A+           The Rock      42   36,200,000         1    200,300,000           --
16  17th    A+      James Charles     238   36,200,000       148    881,400,000           --
17  18th    A+          BabyAriel   2,365   35,300,000       326  1,900,000,000           --
18  19th    A+          JoJo Siwa   1,206   33,500,000       346  1,100,000,000           --
19  20th    A+              avani   5,347   33,300,000     5,003  2,400,000,000           --
20  21st    A+          GIL CROES     693   32,900,000       454    803,200,000           --
21  22nd    A+      Faisal shaikh     461   32,200,000        --  2,000,000,000           --
22  23rd    A+                BTS      39   32,000,000        --    557,100,000          255
23  24th    A+           LILHUDDY   4,187   30,500,000     8,652  1,600,000,000           --
24  25th    A+       Stokes Twins     548   30,100,000        21    781,000,000           --
25  26th    A+                Joe   1,487   29,800,000     8,402  1,200,000,000           --
26  27th    A+               ROD🥴   1,792   29,500,000       536  1,700,000,000           --
27  28th    A+            𝙳𝚘𝚖𝚒𝚗𝚒𝚔     899   29,400,000       216  1,700,000,000           --
28  29th    A+       Kylie Jenner      69   29,400,000        14    318,800,000           --
29  30th    A+         Junya/じゅんや   2,823   29,000,000     1,934    533,800,000       12,200
30  31st    A+                 YZ     816   28,900,000       563    554,700,000           --
31  32nd    A+      Arishfa Khan🦁   2,026   28,600,000        27  1,100,000,000           --
32  33rd    A+   Lucas and Marcus   1,248   28,500,000       158    806,500,000           --
33  34th    A+    jannat_zubair29   1,054   28,200,000         6    746,300,000           47
34  35th    A+     Nisha Guragain   1,751   28,000,000        33    756,300,000           --
35  36th    A+       Selena Gomez      40   27,800,000        17     82,300,000           --
36  37th    A+            Kris HC   1,049   27,800,000     1,405  1,200,000,000           --
37  38th    A+        flighthouse   4,200   27,600,000       488  2,300,000,000           --
38  39th    A+         wigofellas   1,251   27,500,000       812    707,200,000           --
39  40th    A+   Savannah LaBrant   1,860   27,300,000       155  1,400,000,000           --
40  41st    A+          noah beck   1,395   26,900,000     2,297  1,700,000,000           --
41  42nd    A+         Liza Koshy     155   26,700,000       104    321,900,000           --
42  43rd    A+   Kirya Kolesnikov   1,338   26,400,000        78    543,200,000           --
43  44th    A+        Awez Darbar   2,708   26,100,000       208  1,100,000,000           --
44  45th    A+       Carlos Feria   2,522   25,700,000       138  1,200,000,000           --
45  46th    A+       Kira Kosarin     837   25,700,000       401    447,000,000           --
46  47th    A+     Naim Darrechi🏆   2,634   25,300,000       527  2,200,000,000           --
47  48th    A+      Josh Richards   1,899   24,900,000     9,847  1,600,000,000           --
48  49th    A+             Q Park     231   24,800,000         3    294,100,000           --
49  50th    A+       TikTok_India     186   24,500,000       191     40,100,000           --

And saves data.csv (screenshot from LibreOffice):

[Screenshot: data.csv opened in LibreOffice]
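Note that every column in the output above is still a string: numbers carry thousands separators ("79,900,000") and missing values show up as "--". If you need numeric columns, a minimal sketch, assuming "--" simply means "no value", is to strip the commas and let pandas coerce the rest to NaN (the three rows below are copied from the printed output):

```python
import pandas as pd

# A few rows copied from the printed output above; all values are strings
df = pd.DataFrame(
    {
        "Username": ["addison rae", "TikTok", "BTS"],
        "Followers": ["79,900,000", "52,900,000", "32,000,000"],
        "Following": ["2,454", "495", "--"],
        "Interactions": ["--", "91", "255"],
    }
)

for col in ["Followers", "Following", "Interactions"]:
    # Remove thousands separators; errors="coerce" turns the "--" placeholder into NaN
    df[col] = pd.to_numeric(df[col].str.replace(",", "", regex=False), errors="coerce")

print(df.dtypes)
```

When reading the saved file back, pd.read_csv("data.csv", thousands=",", na_values="--") does the same conversion at load time.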


EDIT: To also get the username from the profile URL:

import pandas as pd
from bs4 import BeautifulSoup

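# Load the saved page; utf-8 keeps emoji and non-Latin usernames intact
with open("page.html", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

data = []
for div in soup.select('div[style*="fafafa"], div[style*="f8f8f8"]'):

    data.append(
        [
            d.get_text(strip=True)
            for d in div.find_all("div", recursive=False)[:8]
        ]
        # href looks like "/tiktok/user/addisonre"; keep the last path segment
        + [div.a["href"].split("/")[-1]]
    )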


df = pd.DataFrame(
    data,
    columns=[
        "Rank",
        "Grade",
        "Username",
        "Uploads",
        "Followers",
        "Following",
        "Likes",
        "Interactions",
        "URL username",
    ],
)

print(df)
df.to_csv("data.csv", index=False)
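The key step in the edit is div.a["href"].split("/")[-1], which keeps the last path segment of the profile link. A small standalone check using the href from the comment below plus a hypothetical link with a trailing slash; the rstrip("/") is an addition not in the answer, included because split("/")[-1] returns an empty string when a URL ends in "/":

```python
from bs4 import BeautifulSoup

# First anchor is from the comment below; the second (trailing slash) is hypothetical
html = (
    '<div><a href="/tiktok/user/addisonre"> addison rae </a></div>'
    '<div><a href="/tiktok/user/example/">example</a></div>'
)


def url_username(href: str) -> str:
    # Drop any trailing slash first, then keep the text after the last "/"
    return href.rstrip("/").split("/")[-1]


soup = BeautifulSoup(html, "html.parser")
names = [url_username(a["href"]) for a in soup.find_all("a", href=True)]
print(names)
```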

Comments:

Kesley, for Username, is there a way to get the content after the last / in the href? For example, addisonre instead of addison rae from <a href="/tiktok/user/addisonre"> addison rae </a>
@NazaninZinouri See my edit (The "URL username" column in dataframe)
