You need to open the file in the correct codec; UTF-8 is not the correct one. The dataset doesn't specify it, but I have determined that the most likely codec is mac_roman:
with open ('/Users/Weindependent/Desktop/dataset/albumlist.csv', 'r', encoding='mac_roman') as case0:
The original Kaggle dataset doesn't bother to document it, and the various kernels that use the set all just clobber the encoding. It's clearly a 8-bit Latin-variant (the majority of the data is ASCII with a few individual 8-bit codepoints).
So I analysed the data, and found there are just two such codepoints in 9 rows:
>>> import re
>>> eightbit = re.compile(rb'[\x80-\xff]')
>>> with open('albumlist.csv', 'rb') as bindata:
... nonascii = [l for l in bindata if eightbit.search(l)]
...
>>> len(nonascii)
9
>>> {c for l in nonascii for c in eightbit.findall(l)}
{b'\x89', b'\xca'}
The 0x89 byte appears in just one line:
>>> sum(l.count(b'\x89') for l in nonascii)
1
>>> sum(l.count(b'\xca') for l in nonascii)
22
>>> next(l for l in nonascii if b'\x89' in l)
b'359,1972,Honky Ch\x89teau,Elton John,Rock,"Pop Rock,\xcaClassic Rock"\r\n'
That's clearly Elton John's 1972 Honky Château album, so the 0x89 byte must represent the U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX codepoint.
The 0xCA bytes all appear to represent an alternative space character, they all appear righ after commas in the genre and subgenre columns (with one album exception):
>>> import csv
>>> for row in csv.reader((l.decode('ascii', 'backslashreplace') for l in nonascii)):
... for col in row:
... if '\\' in col: print(col)
...
Reggae,\xcaPop,\xcaFolk, World, & Country,\xcaStage & Screen
Reggae,\xcaRoots Reggae,\xcaRocksteady,\xcaContemporary,\xcaSoundtrack
Electronic,\xcaStage & Screen
Soundtrack,\xcaDisco
Rock,\xcaBlues
Blues Rock,\xcaElectric Blues,\xcaHarmonica Blues
Garage Rock,\xcaPsychedelic Rock
Honky Ch\x89teau
Pop Rock,\xcaClassic Rock
Funk / Soul,\xcaFolk, World, & Country
Rock,\xcaPop
Stan Getz\xca/\xcaJoao Gilberto\xcafeaturing\xcaAntonio Carlos Jobim
Bossa Nova,\xcaLatin Jazz
Lo-Fi,\xcaIndie Rock
These 0xCA bytes are almost certainly representing the U+00A0 NO-BREAK SPACE codepoint.
With these two mappings, you can try to determine what 8-bit codecs would make the same mapping. Rather than manually try out all Python's codecs I used Tripleee's 8-bit codec mapping to see what codecs use these mappings. There are only two:
â (U+00E2): mac_arabic, mac_croatian, mac_farsi, mac_greek, mac_iceland, mac_roman, mac_romanian, mac_turkish
0xca
(U+00A0): mac_centeuro, mac_croatian, mac_cyrillic, mac_greek, mac_iceland, mac_latin2, mac_roman, mac_romanian, mac_turkish
There are 6 encodings that are listed in both sets:
>>> set1 = set('mac_arabic, mac_croatian, mac_farsi, mac_greek, mac_iceland, mac_roman, mac_romanian, mac_turkish'.split(', '))
>>> set2 = set('mac_centeuro, mac_croatian, mac_cyrillic, mac_greek, mac_iceland, mac_latin2, mac_roman, mac_romanian, mac_turkish'.split(', '))
>>> set1 & set2
{'mac_turkish', 'mac_iceland', 'mac_romanian', 'mac_greek', 'mac_croatian', 'mac_roman'}
Of these, Mac OS Roman mac_roman codec is probably the most likely to have been used as Microsoft Excel for Mac used Mac Roman to create CSV files for a long time. However, it doesn't really matter, any of those 6 would work here.
You may want to replace those U+00A0 non-breaking spaces if you want to split out the genre and subgenre columns (really the genre and style columns if these were taken from Discogs).
encoding="....". What encoding to use, we can't tell you, sorry.cp1252. Give it a try first.