I have the following code, adapted from the question "Df groupby set comparison":
import pandas as pd
wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, names=['word'])
wordlist = wordlist.drop_duplicates(keep='first')
# wordlist['word'] = wordlist['word'].astype(str)
wordlist['split'] = ''
wordlist['anagrams'] = ''
for index, row in wordlist.iterrows():
    row['split'] = list(row['word'])
anaglist = wordlist['anagrams'] = wordlist['word'].apply(lambda x: ''.join(sorted(list(x))))
wordlist['anagrams'] = anaglist
wordlist = wordlist.drop(['split'], axis=1)
wordlist = wordlist['anagrams'].drop_duplicates(keep='first')
print(wordlist)
print(wordlist.dtypes)
Some of the input in my example.txt file seems to be read as ints, particularly when the strings have different lengths. I can't get pandas to treat the data as strings using .astype(str).
What's going on?
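To make the intent concrete, here is a minimal sketch of the string coercion I am after. It asks read_csv for strings up front via dtype=str instead of converting afterwards with .astype(str). The in-memory sample and its contents are hypothetical stand-ins for example.txt, and whether this changes the behaviour on the real file is an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/example.txt: one entry per line,
# mixing ordinary words with digit-only entries.
sample = io.StringIO("listen\nsilent\n123\n42\nenlist\n")

# dtype=str asks the parser to keep every value as a string
# instead of inferring a numeric type for the column.
forced = pd.read_csv(sample, header=None, names=['word'], dtype=str)

# Build the anagram key: the letters of each word, sorted and rejoined.
forced['anagrams'] = forced['word'].apply(lambda w: ''.join(sorted(w)))

# Inspect the Python type of each value rather than the column dtype,
# since an object column can hold a mix of types.
print(forced['word'].map(type).unique())
print(forced)
```

Checking the per-value types with .map(type) is more telling than .dtypes here, because a column reported as object can still contain both ints and strings.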