I am trying to clean a list of URLs that have garbage at the end, as shown below:
- /gradoffice/index.aspx(
- /gradoffice/index.aspx-
- /gradoffice/index.aspxjavascript$
- /gradoffice/index.aspx~
I have a CSV file with over 190k records of different URLs. I loaded the CSV into a pandas DataFrame and pulled the entire column of URLs out (as a pandas Series, not a Python list) using the statement:
```python
str = df['csuristem']
```
It clearly gave me all the values in the column. However, when I run the code below, it prints only 40k records, and the output starts somewhere in the middle. The program runs without errors but shows only part of the results, and I don't know where I am going wrong. Any help would be much appreciated.
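```python
import pandas

# read_csv already returns a DataFrame, so wrapping it in pandas.DataFrame() is unnecessary
df = pandas.read_csv("SS3.csv", dtype=object)
urls = df['csuristem']  # renamed from `str` to avoid shadowing the built-in str type
for s in urls:
    s = s.split(".")[0]  # keep everything before the first "."
    print(s)             # print() works in both Python 2 and 3
```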
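One possible explanation, offered as an assumption rather than a confirmed diagnosis: many consoles (terminals, IDE output panes) keep only their most recent lines, so printing 190k values may leave only the tail visible, which would match output that "starts somewhere in the middle". The sketch below avoids relying on the console by counting the rows and writing the cleaned column to a file. The inline CSV text and the output filename `cleaned_urls.csv` are hypothetical stand-ins for the real `SS3.csv` data.

```python
import io

import pandas as pd

# Hypothetical stand-in for SS3.csv so the sketch runs on its own;
# with the real file, use pd.read_csv("SS3.csv", dtype=object) instead.
csv_text = """csuristem
/gradoffice/index.aspx(
/gradoffice/index.aspx-
/gradoffice/index.aspxjavascript$
/gradoffice/index.aspx~
"""
df = pd.read_csv(io.StringIO(csv_text), dtype=object)

# Same rule as s.split(".")[0], applied to the whole column at once.
# Missing values stay NaN instead of raising an AttributeError.
cleaned = df["csuristem"].str.split(".", n=1).str[0]

# Compare row counts instead of trusting what the console shows.
print(len(df), len(cleaned))

# Write every cleaned value to a file (filename is hypothetical).
cleaned.to_csv("cleaned_urls.csv", index=False)
```

If the two counts match the number of records in the CSV, every row was processed and only the display was truncated.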
I am looking to get output like this:
- /gradoffice/index.
- /gradoffice/index.
- /gradoffice/index.
- /gradoffice/index.
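Note that `s.split(".")[0]` returns `/gradoffice/index` without the trailing dot shown in the desired output. If the dot should be kept, one option is a vectorized regex replacement, sketched below under the assumption (same as the original `split`) that the first `.` in each URL is the one before the extension. The sample Series is hypothetical and simply mirrors the examples above.

```python
import pandas as pd

# Hypothetical sample mirroring the garbage examples from the question.
urls = pd.Series([
    "/gradoffice/index.aspx(",
    "/gradoffice/index.aspx-",
    "/gradoffice/index.aspxjavascript$",
    "/gradoffice/index.aspx~",
])

# Replace the first "." and everything after it with a single ".";
# values that contain no "." pass through unchanged.
trimmed = urls.str.replace(r"\..*$", ".", regex=True)
print(trimmed.tolist())
```

A URL with a dot earlier in its path (for example in a directory name) would be cut at that earlier dot, exactly as with `split(".")[0]`.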
Thank you, Santhosh.