
I'm working with pandas for the first time and not sure if what I'm attempting is the best way to accomplish this, but I'm parsing an Excel sheet, trying to make a list of every number in a certain column (column 'C' in my case).

I returned everything from column 'C', removed the empty cells and now I'm trying to remove the '#' character that's in front of the numbers.

Here's my code from before I tried to remove the '#' character, so you can see the output:

import pandas as pd

def get_prod_cmvp_data():
    # Read only column C; the header row is at index 6 (0-based)
    prod_cmvp_data = pd.read_excel('_Request ID__Client Cryptographic Module List Template.xlsx', usecols='C', header=6)
    prod_cmvp = prod_cmvp_data.dropna()  # drop the empty cells
    vals = prod_cmvp.values
    print(vals)

output:

[['#3914']
 [' #3907']
 [' #3197']
 ['#4272']
 ['#4271']
 ['#4254']
 ['#3784']
 ['#3946']
 ['#3888']
 ['#4174']
 ['#4222']
 ['#3613']
 ['#3125']
 ['#3140']
 ['#3197']
 [' #3196']
 [' #3644']
 [' #3615']
 ['#3651']
 ['#3918']
 ['#3946']
 ['#4271']
 ['#3888']
 ['#4174']
 ['#4222']
 ['#3613']
 ['#3125']
 ['#3140']]

Here's my code after I tried to remove the '#' character:

import pandas as pd

def get_prod_cmvp_data():
    prod_cmvp_data = pd.read_excel('_Request ID__Client Cryptographic Module List Template.xlsx', usecols='C', header=6)
    prod_cmvp = prod_cmvp_data.dropna()
    vals = prod_cmvp.values
    values = vals.str.replace("#", "")  # this line raises the AttributeError shown below
    print(values)

Output:

AttributeError: 'numpy.ndarray' object has no attribute 'str'

Excel File:

Here's a link to the spreadsheet on Google Sheets, if that's easier:

https://docs.google.com/spreadsheets/d/1yBU4XrfMk54kQorOD7GCQ6kGBYzY_RGf10aHIKy4HLo/edit?usp=sharing


2 Answers

.values returns a NumPy ndarray, which has no .str accessor, so .str.replace won't work on it.
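To illustrate the difference, here is a small sketch with made-up sample values (not the real spreadsheet). It assumes an object-dtype Series, which is what a column of strings read from Excel typically looks like:

```python
import pandas as pd

# Made-up sample values standing in for the spreadsheet column
s = pd.Series(["#3914", " #3907"], dtype=object)
arr = s.values  # NumPy array: plain ndarray, no pandas accessors

print(hasattr(arr, "str"))  # False
print(hasattr(s, "str"))    # True

# The string methods live on the Series, not on the array
cleaned = s.str.replace("#", "", regex=False).tolist()
print(cleaned)  # ['3914', ' 3907']
```

Note that the leading space in the second value survives; .str.strip() would be needed to remove it.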

You should be able to drop

vals = prod_cmvp.values

And use pandas string methods on the pandas Series object instead:

prod_cmvp = prod_cmvp_data.dropna()["C"].str.replace("#", "")

Caveats: The input data isn't easy to reproduce from what you posted. The solution above assumes column C holds strings when read with pd.read_excel. It also assumes the column is named "C"; if it isn't, replace "C" in '["C"]' with the actual column name. If you need to see the column names:

prod_cmvp_data.columns
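Since the comments below report that the column is actually read in as 'Unnamed: 2' rather than "C", one way to avoid depending on the label is to select the single column by position. This is a minimal sketch under assumptions: the DataFrame is built by hand as a stand-in for the read_excel result, and clean_ids is a hypothetical helper name, not something from the question.

```python
import pandas as pd

# Stand-in for pd.read_excel(..., usecols='C', header=6). The label
# 'Unnamed: 2' comes from the comments; the values are a sample from the question.
prod_cmvp_data = pd.DataFrame({"Unnamed: 2": ["#3914", " #3907", None, "#4272"]})

def clean_ids(df):
    """Return the first column as a list of strings without '#' or spaces."""
    col = df.iloc[:, 0].dropna()                 # a Series, so .str is available
    col = col.str.strip()                        # remove leading/trailing whitespace
    col = col.str.replace("#", "", regex=False)  # treat '#' as a literal character
    return col.tolist()

print(clean_ids(prod_cmvp_data))  # ['3914', '3907', '4272']
```

Selecting with iloc[:, 0] works here because usecols='C' leaves only one column in the frame; with more columns, the actual label from prod_cmvp_data.columns would be the safer choice.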

5 Comments

I added the Excel file (hopefully I did that correctly) and made the suggested changes, and got this error: AttributeError: 'DataFrame' object has no attribute 'str'
Thanks for the screenshot. I think what would be useful is the actual Excel file itself. I've edited the answer to suggest the fix.
When I run "print(prod_cmvp_data.columns)" this is the output: Index(['Unnamed: 2'], dtype='object'). When I tried the fix I got a KeyError: 'C'
I'm not sure how to add spreadsheets, but I uploaded it to Google Docs and added the link.
The KeyError means it can't find the column name, as mentioned in the text above.

Please provide the Excel file; this can be done elegantly while reading from the file. A solution to the current problem is below.

col=[['#3914'],
[' #3907'],
[' #3197'],
['#4272'],
['#4271'],
['#4254'],
['#3784'],
['#3946'],
['#3888'],
['#4174'],
['#4222'],
['#3613'],
['#3125'],
['#3140'],
['#3197'],
[' #3196'],
[' #3644'],
[' #3615'],
['#3651'],
['#3918'],
['#3946'],
['#4271'],
['#3888'],
['#4174'],
['#4222'],
['#3613'],
['#3125'],
['#3140']]

for i in col:
    i[0]=i[0].replace('#', '')
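The loop above edits the nested lists in place and leaves the leading spaces (for example ' 3907'). If a flat list of clean strings is wanted instead, a list comprehension is one option. A short sketch, using only a few of the values above as sample data; the name numbers is just illustrative:

```python
# A few sample rows from the list above
col = [['#3914'], [' #3907'], [' #3197'], ['#4272']]

# Take the single item from each row, drop the '#', and trim whitespace
numbers = [row[0].replace('#', '').strip() for row in col]

print(numbers)  # ['3914', '3907', '3197', '4272']
```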

3 Comments

I just added the excel file, hopefully I did it correctly
@MitchellPrivett I can only see a screenshot of the Excel file. It would be better to upload the Excel file itself.
I'm not sure how to add spreadsheets, but I uploaded it to google docs and added the link for that
