I am trying to create a new column from the data in another column, based on part of the value in that column. For example, I have a list of devices:

 devicename        make     devicevalue
 switch1           cisco        0
 switch1-web100    netgear      0  
 switch10          cisco        0
 switch23          cisco        1
 switch31-web200   netgear      0
 switch31          cisco        1
 switch40          cisco        1

The new column needs to follow two rules:

  • If make == netgear, set it to 0
  • If the devicename ends in a number of 20 or greater, set it to 1; otherwise set it to 0
  • Alternatively, instead of also looking at the make, the devicename could be filtered on web.

I am using Pandas to open a CSV file, make the edits (for some other columns), then save it, but I am having difficulty with this bit.

This is where I have gotten to. I know it doesn't work, but I've got a bit lost and I'm quite new to Python:

import pandas as pd

df = pd.read_csv('data.csv')
df['devicevalue'] = df.devicename
    if 'netgear' in df.name df.set_value '0'
    if str.endswith > 20 df.set_value '0'
    else if df.set_value '1'

2 Answers

Try the following:

import pandas as pd

df = pd.DataFrame(columns=['devicename', 'make'])
df.loc[0] = ['switch1', 'cisco']
df.loc[1] = ['switch1-web100', 'netgear']
df.loc[2] = ['switch10', 'cisco']
df.loc[3] = ['switch23', 'cisco']
df.loc[4] = ['switch31-web200', 'netgear']
df.loc[5] = ['switch31', 'cisco']
df.loc[6] = ['switch40', 'cisco']


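def get_number_suffix(devicename: str) -> int:
    """
    This function extracts the run of contiguous digits at the end
    of the device name and returns it as an integer.
    :param devicename: device name such as 'switch23'
    :return: the trailing number, or 0 if the name does not end in a digit
    """
    i = 0
    # count how many characters at the end of the name are digits
    while i < len(devicename) and devicename[-(i + 1)].isdecimal():
        i += 1

    return int(devicename[-i:]) if i else 0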


def compute_devicevalue(row) -> int:
    """
    This function computes the devicevalue based on the criteria:
    If make = netgear (set to 0)
    If devicename end in 20 or greater (set to 1, otherwise set to 0)
    :param row:
    :return:
    """

    if 'netgear' in row['make']:
        return 0
    if 20 <= get_number_suffix(row['devicename']):
        return 1
    else:
        return 0


df['devicevalue'] = df.apply(compute_devicevalue, axis=1)
print(df.head(7))
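
The question also mentions filtering on "web" in the devicename instead of looking at the make. A minimal sketch of that variant, assuming that any name containing "web" should be set to 0 and that names without a trailing number also count as 0. The function name compute_devicevalue_by_name is made up for this example:

```python
import re


def compute_devicevalue_by_name(devicename: str) -> int:
    """Hypothetical variant: decide from the device name alone."""
    # Assumption: names containing 'web' are the devices to set to 0.
    if 'web' in devicename:
        return 0
    # Trailing digits of the name, e.g. 'switch23' -> '23'.
    match = re.search(r'(\d+)$', devicename)
    # Assumption: a name with no trailing number is treated as 0.
    return 1 if match and int(match.group(1)) >= 20 else 0


names = ['switch1', 'switch1-web100', 'switch10', 'switch23',
         'switch31-web200', 'switch31', 'switch40']
values = [compute_devicevalue_by_name(name) for name in names]
print(values)
```

With a DataFrame, the same function could be applied to a single column with df['devicename'].map(compute_devicevalue_by_name), which avoids passing whole rows around.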

3 Comments

Thank you for the reply, I'll see if I can make this work. Do I need to import each line item as in the top portion? I have about 9000 records that I have to export from another tool, then process and add this new column to.
No, no. I only did that for the example. You should definitely use 'read_csv' as in your question.
This worked perfectly, thank you. I had to add another if statement to remove a model variant as well, but I was able to add it easily now that I could see how to create it.

Looks like you are having trouble working through the data. Here is how I would approach this. Use a for loop to cycle through the different elements in each column, then use logic on each of those as you cycle through. This question previously found a great way to split the letters from the numbers; they explained it better than I can.

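import re

Make   = ['cisco', 'netgear', 'cisco', 'cisco', 'netgear']
Number = ['switch1', 'switch1-web100', 'switch10', 'switch23', 'switch31-web200']
newcol = []
for i, make in enumerate(Make):
    if make == 'netgear':
        # netgear devices are always set to 0
        newcol.append(0)
    else:
        # With a capture group, re.split keeps the digits:
        # 'switch23' -> ['switch', '23', ''], so the trailing number is at [-2]
        if int(re.split(r'(\d+)', Number[i])[-2]) >= 20:
            newcol.append(1)
        else:
            newcol.append(0)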

3 Comments

A broadcasting approach would probably scale better.
You are right, that would be faster. Good thinking!
Thanks for your reply/help even though I didn't use this method
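
For reference, the broadcasting approach mentioned in the comments could look roughly like the sketch below. It is not from either answer; it assumes the two rules from the question and treats names without a trailing number as 0. The sample rows are copied from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'devicename': ['switch1', 'switch1-web100', 'switch10', 'switch23',
                   'switch31-web200', 'switch31', 'switch40'],
    'make': ['cisco', 'netgear', 'cisco', 'cisco', 'netgear', 'cisco', 'cisco'],
})

# Trailing digits of every name at once; names without trailing digits
# give NaN, which is replaced by 0 (an assumption, not stated in the question).
suffix = pd.to_numeric(
    df['devicename'].str.extract(r'(\d+)$', expand=False), errors='coerce'
).fillna(0)

# 1 only where the make is not netgear AND the suffix is 20 or greater.
df['devicevalue'] = ((df['make'] != 'netgear') & (suffix >= 20)).astype(int)
print(df)
```

The whole column is computed with column-wise operations instead of a Python-level loop or a row-wise apply, which tends to matter more as the number of records grows.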
