I have a pandas DataFrame in which one column holds airline (or company) names. I want to generate a 'messy' data set by changing a small subset of those names to variants that are similar but not identical. For example, United Airlines would become UNITED AIRLINES. The following is an example of my data set:

Description
0   United Airlines
1   Pinnacle Airlines Inc.
2   Ryanair
3   British Airways

Is there any way to randomly apply string changes to individual rows of a pandas DataFrame? Does anyone have any ideas?

1 Answer

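You can use numpy.random.choice to pick a random selection of your index labels. It takes a 1-D array and returns a random sample of the size you pass. By default it samples with replacement, so the result can contain the same label twice unless you pass replace=False. The selected labels can then be used with df.loc to modify only those rows: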

In [177]:

rand_indices = np.random.choice(df.index, 2, replace=False)
rand_indices.sort()
rand_indices
Out[177]:
array([1, 2], dtype=int64)
In [178]:

df.loc[rand_indices]
Out[178]:
              Description  a
1  Pinnacle Airlines Inc.  1
2                 Ryanair  2
In [179]:

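def scramble_text(df, index, col):
    # .loc aligns the right-hand side on the index, so only the rows
    # listed in `index` are overwritten; df is modified in place
    df.loc[index, col] = df[col].str.upper()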

scramble_text(df, rand_indices, 'Description')
df
Out[179]:
              Description  a
0         United Airlines  0
1  PINNACLE AIRLINES INC.  1
2                 RYANAIR  2
3         British Airways  3
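
If upper-casing alone is not messy enough, the same df.loc approach extends to several kinds of change. The following is only a rough sketch, not part of the session above: make_messy and swap_adjacent are hypothetical helper names, the 50% fraction and the seed are arbitrary, and the three transformations (upper case, lower case, swapping two neighbouring characters) are just examples of "similar but not the same". Note that a transformation can occasionally leave a name unchanged, for example when the two swapped characters are identical.

```python
import numpy as np
import pandas as pd


def swap_adjacent(name, rng):
    """Swap two neighbouring characters to imitate a typing slip."""
    if len(name) < 2:
        return name
    i = int(rng.integers(len(name) - 1))
    return name[:i] + name[i + 1] + name[i] + name[i + 2:]


def make_messy(df, col, frac=0.5, seed=0):
    """Return a copy of df with a random fraction of `col` perturbed,
    together with the index labels that were selected."""
    rng = np.random.default_rng(seed)  # seeded so the result is repeatable
    out = df.copy()
    n = max(1, round(frac * len(out)))
    # replace=False guarantees n distinct rows
    chosen = rng.choice(out.index.to_numpy(), size=n, replace=False)
    transforms = [str.upper, str.lower, lambda s: swap_adjacent(s, rng)]
    for label in chosen:
        # pick one transformation at random for each selected row
        transform = transforms[rng.integers(len(transforms))]
        out.loc[label, col] = transform(out.loc[label, col])
    return out, chosen


df = pd.DataFrame({"Description": ["United Airlines",
                                   "Pinnacle Airlines Inc.",
                                   "Ryanair",
                                   "British Airways"]})
messy, chosen = make_messy(df, "Description", frac=0.5, seed=0)
print(messy)
```

Working on a copy keeps the clean frame around, which is handy if you later want to check how well a matching or deduplication step recovers the original names.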
1 Comment

Thanks, that is exactly what I was after. I need to learn df.loc better :)
