Changing rows at random in pandas

Question

I have a pandas DataFrame in which one column holds airline (or company) names. I want to generate a 'messy' data set by changing a small subset of those names to variants that are similar but not identical, so that United Airlines becomes, say, UNITED AIRLINES. Here is an example of my data set:

Description
0   United Airlines
1   Pinnacle Airlines Inc.
2   Ryanair
3   British Airways

Is there any way to randomly apply string changes to individual rows of a pandas DataFrame? Does anyone have any ideas?

EdChum · Accepted Answer · 2014-12-05 11:25:05Z


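You can use numpy.random.choice to pick a random selection of your index. It takes a 1-D array and returns a random sample of the size you pass. It samples with replacement by default, so pass replace=False to be sure the selected rows are distinct:

In [177]:

rand_indices = np.random.choice(df.index, 2, replace=False)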
rand_indices.sort()
rand_indices
Out[177]:
array([1, 2], dtype=int64)
In [178]:

df.loc[rand_indices]
Out[178]:
              Description  a
1  Pinnacle Airlines Inc.  1
2                 Ryanair  2
In [179]:

def scramble_text(df, index, col):
    # Upper-case only the selected rows of the given column, in place.
    df.loc[index, col] = df.loc[index, col].str.upper()

scramble_text(df, rand_indices, 'Description')
df
Out[179]:
              Description  a
0         United Airlines  0
1  PINNACLE AIRLINES INC.  1
2                 RYANAIR  2
3         British Airways  3
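
For anyone who wants to run the steps above outside an interactive session, here is a minimal self-contained sketch. It rebuilds the four-row example frame from the question, so the extra column a from the session is left out. Using np.random.default_rng with a fixed seed is my own assumption, added only to make the run repeatable; the seed value is arbitrary, and the rows it picks will generally differ from the ones shown in the session.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question.
df = pd.DataFrame({
    "Description": [
        "United Airlines",
        "Pinnacle Airlines Inc.",
        "Ryanair",
        "British Airways",
    ]
})

# Seeded generator so the same rows are picked on every run (assumption: seed 0 is arbitrary).
rng = np.random.default_rng(0)

# replace=False guarantees two distinct index labels.
rand_indices = np.sort(rng.choice(df.index, size=2, replace=False))

def scramble_text(df, index, col):
    """Upper-case the values of `col` in the rows labelled by `index`, in place."""
    df.loc[index, col] = df.loc[index, col].str.upper()

original = df["Description"].copy()  # kept only to compare before and after
scramble_text(df, rand_indices, "Description")
print(df)
```

Any other string transformation (lower-casing, stripping a suffix such as "Inc.", swapping characters) could be substituted inside scramble_text; the row selection through df.loc stays the same.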


1 Comment

Peadar Coyle Over a year ago

Thanks, that is exactly what I was after. I need to learn the df.loc function better :)
