Python pandas: Change value for each row with random string

Question

I need to modify a CSV file so that each row gets its own randomly generated string value.

This is my code so far:

patients = pd.read_csv("patients.csv")

# updating the column value/data
patients['VALOR_ID'] = patients['VALOR_ID'].str.replace('^(\w+|)',generate_cip())
  
# writing into the file
patients.to_csv("patients-writer.csv", index=False)

The problem is that all rows end up with the same value.

Any ideas on how to apply generate_cip to each row?

Note:

I need to use replace because the value in each row has this format:

GMCP0200611068|46977549A|81132941070

and I only need to change the part before the first |.

For example:

NIF,CIP,FORMAT_ID,VALOR_ID
39999384T,MAMO28374657001,CIP|NASS,XXXXX|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,XXXXX|39283744|81132941070

My desired output has XXXXX replaced by the result of generate_cip() in each row:

39999384T,MAMO28374657001,CIP|NASS,generate_cip()|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,generate_cip()|39283744|81132941070

Any ideas?

It would be nice to have some data to test the solutions with.
– Dani Mesejo, Commented Nov 12, 2021 at 10:43
Is generate_cip() a built-in function, or did you define it yourself?
– AziMez, Commented Nov 12, 2021 at 10:43

Dani Mesejo · Accepted Answer · 2021-11-12 10:49:25Z

Try:

import pandas as pd


# toy function
def generate_cip():
    import random
    from string import ascii_uppercase
    return "XXX" + "".join(random.sample(ascii_uppercase, 2))


# toy data
patients = pd.DataFrame.from_dict(
    {'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
     'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
     'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})

# raw string avoids an invalid-escape warning for \w; the lambda is called once per match
patients['VALOR_ID'] = patients['VALOR_ID'].str.replace(r'^(\w+|)', lambda x: generate_cip(), regex=True)
print(patients)

Output

         NIF              CIP FORMAT_ID                    VALOR_ID
0  39999384T  MAMO28374657001  CIP|NASS   XXXVH|2343434|81132941070
1  39576383R  CACO56874934005  CIP|NASS  XXXKB|39283744|81132941070

The repl argument of Series.str.replace can be a callable. From the documentation:

repl : str or callable
Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
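
To illustrate why the callable form fixes the original problem, here is a minimal sketch. In the question's code, generate_cip() is evaluated once, before str.replace runs, so every row receives that single string; a callable is instead invoked once per match. The counter-based make_id below is a hypothetical stand-in for generate_cip, used only so the result is deterministic, and the simplified pattern `^\w+` is my assumption about what the leading part looks like.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for generate_cip(): returns ID0, ID1, ... so the
# result is deterministic and easy to inspect.
counter = itertools.count()


def make_id():
    return f"ID{next(counter)}"


values = pd.Series(["XXXXX|2343434|81132941070", "XXXXX|39283744|81132941070"])

# Passing a string: make_id() runs once, so both rows get the same prefix.
same = values.str.replace(r"^\w+", make_id(), regex=True)

# Passing a callable: it runs once per match, so each row gets its own prefix.
distinct = values.str.replace(r"^\w+", lambda m: make_id(), regex=True)

print(same.tolist())      # both rows start with the same ID
print(distinct.tolist())  # each row starts with a different ID
```

The lambda ignores its argument here, but it receives the regex match object, so it could also build the replacement from the matched text if needed.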

sayantan ghosh · Accepted Answer · 2021-11-12 11:47:12Z

I think you want to replace the 'XXXXX' placeholder with each row's CIP. You can try this solution:

import pandas as pd

# toy data
patients = pd.DataFrame.from_dict(
    {'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
     'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
     'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})

# Try:

for index, row in patients.iterrows():
    # write through the DataFrame: assigning to `row` may only change a copy
    patients.at[index, 'VALOR_ID'] = row['VALOR_ID'].replace('XXXXX', row['CIP'])

print(patients)

Output:

         NIF              CIP FORMAT_ID                              VALOR_ID
0  39999384T  MAMO28374657001  CIP|NASS   MAMO28374657001|2343434|81132941070
1  39576383R  CACO56874934005  CIP|NASS  CACO56874934005|39283744|81132941070
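
If the goal really is to substitute each row's own CIP, the same result can probably be reached without iterrows. The following is only a sketch, assuming the placeholder is always the text before the first | and that only the CIP and VALOR_ID columns matter:

```python
import pandas as pd

# Toy data reduced to the two columns involved
patients = pd.DataFrame({
    "CIP": ["MAMO28374657001", "CACO56874934005"],
    "VALOR_ID": ["XXXXX|2343434|81132941070", "XXXXX|39283744|81132941070"],
})

# Split once on the first "|" and keep everything after it
tail = patients["VALOR_ID"].str.split("|", n=1).str[1]

# Prepend each row's CIP to its own tail (element-wise string concatenation)
patients["VALOR_ID"] = patients["CIP"] + "|" + tail

print(patients)
```

Working on whole columns avoids the row-by-row loop and does not depend on the placeholder being literally 'XXXXX'.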
