0

I need to change a csv file in order to generate a random string value to each row:

This is my code by now:

patients = pd.read_csv("patients.csv")

# updating the column value/data
patients['VALOR_ID'] = patients['VALOR_ID'].str.replace('^(\w+|)',generate_cip())
  
# writing into the file
patients.to_csv("patients-writer.csv", index=False)

The problem here is that all rows end up having same value.

Any ideas about how to apply generate_cip for each row?

Note:

I need to use replace since row value has this format:

GMCP0200611068|46977549A|81132941070

and I need to change only right before | string part.

For example:

NIF,CIP,FORMAT_ID,VALOR_ID
39999384T,MAMO28374657001,CIP|NASS,XXXXX|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,XXXXX|39283744|81132941070

My desired output is that XXXXX = generated_cip()

39999384T,MAMO28374657001,CIP|NASS,generated_cip()|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,generated_cip()|39283744|81132941070

Any ideas?

5
  • Does generate_cip() generates a random value? Commented Nov 12, 2021 at 10:41
  • Yes, it does that. Commented Nov 12, 2021 at 10:41
  • It will be nice to have some data in order to test the solutions Commented Nov 12, 2021 at 10:43
  • generate_cip() is a built-in function, or just you defined it? Commented Nov 12, 2021 at 10:43
  • I've added some details on question post... Commented Nov 12, 2021 at 10:47

2 Answers 2

2

Try:

import pandas as pd


# toy function
def generate_cip():
    import random
    from string import ascii_uppercase
    return "XXX" + "".join(random.sample(ascii_uppercase, 2))


# toy data
patients = pd.DataFrame.from_dict(
    {'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
     'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
     'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})

patients['VALOR_ID'] = patients['VALOR_ID'].str.replace('^(\w+|)', lambda x: generate_cip(), regex=True)
print(patients)

Output

         NIF              CIP FORMAT_ID                    VALOR_ID
0  39999384T  MAMO28374657001  CIP|NASS   XXXVH|2343434|81132941070
1  39576383R  CACO56874934005  CIP|NASS  XXXKB|39283744|81132941070

The repl argument of Series.str.replace can be a callable, from the documentation:

repl str or callable Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().

Sign up to request clarification or add additional context in comments.

Comments

1

I think you want to replace the 'XXXXX' with the respective CIP s. You can try this solution:

import pandas as pd

# toy data
patients = pd.DataFrame.from_dict(
    {'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
     'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
     'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})

#Try: 

for index,row in patients.iterrows():
    cip = row['CIP']
    row['VALOR_ID'] = row['VALOR_ID'].replace('XXXXX',cip)

Output:

         NIF              CIP FORMAT_ID                              VALOR_ID
0  39999384T  MAMO28374657001  CIP|NASS   MAMO28374657001|2343434|81132941070
1  39576383R  CACO56874934005  CIP|NASS  CACO56874934005|39283744|81132941070

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.