
I have a DataFrame "counts" and I would like to rename its second column using a regular expression, because I have multiple files whose column names carry this same "extra information". Currently I have:

| GeneID |  /home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam   |
| -------- | -------------- |
|  Ciclev10010164m.g.v1.0    | 2            |
|  Ciclev10007306m.g.v1.0    | 647            |
|  Ciclev10009318m.g.v1.0   | 39            |
|  Ciclev...   | ...           |
|  Ciclev10007306m.g.v1.0    | 112            |

I tried the following code, without success:

```python
for col in counts1:
    counts1.rename(columns={col:col.upper().replace("/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam","SRR[\d]{6}")},inplace=True)
```
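To see why this loop has no effect, here is a small sketch that reruns it on a one-row frame built from the first row of the table above (the variable name `path` is introduced only for illustration, and the pattern is written as a raw string). Two things go wrong: `.upper()` runs before `.replace()`, so the lowercase path can no longer be found, and `str.replace` performs a literal substitution, so `SRR[\d]{6}` is never interpreted as a regular expression. Even as a regex, `{6}` would capture only six of the seven digits in `SRR1212121`.

```python
import pandas as pd

# One-row sample frame using the long column name from the question.
path = ("/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/"
        "SRR1212121_mapped.bamAligned.sortedByCoord.out.bam")
counts1 = pd.DataFrame({"GeneID": ["Ciclev10010164m.g.v1.0"], path: [2]})

# The loop from the question.
for col in counts1:
    counts1.rename(columns={col: col.upper().replace(path, r"SRR[\d]{6}")}, inplace=True)

# .upper() runs first, so the lowercase path is never found and nothing is
# replaced; the only effect is that every column name is upper-cased.
print(list(counts1.columns))

# Even without .upper(), str.replace is a literal substitution: the pattern
# text itself becomes the new name instead of being used as a regex.
print(path.replace(path, r"SRR[\d]{6}"))  # SRR[\d]{6}
```

The only visible change is that `GeneID` becomes `GENEID`.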

How can I obtain a df with the following format?

| GeneID |  SRR1212121   |
| -------- | -------------- |
|  Ciclev10010164m.g.v1.0    | 2            |
|  Ciclev10007306m.g.v1.0    | 647            |
|  Ciclev10009318m.g.v1.0   | 39            |
|  Ciclev...   | ...           |
|  Ciclev10007306m.g.v1.0    | 112            |
  • Can you post the data as a dictionary? (commented Jul 26, 2022 at 15:56)
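For reference, the rows visible in the question's table can be written as a dictionary and loaded with `pd.DataFrame`. This is a sketch covering only the displayed rows; the elided `...` rows are left out, and the name `long_name` is introduced here for illustration.

```python
import pandas as pd

# Only the rows shown in the question; the "..." rows are omitted.
long_name = ("/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/"
             "SRR1212121_mapped.bamAligned.sortedByCoord.out.bam")
data = {
    "GeneID": [
        "Ciclev10010164m.g.v1.0",
        "Ciclev10007306m.g.v1.0",
        "Ciclev10009318m.g.v1.0",
        "Ciclev10007306m.g.v1.0",
    ],
    long_name: [2, 647, 39, 112],
}
counts = pd.DataFrame(data)
print(counts)
```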

1 Answer


You could try:

```python
df.columns = df.columns.str.extract(r'((?<=/)SRR\d+|^[^/]+$)', expand=False)
```
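A minimal, runnable sketch of this one-liner, using the first two rows of the question's table as assumed sample data. With `expand=False` and a single capture group, `Index.str.extract` returns an `Index`, which can be assigned straight back to `df.columns`.

```python
import pandas as pd

# Sample data: first two rows of the question's table.
long_name = ("/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/"
             "SRR1212121_mapped.bamAligned.sortedByCoord.out.bam")
df = pd.DataFrame({"GeneID": ["Ciclev10010164m.g.v1.0", "Ciclev10007306m.g.v1.0"],
                   long_name: [2, 647]})

# One capture group + expand=False -> an Index of extracted names.
df.columns = df.columns.str.extract(r'((?<=/)SRR\d+|^[^/]+$)', expand=False)
print(list(df.columns))  # ['GeneID', 'SRR1212121']
```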

Regex breakdown:

```
(?<=/)SRR\d+  # match "SRR" + digits if preceded by "/"
^[^/]+$       # otherwise, match the whole string if it contains no "/"
```
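One consequence of the second alternative: a column name that contains a "/" but no `SRR` id matches neither branch, so `str.extract` returns NaN for it. If that could happen across your files, a variant is to pass a function to `DataFrame.rename`, which leaves such names untouched. The sketch below assumes a hypothetical third column, `/data/run2/sample.bam`, to show the difference; `short_name` is likewise an illustrative helper, not part of the original answer.

```python
import re
import pandas as pd

# Same lookbehind as the answer: "SRR" plus digits, directly after a "/".
SRR_ID = re.compile(r'(?<=/)SRR\d+')

def short_name(col):
    """Return the SRR accession in `col`, or `col` unchanged if there is none."""
    match = SRR_ID.search(col)
    return match.group(0) if match else col

long_name = ("/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/"
             "SRR1212121_mapped.bamAligned.sortedByCoord.out.bam")
other = "/data/run2/sample.bam"  # hypothetical path with no SRR id
df = pd.DataFrame({"GeneID": ["Ciclev10010164m.g.v1.0"], long_name: [2], other: [5]})

# The answer's extract: the third name matches neither branch and becomes NaN.
extracted = df.columns.str.extract(r'((?<=/)SRR\d+|^[^/]+$)', expand=False)
print(list(extracted))

# rename() calls short_name on every column label and keeps non-matches as-is.
df = df.rename(columns=short_name)
print(list(df.columns))  # ['GeneID', 'SRR1212121', '/data/run2/sample.bam']
```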

1 Comment

Thank you, mozway! That's the answer I was looking for :D
