I'm pretty new to R, and to programming in general, and I'm wondering what the best way is to loop through a column so that I can add a new column to the data frame describing each observation.

I currently have a list of amino acids and their positions on a protein that looks like this:

Residue Position
H   1
R   2
K   3
D   4
E   5
H   6
R   7
K   8
D   9
E   10

I'd like something that looks like this (where H, R, and K are basic amino acids, and D and E are acidic amino acids):

Residue Position    Properties
H   1   Basic
R   2   Basic
K   3   Basic
D   4   Acidic
E   5   Acidic
H   6   Basic
R   7   Basic
K   8   Basic
D   9   Acidic
E   10  Acidic

I'm really not sure where to start, and I'm having difficulty finding a good resource for this kind of situation in R.

I started by trying to subset the data, but then I realized that wouldn't do the trick:

# Basic
h.dat <- subset(all, all$Residue == "H")
r.dat <- subset(all, all$Residue == "R")
k.dat <- subset(all, all$Residue == "K")

# Acidic
d.dat <- subset(all, all$Residue == "D")
e.dat <- subset(all, all$Residue == "E")

Thanks!

Note: 
H = Histidine (Basic amino acid)
R = Arginine (Basic)
K = Lysine (Basic)

E = Glutamic Acid (Acidic)
D = Aspartic Acid (Acidic)
3 Answers

You can use ifelse(), which is vectorized, so no explicit loop is needed. Assuming df is the name of your original data frame:

df$Property <- ifelse(df$Residue %in% c("H", "R", "K"), "Basic", "Acidic")
df
#    Residue Position Property
# 1        H        1    Basic
# 2        R        2    Basic
# 3        K        3    Basic
# 4        D        4   Acidic
# 5        E        5   Acidic
# 6        H        6    Basic
# 7        R        7    Basic
# 8        K        8    Basic
# 9        D        9   Acidic
# 10       E       10   Acidic
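
One caveat: the ifelse() call labels every residue that is not H, R, or K as Acidic, which is fine for the data shown but would mislabel anything else. If that matters, a named vector can serve as a small lookup table. This is only a sketch; the name props is made up for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data from the question (df is the name used in this answer).
df <- data.frame(
  Residue  = c("H", "R", "K", "D", "E", "H", "R", "K", "D", "E"),
  Position = 1:10,
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are residues, values are their properties.
props <- c(H = "Basic", R = "Basic", K = "Basic", D = "Acidic", E = "Acidic")

# Index the lookup by residue; as.character() guards against Residue being a factor.
# unname() drops the residue names that the indexing would otherwise carry along.
df$Property <- unname(props[as.character(df$Residue)])
```

With this version, a residue that is missing from props ends up as NA rather than being silently called Acidic.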

1 Comment

+1 Another option would be within(df, Property <- c("Acidic", "Basic")[(Residue %in% c("H", "R", "K")) +1])
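
For anyone puzzled by the one-liner in this comment, here is a sketch of how it works, assuming the same df as in the answer (rebuilt below so the snippet is self-contained). The logical test becomes 0 or 1 when 1 is added to it, so FALSE picks the first element of the vector and TRUE the second. Note that within() returns a modified copy, so the result has to be assigned back.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Residue  = c("H", "R", "K", "D", "E", "H", "R", "K", "D", "E"),
  Position = 1:10,
  stringsAsFactors = FALSE
)

# (Residue %in% ...) is TRUE/FALSE; adding 1 turns FALSE into 1 and TRUE into 2,
# which then index into c("Acidic", "Basic").
df <- within(df, Property <- c("Acidic", "Basic")[(Residue %in% c("H", "R", "K")) + 1])
```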

Try:

> df1
   Residue Position
1        H        1
2        R        2
3        K        3
4        D        4
5        E        5
6        H        6
7        R        7
8        K        8
9        D        9
10       E       10

Create a reference table:

> df2
  Residue Property
1       H    Basic
2       R    Basic
3       K    Basic
4       D   Acidic
5       E   Acidic
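
The reference table printed above could be built along these lines. This is a minimal sketch; stringsAsFactors = FALSE is my own choice to keep the columns as plain character vectors.

```r
# One row per residue, with the property it should map to.
df2 <- data.frame(
  Residue  = c("H", "R", "K", "D", "E"),
  Property = c("Basic", "Basic", "Basic", "Acidic", "Acidic"),
  stringsAsFactors = FALSE
)
```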

Then merge:

> merge(df1, df2)
   Residue Position Property
1        D        9   Acidic
2        D        4   Acidic
3        E        5   Acidic
4        E       10   Acidic
5        H        1    Basic
6        H        6    Basic
7        K        8    Basic
8        K        3    Basic
9        R        7    Basic
10       R        2    Basic
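
As the output shows, merge() sorts the result by the key column, so the rows no longer follow the original Position order. If the order matters, one option is to sort by Position afterwards. The sketch below rebuilds df1 and df2 so it runs on its own; all.x = TRUE is an addition of mine that keeps residues missing from the reference table (their Property becomes NA) instead of dropping them.

```r
# Example data from the question and the reference table from this answer.
df1 <- data.frame(
  Residue  = c("H", "R", "K", "D", "E", "H", "R", "K", "D", "E"),
  Position = 1:10,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Residue  = c("H", "R", "K", "D", "E"),
  Property = c("Basic", "Basic", "Basic", "Acidic", "Acidic"),
  stringsAsFactors = FALSE
)

# Join on Residue, keeping every row of df1 even if it has no match in df2.
merged <- merge(df1, df2, by = "Residue", all.x = TRUE)

# Restore the original order along the protein and reset the row names.
merged <- merged[order(merged$Position), ]
rownames(merged) <- NULL
```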


I think you might want to allow for non-polar amino acids as well:

c(rep("Basic",3),rep("Acidic",2),"Non-Polar")[   # those are the choices
        match(dat$Residue, c("H","R","K","E","D"), nomatch=6) ] #select indices

So I added an 11th residue named "Z" and tested:

> dat$Property <- c(rep("Basic",3),rep("Acidic",2),"Non-Polar")[
                 match(dat$Residue, c("H","R","K","E","D"), nomatch=6) ]
> dat
   Residue Position  Property
1        H        1     Basic
2        R        2     Basic
3        K        3     Basic
4        D        4    Acidic
5        E        5    Acidic
6        H        6     Basic
7        R        7     Basic
8        K        8     Basic
9        D        9    Acidic
10       E       10    Acidic
11       Z       11 Non-Polar
