
I am trying to write a function in R which lumps species columns together within a data.frame.

To elaborate a bit on what I'm doing: I have a data frame with multiple plant species for multiple sites and years. Some of the species were misidentified, so I'd like to group them at a more general level (e.g. spp a and spp b were mixed up throughout the years, so I'd like to create a new column called spp.ab in which the data for spp a and b are lumped together).

Example:

spp.a spp.b
  1     0
  2     3
  0     4
  3     2
  4     5

I'd like to eventually end up with a single column that displays the maximum value from the two species:

spp.ab
  1
  3
  4
  3
  5

I've started writing a function which does this; however, I'm having trouble adding the new column to my data set and dropping the old ones. Could someone tell me what's wrong with my code?

lump <- function(db, spp.list, new.spp) { #input spp.list as c('spp.a', 'spp.b', ...)
  mini.db <- subset(db, select=spp.list);
  newcol <- as.vector(apply(mini.db, 1, max, na.rm=T));
  db$new.spp <- newcol
  db <- db[,names(db) %in% spp.list]
  return(db)
}

When I call the function as such

test <- lump(db, c('spp.a', 'spp.b'), spp.ab)
test

all that pops up is the mini.db. Am I missing something with return()?

For reference, db is the database, spp.list is the species I want to lump together, and new.spp is what I would like the new column named.

Thanks for any help,
Paul

2 Answers

I've figured it out...stupid mistake, of course. Here is the code that works:

lump <- function(db, spp.list, new.spp) {
    # spp.list: character vector of columns to lump, e.g. c('spp.a', 'spp.b', ...)
    # new.spp: name of the new column; must be in quotes (e.g. 'new.spp')
    mini.db <- subset(db, select = spp.list)
    newcol <- as.vector(apply(mini.db, 1, max, na.rm = TRUE))
    newcol[newcol == -Inf] <- NA  # max() returns -Inf for rows that are all NA
    db[new.spp] <- newcol
    # drop the old columns; drop = FALSE keeps the result a data frame
    db <- db[, !names(db) %in% spp.list, drop = FALSE]
    return(db)
}

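The key is the db[new.spp] <- newcol line. Using db$new.spp <- newcol does not work, because $ takes the name literally and creates a column called new.spp, whereas db[new.spp] uses the string stored in the new.spp argument. I also added a ! to the line db <- db[, !names(db) %in% spp.list]; without it, the function kept only the old species columns instead of dropping them. This was my biggest mistake.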
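The difference between the two assignment forms can be seen in a small sketch. The toy data frame df and the variable col.name are made up for illustration and are not part of the original data:

```r
# Toy data, invented for illustration
df <- data.frame(spp.a = c(1, 2, 0), spp.b = c(0, 3, 4))
col.name <- "spp.ab"

# `$` takes the name literally, so this creates a column called "col.name"
df1 <- df
df1$col.name <- pmax(df1$spp.a, df1$spp.b)
names(df1)
# [1] "spp.a"    "spp.b"    "col.name"

# `[[` (like `[`) evaluates the variable, so this creates a column called "spp.ab"
df2 <- df
df2[[col.name]] <- pmax(df2$spp.a, df2$spp.b)
names(df2)
# [1] "spp.a"  "spp.b"  "spp.ab"
```

Inside a function, the same rule applies to arguments: whenever the column name arrives as a string, use [ or [[ rather than $.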


2 Comments

So new.spp is the name of the new column?
Correct, and you can call it whatever you like by placing the name in quotes as an argument to the function. For example, if you want to call it el.conquistador you would input lump(db, c('spp.a', 'spp.b'), 'el.conquistador')

While it seems like you've found your answer, I would suggest, instead, the pmax function:

> with(db, pmax(spp.a, spp.b))
[1] 1 3 4 3 5

You can use this with within or transform to mimic your function:

out <- within(db, spp.ab <- pmax(spp.a, spp.b))
out
#   spp.a spp.b spp.ab
# 1     1     0      1
# 2     2     3      3
# 3     0     4      4
# 4     3     2      3
# 5     4     5      5
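If the column names need to be passed in as strings, as in the original lump function, one possible sketch combines pmax with do.call. The helper name lump_pmax and the extra site column are hypothetical, added only to make the example self-contained:

```r
# Hypothetical helper: lump columns with pmax and drop the originals
lump_pmax <- function(db, spp.list, new.spp) {
  # do.call passes each selected column to pmax as a separate argument
  cols <- unname(as.list(db[spp.list]))
  db[[new.spp]] <- do.call(pmax, c(cols, na.rm = TRUE))
  # drop = FALSE keeps a data frame even if only one column remains
  db[, !names(db) %in% spp.list, drop = FALSE]
}

# Example data from the question, plus an invented site column
db <- data.frame(site = 1:5,
                 spp.a = c(1, 2, 0, 3, 4),
                 spp.b = c(0, 3, 4, 2, 5))
res <- lump_pmax(db, c("spp.a", "spp.b"), "spp.ab")
res
#   site spp.ab
# 1    1      1
# 2    2      3
# 3    3      4
# 4    4      3
# 5    5      5
```

Unlike the within call above, this version drops spp.a and spp.b, and because pmax is vectorised it avoids the row-by-row apply.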
