R - Replace row variables within a data frame with variables from another row

Question

I have a list of data frames similar to the reprex below but with 100+ columns:

# reproducible example
df <- data.frame(
  Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  Date = c("2018-01-01", "2018-01-02"),
  Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
  Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
  Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)

# transform data frame into list
df <- split(df, df$Name)

For each data frame in the list, I would like to replace the values in the last row with the values from the row before it. In the example below, that means replacing [2, 3:5] with [1, 3:5].

> tail(df[["Name1"]], n = 2)
   Name       Date    Value1    Value2    Value3
1 Name1 2018-01-01 0.9184539 15.658510 29.219707
2 Name1 2018-01-02 3.8875463  3.628546  9.777399

I'm not sure whether splitting my data frame into a list is the best way to go about this, so any other suggestions are welcome. I tried tackling this as outlined below, but my attempt only replaces the last row of the whole data frame with the second-to-last row, rather than doing so within each Name.

My Attempt

# reproducible example
df <- data.frame(
  Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  Date = c("2018-01-01", "2018-01-02"),
  Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
  Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
  Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)

# arrange by Name and Date
library(dplyr)  # attaches %>% as well as arrange()
df <- df %>% arrange(Name, Date)

# attempt to replace 
df[length(df$Name), c(3:5)] <- df[length(df$Name)-1, c(3:5)]

# result
> tail(df, n = 4)
    Name       Date    Value1    Value2    Value3
7  Name4 2018-01-01  3.242383 -11.44217 -1.215688
8  Name4 2018-01-02 -4.042093  18.18184  1.544271
9  Name5 2018-01-01 -1.930195  13.18662 18.889372
10 Name5 2018-01-02 -1.930195  13.18662 18.889372
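
The attempt above indexes with length(df$Name), which is the same as nrow(df), so it only ever touches the single final row of the whole data frame. A minimal base R sketch of applying the same assignment once per Name is given here. It assumes the value columns sit in positions 3:5 as in the reprex; `replace_last` is a hypothetical helper name, and the seed and the explicit rep() calls are additions made only to keep the sketch repeatable.

```r
set.seed(1)  # assumption: seed added so the sketch is repeatable
df <- data.frame(
  Name = rep(paste0("Name", 1:5), times = 2),
  Date = rep(c("2018-01-01", "2018-01-02"), each = 5),
  Value1 = rnorm(10, 2, 3),
  Value2 = rnorm(10, 12, 4),
  Value3 = rnorm(10, 22, 13)
)

# Hypothetical helper: copy the value columns of the second-to-last row
# into the last row of a single data frame.
replace_last <- function(d, cols = 3:5) {
  n <- nrow(d)
  if (n >= 2) d[n, cols] <- d[n - 1, cols]  # leave one-row frames alone
  d
}

# Sort so that "last row" means "latest date" within each Name,
# split into one data frame per Name, and apply the helper to each piece.
df <- df[order(df$Name, df$Date), ]
df_list <- lapply(split(df, df$Name), replace_last)

# Optionally bind the pieces back into one data frame.
result <- do.call(rbind, df_list)
rownames(result) <- NULL
```

Keeping the list (df_list) or binding it back (result) is a matter of what the later steps need; the replacement itself is identical either way.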

www · Accepted Answer · 2018-10-11 01:12:09Z

Here is a tidyverse solution. I don't think converting to a list is necessary. df is the data frame from your example, before it is split. Within each Name group, we can set the Value columns of the last row to NA and then use fill() to carry the previous row's values down into it.

library(tidyverse)

df2 <- df %>%
  group_by(Name) %>%
  mutate_at(vars(starts_with("Value")), 
            funs(ifelse(row_number() == max(row_number()), NA, .))) %>%
  fill(starts_with("Value")) %>%
  ungroup()
df2
# # A tibble: 10 x 5
#    Name  Date       Value1 Value2 Value3
#    <fct> <fct>       <dbl>  <dbl>  <dbl>
#  1 Name1 2018-01-01  1.35   14.5   34.2 
#  2 Name1 2018-01-02  1.35   14.5   34.2 
#  3 Name2 2018-01-02  2.42    4.43  19.5 
#  4 Name2 2018-01-01  2.42    4.43  19.5 
#  5 Name3 2018-01-01  4.60   14.1   15.8 
#  6 Name3 2018-01-02  4.60   14.1   15.8 
#  7 Name4 2018-01-02  6.36   11.4    9.40
#  8 Name4 2018-01-01  6.36   11.4    9.40
#  9 Name5 2018-01-01  0.214   8.34  33.8 
# 10 Name5 2018-01-02  0.214   8.34  33.8
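
Since this was written, funs() has been deprecated and mutate_at() superseded in dplyr, so the same NA-then-fill idea is sketched below with across(). This is an adaptation under stated assumptions rather than the original code: the seed and the explicit rep() calls are additions for repeatability, and `df_fill` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added so the sketch is repeatable
df <- data.frame(
  Name = rep(paste0("Name", 1:5), times = 2),
  Date = rep(c("2018-01-01", "2018-01-02"), each = 5),
  Value1 = rnorm(10, 2, 3),
  Value2 = rnorm(10, 12, 4),
  Value3 = rnorm(10, 22, 13)
)

df_fill <- df %>%
  arrange(Name, Date) %>%
  group_by(Name) %>%
  # blank out the last row of each group in every Value* column
  mutate(across(starts_with("Value"), ~ replace(.x, n(), NA))) %>%
  # carry the previous non-missing value down into the blanked row
  fill(starts_with("Value"), .direction = "down") %>%
  ungroup()
```

Two caveats with this approach: fill() will also overwrite any NA that was already present in those columns, and a group with only one row ends up with NA because there is nothing above it to carry down.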

The following could be even better. It does not use the fill function, and it does not change the row order either.

df2 <- df %>%
  group_by(Name) %>%
  mutate_at(vars(starts_with("Value")), 
            funs(ifelse(row_number() == max(row_number()), 
                        nth(., n = max(row_number()) - 1),
                        .))) %>%
  ungroup()
df2
# # A tibble: 10 x 5
#    Name  Date       Value1 Value2 Value3
#    <fct> <fct>       <dbl>  <dbl>  <dbl>
#  1 Name1 2018-01-01   4.40  13.5   28.0 
#  2 Name2 2018-01-02   1.82   8.23  20.9 
#  3 Name3 2018-01-01   1.07  16.9    7.50
#  4 Name4 2018-01-02   1.09   8.05  14.4 
#  5 Name5 2018-01-01   1.17  11.6   24.0 
#  6 Name1 2018-01-02   4.40  13.5   28.0 
#  7 Name2 2018-01-01   1.82   8.23  20.9 
#  8 Name3 2018-01-02   1.07  16.9    7.50
#  9 Name4 2018-01-01   1.09   8.05  14.4 
# 10 Name5 2018-01-02   1.17  11.6   24.0
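
For newer dplyr versions, where funs() is deprecated, the same idea can be sketched with across(). The version below swaps nth() for lag(), which returns the previous value within each group, and adds a guard so that one-row groups are left untouched. The small data frame and the name `out` are made up for illustration.

```r
library(dplyr)

# Illustrative data: groups are interleaved to show that row order is kept.
df <- data.frame(
  Name = c("a", "b", "a", "b", "a"),
  Value1 = c(1, 2, 3, 4, 5),
  Value2 = c(10, 20, 30, 40, 50)
)

out <- df %>%
  group_by(Name) %>%
  mutate(across(
    starts_with("Value"),
    # on the last row of a group (with at least two rows), take the
    # previous row's value; everywhere else keep the value as is
    ~ if_else(row_number() == n() & n() > 1, lag(.x), .x)
  )) %>%
  ungroup()

out$Value1
# 1 2 3 2 3
```

"Last row" here means last in the current row order of each group, so if the latest date should be the one replaced, arrange by Date before grouping.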

edited Oct 11, 2018 at 1:12

answered Oct 11, 2018 at 0:59

www

3 Comments

On_an_island Over a year ago

Very nice solutions. Both work well with the reprex data and with my actual data frame when I pass a single prefix to starts_with(). However, passing more than one prefix, as in starts_with(c("Value", "OtherValue", "OtherOtherValue")), produces the following error: Error in starts_with(c("Value", "OtherValue", "OtherOtherValue")) : is_string(match) is not TRUE.

www Over a year ago

@On_an_island Try vars(starts_with("Value"), starts_with("OtherValue"), starts_with("OtherOtherValue"))
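
A small sketch of that suggestion, written with across() and c() instead of vars() and funs(), in case a newer dplyr is in use. The column names OtherValue1 and Untouched and the data itself are made up for illustration. As far as I know, more recent tidyselect releases also accept a character vector in starts_with(), but that is worth checking against the installed version.

```r
library(dplyr)

# Illustrative data with two column prefixes and one column to leave alone.
df <- data.frame(
  Name = rep(c("A", "B"), each = 3),
  Value1 = c(1, 2, 3, 4, 5, 6),
  OtherValue1 = c(10, 20, 30, 40, 50, 60),
  Untouched = letters[1:6]
)

out <- df %>%
  group_by(Name) %>%
  mutate(across(
    # combine several selection helpers with c()
    c(starts_with("Value"), starts_with("OtherValue")),
    # last row of each group takes the previous row's value
    ~ if_else(row_number() == n() & n() > 1, lag(.x), .x)
  )) %>%
  ungroup()

out$Value1
# 1 2 2 4 5 5
out$OtherValue1
# 10 20 20 40 50 50
```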

On_an_island Over a year ago

Found that exact suggestion not too long after I posted my response to you. Thank you, your second solution works great and is MUCH faster than using fill!
