1

I'm trying to select all yearly values with their corresponding month beyond years 2011, and if the rows do not match in size, then filling them in with NAs.

I have tried:

rank_data %>% filter(across(starts_with('year'))>2011)

Which only returns the output for a single column:

  jan year   feb year2   mar year3   apr year4   may year5   jun year6   jul year7   aug year8   sep year9   oct year10   nov year11   dec
1: 205.3 2014 188.2  2014 167.8  1920 122.1  1882 134.2  1906 152.0  1948 173.6  1882 191.3  1985 196.5  1957 234.6   1998 224.2   1931 232.6
2: 203.2 2016 179.0  1950 162.9  1868 119.5  1993 133.4  1925 147.3  1980 156.4  1932 181.1  1900 193.6  1927 211.6   1862 222.5   1928 230.1
   year12   win year13   spr year14   sum year15   aut year16    ann year17
1:   1979 529.4   1960 348.9   1983 438.8   1928 527.8   1944 1547.8   2008
2:   1994 522.5   1994 340.4   1903 426.6   2007 519.6   1923 1530.2   2015

reproducible code:

structure(list(jan = c(268.1, 263.1, 235.2, 223.3, 219.2, 218.3
), year = c(1928, 1948, 2008, 1877, 1995, 1990), feb = c(287.6, 
241.9, 213.7, 205.1, 191.9, 191.2), year2 = c(2020, 2002, 1997, 
1990, 1958, 1923), mar = c(225.3, 190.7, 187.8, 187.2, 175.9, 
173.9), year3 = c(1981, 1903, 2019, 1947, 1994, 1912), apr = c(147.4, 
147.1, 143.1, 138.1, 132.4, 128.4), year4 = c(1920, 1867, 1970, 
1913, 1947, 2012), may = c(166.5, 161.1, 153.4, 142.1, 141.8, 
139.1), year5 = c(1967, 1924, 1886, 1920, 2015, 1993), jun = c(205.6, 
173.7, 172.5, 170.6, 161.1, 161.1), year6 = c(2012, 1928, 2007, 
1907, 1872, 1998), jul = c(196.1, 191.4, 183.6, 180.7, 180.3, 
178.9), year7 = c(1939, 1888, 1920, 1988, 2009, 1880), aug = c(260.9, 
231, 209.7, 205.4, 203.9, 197.7), year8 = c(1956, 1917, 1891, 
1927, 1879, 2004), sep = c(287.1, 238.1, 223.3, 208.8, 207.6, 
206.2), year9 = c(1918, 1950, 1869, 1935, 1866, 1872), oct = c(286.1, 
278.8, 263.8, 259.6, 249.9, 248.4), year10 = c(1967, 1903, 1954, 
2000, 1938, 1870), nov = c(306.7, 260.9, 253.6, 252.6, 242.3, 
235.8), year11 = c(2009, 2015, 1929, 2000, 1951, 1954), dec = c(343.1, 
266, 246.8, 246.5, 238.9, 238), year12 = c(2015, 1993, 1868, 
1986, 1929, 2006), win = c(688.7, 625.9, 582.1, 560, 558.4, 546.6
), year13 = c(2016, 1995, 2014, 1990, 2020, 1877), spr = c(457.3, 
408.9, 375.8, 372.8, 371.8, 371.8), year14 = c(1920, 1947, 1979, 
2006, 1981, 1913), sum = c(499.7, 489.5, 483.8, 468.6, 452.1, 
446.9), year15 = c(1879, 2012, 1956, 1912, 1927, 2020), aut = c(673, 
668.7, 580.8, 567.9, 560.8, 554.5), year16 = c(2000, 1954, 1872, 
1935, 1903, 1981), ann = c(1758.2, 1691.1, 1690, 1660.7, 1648.5, 
1624.6), year17 = c(1872, 1954, 2000, 1877, 1903, 2012)), row.names = c(NA, 
6L), class = "data.frame")

Expected output:

 jan   year feb    year2  mar    year3        ...
 205.3 2014 287.6  2020   187.8  2019         ...
 203.2 2016 188.2  2014    NA     NA          ...
2
  • @akrun I'm expecting the output to be similar like the one above, where for each year greater than 2011, selects the values from the months and seasons relative to those values. Like the first year and jan Commented Jan 1, 2021 at 19:48
  • Yeah they're supposed to be numeric, I forgot to change it when producing the code. I had the data extracted from html, so the filtering into numeric came later. I'll produce an example of the expected output Commented Jan 1, 2021 at 19:54

1 Answer 1

3

Perhaps, we can use indexing to replace the values where the corresponding 'year' value is less than or equal to 2011

i1 <- grep("^year", names(rank_data))
i2 <- i1 -1
tmp <- NA^(rank_data[i1] <= 2011)
rank_data[i2]  <-  rank_data[i2] * tmp 
rank_data[i1] <- rank_data[i1] * tmp

If we want to remove columns that are all NA along with the 'year'

i3 <- rep(sapply(rank_data[i2], function(x) any(!is.na(x))), each = 2)
lst1 <- lapply(rank_data[i3], function(x) x[complete.cases(x)])
mx <- max(lengths(lst1))
do.call(cbind, lapply(lst1, `length<-`, mx))

-output

#  feb year2   mar year3   apr year4   may year5   jun year6   nov year11   dec year12   win year13   sum year15    ann year17
#[1,] 287.6  2020 187.8  2019 128.4  2012 141.8  2015 205.6  2012 260.9   2015 343.1   2015 688.7   2016 489.5   2012 1624.6   2012
#[2,]    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA    NA     NA 582.1   2014 446.9   2020     NA     NA
#[3,]    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA    NA     NA 558.4   2020    NA     NA     NA     NA 
Sign up to request clarification or add additional context in comments.

3 Comments

Is there a way to only show the years that are greater than 2011 and have monthly values. So if there are no years greater than 2011, then that column is removed along with the month? For example, from the output above, jan and year are removed because all values are below 2011. whilst feb and year2 only the value 287.6 and year2; 2020 is kept. When columns are of differring rows, then they're replaced with NAs. I believe this is what I originally had in mind, but could only express it properly from your output.
That's also what I wanted, although I wish to have the years not with values removed. I have updated an expected output in the main question, apologies for taking so long, I had to try figure out how to do it, I managed doing it individually which would work If I combined the results into a data.frame, however it will take a long time. I managed using your updated code and using rank_data[5:6][rank_data[,6] %in% 2010:2020,] for the output, and replacing the numeric values for each individual year and month
@Lime If you want those other values in year to be NA as well, rank_data[i1] <- rank_data[i1] * tmp

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.