
I have data on mergers for 20 years for various firms. I have used a "for" loop in R to separate data for each year which gives me 20 data frames in the global environment. Each data frame is identified by its year: Merger2000 to Merger2019 for 20 years. Now I want to write another for loop to find the unique companies in each data frame (that is, unique firms in each year). Each company is identified by a unique company code (co_code). I know how to do this for each year separately. For example, for the year 2000, I would do something like:

uniquemerger2000 <- Merger2000 %>% distinct(co_code, .keep_all = TRUE)

How do I write a for loop that performs this operation for all years (that is, from 2000 to 2019)? Some indexing is required in the code, but I am not sure how to operationalise this in a loop.

Any help would be appreciated. Thanks!

  • Why not create a single dataframe with a year variable? If you have 20 variables with names which differ only by a number appended at the end, there is probably a single data structure waiting to be born. Commented Jan 3, 2021 at 13:41
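A minimal sketch of the single-data-frame idea from the comment above, in base R. The two toy data frames and their values are made up for illustration; only the `MergerYYYY` naming and the `co_code` column come from the question, and the `year` column is added here as an assumption.

```r
# Toy stand-ins for two of the yearly data frames (values are made up)
Merger2000 <- data.frame(co_code = c(1, 1, 2))
Merger2001 <- data.frame(co_code = c(2, 3, 3))

years <- 2000:2001
list_data <- mget(paste0("Merger", years))

# Stack everything into one data frame, tagging each row with its year
all_years <- do.call(
  rbind,
  Map(function(df, yr) cbind(df, year = yr), list_data, years)
)

# Keep one row per company per year
unique_firms <- all_years[!duplicated(all_years[c("year", "co_code")]), ]

# Number of unique companies in each year
table(unique_firms$year)
```

With everything in one data frame, the per-year operation becomes a grouped operation on the `year` column, so no loop over separate objects is needed.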

1 Answer

It is usually better to keep the data in a single data frame or a list than in many separate objects in the global environment.

You can create one list object (list_data) bringing all the dataframes together and use lapply/map to keep unique rows from each dataframe.

library(dplyr)
library(purrr)

list_data <- mget(paste0('Merger', 2000:2019))
result <- map(list_data, ~.x %>% distinct(co_code, .keep_all = TRUE))

Or in base R:

result <- lapply(list_data, function(x) x[!duplicated(x$co_code), ])
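
To see the approach end to end, here is a small sketch on made-up data, using the base R variant so it runs without extra packages. The two toy data frames, the `deal` column, and all values are assumptions for illustration; only `co_code` and the `MergerYYYY` naming come from the question.

```r
# Toy stand-ins for two of the yearly data frames (values are made up)
Merger2000 <- data.frame(co_code = c(1, 1, 2), deal = c("a", "b", "c"))
Merger2001 <- data.frame(co_code = c(3, 3, 3, 4), deal = c("d", "e", "f", "g"))

# Collect the data frames into one named list by looking them up by name
list_data <- mget(paste0("Merger", 2000:2001))

# Keep the first row for each co_code within every data frame
result <- lapply(list_data, function(x) x[!duplicated(x$co_code), ])

names(result)      # the list keeps the original object names
result$Merger2000  # first occurrence of each co_code in the 2000 data
```

Because `result` is a named list, an individual year can still be pulled out with `result$Merger2000` or `result[["Merger2000"]]`, so nothing is lost by not having 20 separate objects.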

2 Comments

Thanks Ronak. This was super helpful. Can I ask a related question? Once I have the result, which is the list of all data frames (retaining only the unique rows from each), I want to create a frequency table counting the rows in each year's data frame (in other words, the unique companies in each year). Do you know how I can do that?
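You can use table in lapply, applied to the de-duplicated list (result) rather than list_data so that only unique companies are counted. Something like counts <- lapply(result, function(x) table(x$year)), assuming each data frame has a year column. If you only need the number of rows per data frame, sapply(result, nrow) also works.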
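
A minimal sketch of that per-year count, assuming `result` is the de-duplicated list from the answer. The toy list below is made up so the block runs on its own, and the `freq` and `n_companies` names are hypothetical.

```r
# Toy de-duplicated list (made-up values), named like the answer's result
result <- list(
  Merger2000 = data.frame(co_code = c(1, 2), year = 2000),
  Merger2001 = data.frame(co_code = c(3, 4, 5), year = 2001)
)

# Unique companies per year = number of rows in each de-duplicated data frame
counts <- sapply(result, nrow)

# Tidy frequency table, with the year parsed from the list names
freq <- data.frame(
  year = as.integer(sub("Merger", "", names(counts))),
  n_companies = unname(counts)
)
freq
```

Counting rows with `nrow` does not depend on a `year` column being present in each data frame, since the year is recovered from the list names instead.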
