0

Let's say I have data on two firms that are each expanding to new cities. I have the dates of each additional store added in each city:

df <- data.frame(
    firm = c(rep(1,4), rep(2,4)),
    date = as.Date(c('2017-01-01', '2017-03-01', '2017-05-01',
                     '2017-06-01', '2017-02-01', '2017-04-01',
                     '2017-05-01', '2017-06-01')),
    city = c('New York', 'DC', 'New York', 'Atlanta', 'DC', 'DC',
             'Chicago', 'Atlanta'),
    numStores = c(1, 1, 2, 1, 1, 2, 1, 1))

I want to create a column of integers that tells me the order that each firm entered each city based on the date variable. The result should be this:

df$cityOrder = c(1,2,1,3,1,1,2,3)

4 Answers 4

4

Using dplyr...

library(dplyr)
df %>% left_join(df %>% filter(numStores==1) %>% 
                        group_by(firm) %>% 
                        mutate(cityOrder=order(date)) %>% 
                        select(-numStores,-date))

  firm       date     city numStores cityOrder
1    1 2017-01-01 New York         1         1
2    1 2017-03-01       DC         1         2
3    1 2017-05-01 New York         2         1
4    1 2017-06-01  Atlanta         1         3
5    2 2017-02-01       DC         1         1
6    2 2017-04-01       DC         2         1
7    2 2017-05-01  Chicago         1         2
8    2 2017-06-01  Atlanta         1         3
Sign up to request clarification or add additional context in comments.

1 Comment

Nice - this works great. Almost the same as @dash2's answer but with a very clever usage of left_join() to join a data frame that is created inside the left_join(). It's like a "lambda" data frame haha.
2

One way:

library(dplyr)
first_entry <- df %>% 
      group_by(firm, city) %>% 
      summarize(first = min(date)) %>% 
      group_by(firm) %>% 
      mutate(order = order(first)) %>% 
      select(-first)

left_join(df, first_entry)

  firm       date     city numStores order
1    1 2017-01-01 New York         1     1
2    1 2017-03-01       DC         1     2
3    1 2017-05-01 New York         2     1
4    1 2017-06-01  Atlanta         1     3
5    2 2017-02-01       DC         1     1
6    2 2017-04-01       DC         2     1
7    2 2017-05-01  Chicago         1     2
8    2 2017-06-01  Atlanta         1     3

Comments

1

A dplyr solution.

library(dplyr)

# create a summary by firm by city 
df_summary <- df %>%
  # sequence by earliest date by firm and city
  group_by(firm, city) %>%
  summarize(min_date = min(date)) %>%
  # number (using row_number)  *within* the firm group, arranged by date
  arrange(firm, min_date) %>%
  group_by(firm) %>%
  mutate(cityOrder = row_number()) %>%
  # drop the date column
  select(firm, city, cityOrder)

print(df_summary)
# A tibble: 6 x 3
# Groups:   firm [2]
# firm city     cityOrder
# <dbl> <fct>        <int>
# 1    1. New York         1
# 2    1. DC               2
# 3    1. Atlanta          3
# 4    2. DC               1
# 5    2. Chicago          2
# 6    2. Atlanta          3

## join the summary city order to the original
df_new <- df %>%
  left_join(df_summary, by = c(firm = "firm", city = "city")) 
print (df_new$cityOrder) 
# [1] 1 2 1 3 1 1 2 3

2 Comments

essentially the same as @dash2s better (more compact) answer.
I appreciate the detailed commenting in this solution. This is especially helpful for those who aren't as familiar with dplyr and piping.
1

You can us factors with tapply in base R:

within(df,cityOrder <- unlist(tapply(city, firm, function(x) 
  as.numeric(factor(x,unique(x))))))

#   firm       date     city numStores cityOrder
# 1    1 2017-01-01 New York         1         1
# 2    1 2017-03-01       DC         1         2
# 3    1 2017-05-01 New York         2         1
# 4    1 2017-06-01  Atlanta         1         3
# 5    2 2017-02-01       DC         1         1
# 6    2 2017-04-01       DC         2         1
# 7    2 2017-05-01  Chicago         1         2
# 8    2 2017-06-01  Atlanta         1         3

(IF your dates are sorted for each firm as they are in this example)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.