I have a difficult data frame problem: I am trying to create new columns (new column names and values) out of an existing data frame that isn't formatted the way I want. Each row holds player IDs and player types for up to 4 players, and the data looks like this:
dput(my.player.data)
structure(list(p_id = c(8470828L, 8478460L, 8470966L, 8475314L,
8476472L, 8476917L, 8475791L, 8470105L, 8476905L, 8474152L, 8470642L,
8479325L, 8475218L, 8471296L, 8476874L, 8477943L, 8477934L, 8473432L
), pType = c("Blocker", "Shooter", "Blocker", "Shooter", "Blocker",
"Hitter", "Blocker", "Shooter", "PlayerID", "PlayerID", "Shooter",
"Hitter", "PlayerID", "Blocker", "Shooter", "Scorer", "Scorer",
"Scorer"), p_id1 = c(8475172L, 8470645L, 8474162L, NA, 8480172L,
8477989L, 8476879L, NA, NA, NA, NA, 8474683L, NA, 8476851L, 8469514L,
8477407L, 8478402L, 8474091L), pType1 = c("Shooter", "Goalie",
"Shooter", NA, "Shooter", "Hittee", "Shooter", NA, NA, NA, NA,
"Hittee", NA, "Shooter", "Goalie", "Assist", "Assist", "Assist"
), p_id2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 8475246L, 8471729L, 8477018L), pType2 = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Assist",
"Assist", "Assist"), p_id3 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 8475622L, 8471239L, 8469608L), pType3 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Goalie",
"Goalie", "Goalie")), .Names = c("p_id", "pType", "p_id1", "pType1",
"p_id2", "pType2", "p_id3", "pType3"), row.names = c(1L, 5001L,
10001L, 15001L, 20001L, 25001L, 30001L, 35001L, 40001L, 45001L,
50001L, 55001L, 60001L, 65001L, 70001L, 47329L, 46786L, 45551L
), class = "data.frame")
# ignore that the row numbers are 1, 5001, 10001, etc.
head(my.player.data)
p_id pType p_id1 pType1 p_id2 pType2 p_id3 pType3
1 8470828 Blocker 8475172 Shooter NA <NA> NA <NA>
5001 8478460 Shooter 8470645 Goalie NA <NA> NA <NA>
10001 8470966 Blocker 8474162 Shooter NA <NA> NA <NA>
15001 8475314 Shooter NA <NA> NA <NA> NA <NA>
20001 8476472 Blocker 8480172 Shooter NA <NA> NA <NA>
25001 8476917 Hitter 8477989 Hittee NA <NA> NA <NA>
There is only a fixed set of pTypes across the 4 pType columns (Blocker, Shooter, Goalie, etc.), and I would like to create a column for each one, with the value in that column equal to the corresponding player ID.
For example, I'd like something that looks like this:
head(better.player.data)
Blocker Shooter Hittee Hitter Assist1 Assist2 Scorer Goalie
1 8470828 8475172 NA NA NA NA NA NA
5001 NA 8478460 NA NA NA NA NA 8470645
10001 8470966 8474162 NA NA NA NA NA NA
15001 NA 8475314 NA NA NA NA NA NA
20001 8476472 8480172 NA NA NA NA NA NA
25001 NA NA 8477989 8476917 NA NA NA NA
The main edge case is that Assist1 and Assist2 are both labeled Assist in my.player.data (see the last 3 rows, which are not shown by head()). I'd like p_id1 to become Assist1 and p_id2 to become Assist2. pType1 and pType2 should be the only 2 columns in the original data where the value is Assist; it shouldn't appear in pType or pType3.
Any help with this, as always, is greatly appreciated! Thanks!
What about PlayerID? I don't think the presence of PlayerID is very meaningful, though.
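A minimal base R sketch of the reshape described above, assuming each pType occurs at most once per row once the two Assist slots are relabeled Assist1 and Assist2 (if a row repeated a type, the later slot would overwrite the earlier one). `reshape_players()` and `sample.data` are hypothetical names; `sample.data` is just three rows (1, 25001, 47329) copied from the dput output. The idea is to stack the four p_id/pType pairs into long form, then drop each ID into a matrix cell addressed by (row, type).

```r
# Hypothetical helper (not from any package): spread the four
# p_id/pType pairs into one column per player type.
reshape_players <- function(df) {
  id_cols   <- c("p_id", "p_id1", "p_id2", "p_id3")
  type_cols <- c("pType", "pType1", "pType2", "pType3")

  # Character matrix of types; relabel the two Assist slots so they
  # become separate Assist1 / Assist2 columns.
  types <- as.matrix(df[type_cols])
  types[types[, "pType1"] %in% "Assist", "pType1"] <- "Assist1"
  types[types[, "pType2"] %in% "Assist", "pType2"] <- "Assist2"

  # Stack the pairs into long form; as.vector() reads a matrix column
  # by column, so the row index repeats once per slot.
  long <- data.frame(
    row  = rep(seq_len(nrow(df)), times = length(id_cols)),
    type = as.vector(types),
    id   = as.vector(as.matrix(df[id_cols])),
    stringsAsFactors = FALSE
  )
  long <- long[!is.na(long$type), ]

  # One output column per distinct type (alphabetical), filled by
  # (row, column) index pairs. Assumes each type occurs at most once
  # per row; otherwise the later slot silently overwrites the earlier one.
  type_names <- sort(unique(long$type))
  out <- matrix(NA_integer_, nrow = nrow(df), ncol = length(type_names),
                dimnames = list(rownames(df), type_names))
  out[cbind(long$row, match(long$type, type_names))] <- long$id
  as.data.frame(out)
}

# Three rows taken from my.player.data (hypothetical name: sample.data).
sample.data <- data.frame(
  p_id   = c(8470828L, 8476917L, 8477943L),
  pType  = c("Blocker", "Hitter", "Scorer"),
  p_id1  = c(8475172L, 8477989L, 8477407L),
  pType1 = c("Shooter", "Hittee", "Assist"),
  p_id2  = c(NA, NA, 8475246L),
  pType2 = c(NA, NA, "Assist"),
  p_id3  = c(NA, NA, 8475622L),
  pType3 = c(NA, NA, "Goalie"),
  row.names = c(1L, 25001L, 47329L),
  stringsAsFactors = FALSE
)

better.sample <- reshape_players(sample.data)
print(better.sample)
```

On the full data, `reshape_players(my.player.data)` would also produce a PlayerID column, since that value occurs in pType; `subset(result, select = -PlayerID)` drops it if it isn't needed. The columns come out in alphabetical order rather than in the order of the desired output, so reorder them with `result[c("Blocker", "Shooter", ...)]` if that matters.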