3
| match_id | player_id | team | win |
|    0     |      1    |   A  |  A  |
|    0     |      2    |   A  |  A  |
|    0     |      3    |   B  |  A  |
|    0     |      4    |   B  |  A  |
|    1     |      1    |   A  |  B  |
|    1     |      4    |   A  |  B  |
|    1     |      8    |   B  |  B  |
|    1     |      9    |   B  |  B  |
|    2     |      8    |   A  |  A  |
|    2     |      4    |   A  |  A  |
|    2     |      3    |   B  |  A  |
|    2     |      2    |   B  |  A  |

I have a dataframe that looks like above.

I need to to create a map (key,value) pair such that for every

(k=>(player_id_1, player_id_2), v=> 1 ), if player_id_1 wins against player_id_2 in a match

and

(k=>(player_id_1, player_id_2), v=> 0 ), if player_id_1 loses against player_id_2 in a match

I will have to thus iterate through the entire data frame comparing each player id to another based upon the other 3 columns.

I am planning to achieve this as follows.

  1. Group by match_id

  2. In each group for a player_id check against other player_id's the following

    a. If match_id is same and team is different Then

         if team =  win
           (k=>(player_id_1, player_id_2), v=> 0 )
         else team != win
           (k=>(player_id_1, player_id_2), v=> 1 )
    

For example, after partitioning by matches consider match 1. player_id 1 needs to be compared to player_id 2,3 and 4. While iterating, record for player_id 2 will be skipped as the team is same for player_id 3 as team is different the team & win will be compared. As player_id 1 was in team A and player_id 3 was in team B and team A won the key-value formed would be

((1,3),1)

I have a fair idea of how to achieve this in imperative programming but I am really new to scala and functional programming and can't get a clue as to how while iterating through every row for a field create a (key,value) pair by having checks on other fields.

I tried my best to explain the problem. Please do let me know if any part of my question is unclear. I would be happy to explain the same. Thank you.

P.S: I am using Spark 1.6

4
  • The case that you took to generate ((1,3),1) would also generate ((2,4),1) or do you just want to skip the second player ids altogether? Commented Jul 31, 2017 at 6:50
  • this is just an inner or self join. Commented Jul 31, 2017 at 8:57
  • @philantrovert Yes I would need the ((2,4),1) too. This was to be done for all player_ids. Commented Jul 31, 2017 at 17:32
  • @RamGhadiyaram Thank you . I used rogue-one 's solution using inner joins and conditions mentioned and had it working. Commented Jul 31, 2017 at 17:36

1 Answer 1

4

This can be achieved using the DataFrame API as shown below..

Dataframe API version:

val df = Seq((0,1,"A","A"),(0,2,"A","A"),(0,3,"B","A"),(0,4,"B","A"),(1,1,"A","B"),(1,4,"A","B"),(1,8,"B","B"),(1,9,"B","B"),(2,8,"A","A"),(2,4,"A","A"),(2,3,"B","A"),(2,2,"B","A")
).toDF("match_id", "player_id", "team", "win")

val result = df.alias("left")
       .join(df.alias("right"), $"left.match_id" === $"right.match_id" && not($"right.team" === $"left.team"))
       .select($"left.player_id", $"right.player_id", when($"left.team" === $"left.win", 1).otherwise(0).alias("flag"))

scala> result.collect().map(x => (x.getInt(0),x.getInt(1)) -> x.getInt(2)).toMap
res4: scala.collection.immutable.Map[(Int, Int),Int] = Map((1,8) -> 0, (3,4) -> 0, (3,1) -> 0, (9,1) -> 1, (4,1) -> 0, (8,1) -> 1, (2,8) -> 0, (8,3) -> 1, (1,9) -> 0, (1,4) -> 1, (8,2) -> 1, (4,9) -> 0, (3,2) -> 0, (1,3) -> 1, (4,8) -> 0, (4,2) -> 1, (2,4) -> 1, (8,4) -> 1, (2,3) -> 1, (4,3) -> 1, (9,4) -> 1, (3,8) -> 0)

SPARK SQL version:

df.registerTempTable("data_table")

val result = sqlContext.sql("""
SELECT DISTINCT t0.player_id, t1.player_id, CASE WHEN t0.team == t0.win THEN 1 ELSE 0 END AS flag FROM data_table t0
INNER JOIN data_table t1
ON t0.match_id = t1.match_id
AND t0.team != t1.team
""")
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you @rogue-one for both the solutions. Maybe I was thinking in a way too complicated manner and you made the solutions look so cleaner and easier. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.