I don't really understand what you want to do, but if you want to generate the combinations with Spark, you can do it with Spark SQL like this:
val patients = Seq(
(1, "f"),
(2, "m")
).toDF("id", "name")
val drugs = Seq(
(1, "drugY"),
(2, "drugC"),
(3, "drugX")
).toDF("id", "name")
patients.createOrReplaceTempView("patients")
drugs.createOrReplaceTempView("drugs")
sqlContext.sql("select p.id as patient_id, p.name as patient_name, d.id as drug_id, d.name as drug_name from patients p cross join drugs d").show
+----------+------------+-------+---------+
|patient_id|patient_name|drug_id|drug_name|
+----------+------------+-------+---------+
|         1|           f|      1|    drugY|
|         1|           f|      2|    drugC|
|         1|           f|      3|    drugX|
|         2|           m|      1|    drugY|
|         2|           m|      2|    drugC|
|         2|           m|      3|    drugX|
+----------+------------+-------+---------+
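Or with the DataFrame API (crossJoin makes the Cartesian product explicit; a plain join without a condition is rejected on Spark 2.x unless spark.sql.crossJoin.enabled is set):
val cartesian = patients.crossJoin(drugs)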
cartesian.show
+---+----+---+-----+
| id|name| id| name|
+---+----+---+-----+
|  1|   f|  1|drugY|
|  1|   f|  2|drugC|
|  1|   f|  3|drugX|
|  2|   m|  1|drugY|
|  2|   m|  2|drugC|
|  2|   m|  3|drugX|
+---+----+---+-----+
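Note that the joined result has two id and two name columns, so referring to them by name later is ambiguous. A minimal sketch of one way around that, renaming the columns before the join. The local SparkSession, the app name and the renamed column names are assumptions for illustration; in spark-shell or a notebook the session already exists:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cartesian-sketch")
  .getOrCreate()
import spark.implicits._

// Same data as above, but with column names that stay unique after the join.
val patients = Seq((1, "f"), (2, "m")).toDF("patient_id", "patient_name")
val drugs = Seq((1, "drugY"), (2, "drugC"), (3, "drugX")).toDF("drug_id", "drug_name")

// crossJoin (Spark 2.1+) pairs every patient row with every drug row.
val combos = patients.crossJoin(drugs)
combos.show()
```

With unique column names, combos can be passed straight to stat.crosstab without going through SQL.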
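After that you can use crosstab on the result of the SQL query (the one with the patient_name and drug_name columns) to get a table of the frequency distribution:
val c = sqlContext.sql("select p.name as patient_name, d.name as drug_name from patients p cross join drugs d")
c.stat.crosstab("patient_name", "drug_name").show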
+----------------------+-----+-----+-----+
|patient_name_drug_name|drugC|drugX|drugY|
+----------------------+-----+-----+-----+
|                     m|    1|    1|    1|
|                     f|    1|    1|    1|
+----------------------+-----+-----+-----+
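If it helps to see what crosstab is counting, roughly the same table can be built with groupBy, pivot and count. This is only a sketch under assumptions: the local SparkSession and the inline combos data are made up for illustration. Unlike crosstab, the pivot version names the first column patient_name and leaves null rather than 0 for pairs that never occur:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("crosstab-sketch")
  .getOrCreate()
import spark.implicits._

// The (patient_name, drug_name) pairs produced by the cross join above.
val combos = Seq(
  ("f", "drugY"), ("f", "drugC"), ("f", "drugX"),
  ("m", "drugY"), ("m", "drugC"), ("m", "drugX")
).toDF("patient_name", "drug_name")

// One row per patient_name, one column per distinct drug_name,
// each cell holding the number of rows with that pair.
val counts = combos.groupBy("patient_name").pivot("drug_name").count()
counts.show()
```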
DataFrame, you do not need to worry about the efficiency of for loops. Spark Tip 1: almost all operations on a DataFrame are very expensive relative to a for loop.