I have a PySpark dataframe that contains 3 columns:
- databricksPath
- countryPartition
- yearPartition
I'm creating this dataframe based on values coming from widgets via Data Factory: https://i.sstatic.net/8zIuO.png
pyspark dataframe: https://i.sstatic.net/ZcjZO.png
With this dataframe I want to build an output containing all the combinations, for example a JSON structure that is returned to ADF with a command like dbutils.notebook.exit({'message': 'Success', 'databricksPath': databricksPath, 'yearPartition': yearPartition, 'countryPartition': countryPartition}), so that I can use the returned value in a ForEach activity.
Output example:
"output": {
"value": [
{
"country": "PT",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_1"
},
{
"country": "ES",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_1"
},
{
"country": "IT",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_1"
},
{
"country": "BE",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_1"
},
{
"country": "PT",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_2"
},
{
"country": "ES",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_2"
},
{
"country": "IT",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_2"
},
{
"country": "BE",
"year": "2022",
"databricksPath": "/notebooks/1.Project/Notebook_2"
}
]
}
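The list above is just the cross product of the three input lists. As a point of reference, a minimal sketch in plain Python (no Spark required; the literals mirror the widget defaults from the notebook below) that builds exactly this structure:

```python
import itertools
import json

# These mirror the widget default values, already parsed into lists.
databricks_paths = ['/notebooks/1.Project/Notebook_1', '/notebooks/1.Project/Notebook_2']
countries = ['PT', 'ES', 'IT', 'BE']
years = ['2022']

# itertools.product yields every (path, year, country) combination;
# the last argument varies fastest, matching the ordering shown above.
value = [
    {'country': c, 'year': y, 'databricksPath': p}
    for p, y, c in itertools.product(databricks_paths, years, countries)
]

print(json.dumps({'value': value}, indent=2))
```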
Notebook that I'm using:
# Databricks notebook source
from pyspark.sql.functions import col, concat_ws
from pyspark.sql.types import StructType, StructField, StringType
# COMMAND ----------
dbutils.widgets.text("databricksPath", "['/notebooks/1.Project/Notebook_1','/notebooks/1.Project/Notebook_2']", "databricksPath")
dbutils.widgets.text("countryPartition", "['PT','ES','IT','BE']", "countryPartition")
dbutils.widgets.text("yearPartition", "['2022']", "yearPartition")
# Note: dbutils.widgets.get always returns a string, so these hold
# list-shaped strings like "['PT','ES','IT','BE']", not Python lists.
databricksPath = dbutils.widgets.get('databricksPath')
countryPartition = dbutils.widgets.get('countryPartition')
yearPartition = dbutils.widgets.get('yearPartition')
# COMMAND ----------
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField('databricksPath', StringType(), True),
    StructField('countryPartition', StringType(), True),
    StructField('yearPartition', StringType(), True)
])

data2 = [(databricksPath, countryPartition, yearPartition)]
df = spark.createDataFrame(data=data2, schema=schema)
df2 = df.withColumn("databricksPath", concat_ws(",", col("databricksPath")))
display(df2)
# COMMAND ----------
dbutils.notebook.exit({'message': 'Success', 'databricksPath': databricksPath,'yearPartition': yearPartition,'countryPartition': countryPartition})
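One direction I have considered (a sketch only, assuming the widget strings are valid Python literals, not working notebook code) is parsing the widget strings back into lists with ast.literal_eval, building the combinations with itertools.product, and exiting with a JSON string, since dbutils.notebook.exit takes a string value:

```python
import ast
import itertools
import json

# These mirror what dbutils.widgets.get() returns: strings, not lists.
databricksPath = "['/notebooks/1.Project/Notebook_1','/notebooks/1.Project/Notebook_2']"
countryPartition = "['PT','ES','IT','BE']"
yearPartition = "['2022']"

# Safely turn the list-shaped strings back into Python lists.
paths = ast.literal_eval(databricksPath)
countries = ast.literal_eval(countryPartition)
years = ast.literal_eval(yearPartition)

payload = {
    'message': 'Success',
    'value': [
        {'country': c, 'year': y, 'databricksPath': p}
        for p, y, c in itertools.product(paths, years, countries)
    ],
}

# In the notebook this would be: dbutils.notebook.exit(json.dumps(payload))
exit_value = json.dumps(payload)
```

ADF could then read the array from the exit value and feed it to the ForEach activity.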
Can anyone help me create this output in a Databricks notebook and pass it to ADF so the returned value can be used in a ForEach activity?
Thank you!