I have 3 string values and 4 lists that Im trying to create a dataframe.
Lists -
>>> exp_type
[u'expect_column_to_exist', u'expect_column_values_to_not_be_null', u'expect_column_values_to_be_of_type', u'expect_column_to_exist', u'expect_table_row_count_to_equal', u'expect_column_sum_to_be_between']
>>> expectations
[{u'column': u'country'}, {u'column': u'country'}, {u'column': u'country', u'type_': u'StringType'}, {u'column': u'countray'}, {u'value': 10}, {u'column': u'active_stores', u'max_value': 1000, u'min_value': 100}]
>>> results
[{}, {u'partial_unexpected_index_list': None, u'unexpected_count': 0, u'unexpected_percent': 0.0, u'partial_unexpected_list': [], u'partial_unexpected_counts': [], u'element_count': 102}, {u'observed_value': u'StringType'}, {}, {u'observed_value': 102}, {u'observed_value': 22075.0}]
>>> this_exp_success
[True, True, True, False, False, False]
Strings -
>>> is_overall_success
'False'
>>> data_source
'stores'
>>>
>>> run_time
'2022-02-24T05:43:16.678220+00:00'
Trying to create a dataframe as below.
columns = ['data_source', 'run_time', 'exp_type', 'expectations', 'results', 'this_exp_success', 'is_overall_success']
dataframe = spark.createDataFrame(zip(data_source, run_time, exp_type, expectations, results, this_exp_success, is_overall_success), columns)
Error -
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/session.py", line 748, in createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
File "/usr/lib/spark/python/pyspark/sql/session.py", line 416, in _createFromLocal
struct = self._inferSchemaFromList(data, names=schema)
File "/usr/lib/spark/python/pyspark/sql/session.py", line 348, in _inferSchemaFromList
schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1101, in _merge_type
for f in a.fields]
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1114, in _merge_type
_merge_type(a.valueType, b.valueType, name='value of map %s' % name),
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1094, in _merge_type
raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
TypeError: value of map field results: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.StringType'>
Expected Output
+---------------+--------------------------------+--------------------------------------+---------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------------+
| data_source | run_time |exp_type |expectations |results |this_exp_success|is_overall_success|
+---------------+--------------------------------+--------------------------------------+---------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------------+
|stores |2022-02-24T05:43:16.678220+00:00|expect_column_to_exist |{u'column': u'country'} |{} |True |False |
|stores |2022-02-24T05:43:16.678220+00:00|expect_column_values_to_not_be_null |{u'column': u'country'} |{u'partial_unexpected_index_list': None, u'unexpected_count': 0, u'unexpected_percent': 0.0, u'partial_unexpected_list': [], u'partial_unexpected_counts': [], u'element_count': 102} |True |False |
|stores |2022-02-24T05:43:16.678220+00:00|expect_column_values_to_be_of_type |{u'column': u'country', u'type_': u'StringType'} |{u'observed_value': u'StringType'} |True |False |
|stores |2022-02-24T05:43:16.678220+00:00|expect_column_to_exist |{u'column': u'countray'} |{} |False |False |
|stores |2022-02-24T05:43:16.678220+00:00|expect_table_row_count_to_equal |{u'value': 10} |{u'observed_value': 102} |False |False |
|stores |2022-02-24T05:43:16.678220+00:00|expect_column_sum_to_be_between |{u'column': u'active_stores', u'max_value': 1000, u'min_value': 100} |{u'observed_value': 22075.0} |False |False |
+---------------+--------------------------------+--------------------------------------+---------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------------+
columns. Now I want to create a dataframe by zipping the lists exp_type, expectations, results, this_exp_success and strings is_overall_success, run_time, data_source. Thats where Im getting this error.