I am reading data from a CSV file and then creating a DataFrame from it. But when I try to access the data in the DataFrame, I get a TypeError.
fields = [StructField(field_name, StringType(), True) for field_name in schema.split(',')]
schema = StructType(fields)
input_dataframe = sql_context.createDataFrame(input_data_1, schema)
print input_dataframe.filter(input_dataframe.diagnosis_code == '11').count()
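For context, the schema built above has one StructField per comma-separated name, so each record handed to createDataFrame is expected to carry one value per field. The sketch below is plain Python, not Spark: it uses collections.namedtuple as a stand-in for a row type, and the column names and sample line are hypothetical, not taken from the real file.

```python
from collections import namedtuple

# Hypothetical schema string; the question splits its own `schema` string the same way.
schema_string = "patient_id,diagnosis_code,visit_date"
field_names = schema_string.split(",")

# A namedtuple plays the role of a row type with one slot per StructField.
Record = namedtuple("Record", field_names)

line = "1001,11,2015-01-02"  # hypothetical raw line from the CSV file

# A row is built from the split values, one per field, not from the whole line.
record = Record(*line.split(","))
print(record.diagnosis_code)  # 11
```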
Neither 'unicode' nor 'str' values work with the Spark DataFrame. I get the following TypeError:
TypeError: StructType can not accept object in type <type 'unicode'>
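A simplified model of why this check can fail: if each element of input_data_1 is a whole line of text (an assumption, since the code that reads the file is not shown), it is a single string rather than a sequence with one value per field. The function below is a hypothetical mimic for illustration only, not PySpark's code; PySpark's real verifier also accepts Row objects, dicts and some other types.

```python
def accepts_as_struct(obj, field_names):
    """Hypothetical, simplified mimic of the StructType check.

    Only tuples and lists are modeled here: anything else (such as a
    plain string) is rejected, and sequences must have one value per field.
    """
    if not isinstance(obj, (tuple, list)):
        return False
    return len(obj) == len(field_names)


field_names = ["patient_id", "diagnosis_code", "visit_date"]  # hypothetical columns
line = "1001,11,2015-01-02"  # hypothetical raw line

print(accepts_as_struct(line, field_names))             # False: one string, not three fields
print(accepts_as_struct(line.split(","), field_names))  # True
```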
I tried encoding each line as UTF-8, as below, but I still get the error; now the TypeError complains about 'str' instead:
input_data_2 = input_data_1.map(lambda x: x.encode("utf-8"))
input_dataframe = sql_context.createDataFrame(input_data_2, schema)
print input_dataframe.filter(input_dataframe.diagnosis_code == '410.11').count()
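Encoding changes only the string type, not the shape of the data: in Python 2, unicode.encode returns a str, and in Python 3, str.encode returns bytes, but either way each element is still one value per line. A small Python 3 sketch with a hypothetical sample line:

```python
line = u"1001,410.11,2015-01-03"  # hypothetical raw line

encoded = line.encode("utf-8")  # bytes in Python 3 (the counterpart of Python 2's str)
print(type(encoded).__name__)   # bytes
print(isinstance(encoded, (tuple, list)))  # False: still a single value, not a row

# Splitting, not re-encoding, is what turns a line into one value per column.
fields = line.split(",")
print(fields)  # ['1001', '410.11', '2015-01-03']
```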
I also tried reading the CSV directly as UTF-8 or as unicode by passing use_unicode=True and use_unicode=False, but neither option helped.
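As far as I can tell, use_unicode only controls whether each line comes back as unicode or as str; neither setting splits a line into columns. Below is a hedged end-to-end sketch in plain Python (the csv module, with hypothetical sample data and column names) that splits each line before filtering, mirroring the filter/count in the question. The comment shows what the analogous PySpark step would look like, assuming input_data_1 holds raw lines.

```python
import csv
import io

# Hypothetical data standing in for the CSV file in the question.
raw_csv = "1001,11,2015-01-02\n1002,410.11,2015-01-03\n1003,11,2015-01-05\n"
field_names = ["patient_id", "diagnosis_code", "visit_date"]

# Decoding options only decide the string type of each line; the lines
# still have to be split into fields explicitly. In PySpark the analogous
# step, assuming input_data_1 is an RDD of raw lines, would be:
#   input_data_1.map(lambda line: line.split(","))
rows = list(csv.reader(io.StringIO(raw_csv)))

# Pair each value with its column name, then mirror the question's filter/count.
records = [dict(zip(field_names, row)) for row in rows]
count_11 = sum(1 for r in records if r["diagnosis_code"] == "11")
print(count_11)  # 2
```

Using csv.reader instead of a bare split(",") also copes with quoted fields that contain commas, which a plain split would break apart.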