
So I have a dataframe with one column like this:

+----------+
|some_colum|
+----------+
|        10|
|        00|
|        00|
|        10|
|        10|
|        00|
|        10|
|        00|
|        00|
|        10|
+----------+

where the values in some_colum are binary strings.

I want to convert this column to decimal.

I've tried doing

data = data.withColumn("some_colum", int(col("some_colum"), 2))

But this doesn't seem to work, as I get the error:

int() can't convert non-string with explicit base

I think cast() might be able to do the job but I'm unable to figure it out. Any ideas?

2 Answers


I think int cannot be applied directly to a column. You can use it in a udf:

from pyspark.sql import functions
from pyspark.sql.types import IntegerType

# int(x, 2) parses a base-2 string; the udf applies it row by row
binary_to_int = functions.udf(lambda x: int(x, 2), IntegerType())
data = data.withColumn("some_colum_int", binary_to_int("some_colum"))


from pyspark.sql.functions import udf, col, lit
from pyspark.sql.types import IntegerType

# Convert a string in the given base to a decimal integer
def to_decimal(input_value, base_value):
    return int(input_value, base_value)

to_decimal_udf = udf(to_decimal, IntegerType())
df = df.withColumn("decimal_value_binary",
                   to_decimal_udf(col("binary_sensor_data"), lit(2)))

df.show()

