
I have data in Row tuple format:

Row(Sentence=u'When, for the first time I realized the meaning of death.')

I want to convert it into String format like this:

(u'When, for the first time I realized the meaning of death.')

I tried this (suppose 'a' holds the data in the Row tuple):

b = sc.parallelize(a)
b = b.map(lambda line: tuple([str(x) for x in line]))
print(b.take(4))

But I am getting a result like this:

[('W', 'h', 'e', 'n', ',', ' ', 'f', 'o', 'r', ' ', 't', 'h', 'e', ' ', 'f', 'i', 'r', 's', 't', ' ', 't', 'i', 'm', 'e', ' ', 'I', ' ', 'r', 'e', 'a', 'l', 'i', 'z', 'e', 'd', ' ', 't', 'h', 'e', ' ', 'm', 'e', 'a', 'n', 'i', 'n', 'g', ' ', 'o', 'f', ' ', 'd', 'e', 'a', 't', 'h', '.')]

Does anybody know what I am doing wrong here?
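What is happening: a `pyspark.sql.Row` behaves like a named tuple, so `sc.parallelize(a)` iterates over the Row's field values, putting the single string into the RDD. The `map` then iterates over that string with `for x in line`, yielding one character at a time. A minimal plain-Python sketch, with `collections.namedtuple` standing in for `pyspark.sql.Row`:

```python
from collections import namedtuple

# stand-in for pyspark.sql.Row, which is also a tuple with named fields
Row = namedtuple("Row", ["Sentence"])
a = Row(Sentence=u'When, for the first time I realized the meaning of death.')

# sc.parallelize(a) iterates over the Row itself, so the RDD's
# elements are the Row's field values -- here, one string:
elements = list(a)
print(elements)  # ['When, for the first time I realized the meaning of death.']

# the map's `for x in line` then iterates over that string,
# producing one character per item:
line = elements[0]
print(tuple(str(x) for x in line)[:4])  # ('W', 'h', 'e', 'n')
```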

3 Answers


Below is the code:

col = 'your_column_name'
val = df.select(col).collect()                 # list of Row objects
val2 = [ele.__getattr__(col) for ele in val]   # extract the value from each Row

1 Comment

This worked for me with the following adjustment (cleaner): val2 = [ ele[col] for ele in val]
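A runnable sketch of this pattern, with a `namedtuple` standing in for `pyspark.sql.Row` and a plain list standing in for the `collect()` result (the column name 'Sentence' is assumed from the question):

```python
from collections import namedtuple

# stand-in for pyspark.sql.Row, which is also a tuple with named fields
Row = namedtuple("Row", ["Sentence"])

# stand-in for df.select(col).collect(), which returns a list of Rows
val = [
    Row(Sentence=u'When, for the first time I realized the meaning of death.'),
    Row(Sentence=u'Another sentence.'),
]

col = 'Sentence'
# getattr(ele, col) mirrors ele.__getattr__(col) from the answer;
# on real pyspark Rows, the indexing form ele[col] works as well
val2 = [getattr(ele, col) for ele in val]
print(val2)
```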

With a single Row (why would you even...) it should be:

a = Row(Sentence=u'When, for the first time I realized the meaning of death.')

b = sc.parallelize([a])

and flattened with

b.map(lambda x: x.Sentence)

or

b.flatMap(lambda x: x)

although sc.parallelize(a) is already in the format you need: because you pass an Iterable, Spark iterates over all fields in the Row to create the RDD.
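The difference between the two calls, sketched with plain lists standing in for RDDs (`map` keeps one output per Row; `flatMap` unpacks each Row into its field values):

```python
from collections import namedtuple

Row = namedtuple("Row", ["Sentence"])  # stand-in for pyspark.sql.Row
rows = [Row(Sentence=u'When, for the first time I realized the meaning of death.')]

# b.map(lambda x: x.Sentence): one string per Row
mapped = [x.Sentence for x in rows]

# b.flatMap(lambda x: x): iterate each Row, emitting its field values
flattened = [field for x in rows for field in x]

# with a single string field, both yield the same flat list of strings
print(mapped == flattened)  # True
```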



Below worked for me.

1:

list_val = df.selectExpr("max(Location) as loc").collect()
str_val = [e['loc'] for e in list_val][0]

2:

row = df.first()
string_value = row['columnName']
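Both patterns sketched with plain Python stand-ins (dicts stand in for the collected Rows, since real pyspark Rows also support indexing by column name; the placeholder values are assumptions, not real data):

```python
# stand-in for df.selectExpr("max(Location) as loc").collect(),
# which returns a list of Row objects indexable by column name
list_val = [{'loc': 'Warsaw'}]
str_val = [e['loc'] for e in list_val][0]

# stand-in for df.first(), which returns a single Row
row = {'columnName': 'some value'}
string_value = row['columnName']

print(str_val, string_value)
```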
