
I have a CSV file which is "semi-structured":

canal,username,email,age
facebook,pepe22,[email protected],24
twitter,foo-24,[email protected],33
facebook,caty24,,22

Suppose that I want the first, second and third columns in an RDD of type org.apache.spark.rdd.RDD[(String, String, String)].

I am really new to this. I'm using Spark 1.4.1, and my code has got this far:

val rdd = sc.textFile("/user/ergorenova/socialmedia/allus/test").map(_.split(","))

Can someone help me? I would really appreciate it.


1 Answer

val rdd = sc.textFile("/user/ergorenova/socialmedia/allus/test")
  .map(_.split(",", -1) match {
    // Line with only three fields (no age at all)
    case Array(canal, username, email) => (canal, username, email)
    // Line with all four fields: the age is simply dropped
    case Array(canal, username, email, age) => (canal, username, email)
  })

You will obtain a tuple made out of the first, second and third columns.
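For illustration, collecting the RDD built from the sample file above should print tuples like the ones below (this output sketch is mine, not part of the original answer; note that a header line, if the file has one, is split and matched like any other row):

rdd.collect().foreach(println)
// (canal,username,email)                 <- header line is matched like a data row
// (facebook,pepe22,[email protected])
// (twitter,foo-24,[email protected])
// (facebook,caty24,)                     <- the empty email field is kept as ""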


7 Comments

Thank you very much, but now I have another issue: if the last element is missing, for example a line without the age, the code fails. How can I solve this?
I get an error in the second line: "not found: value age". Is there another way? Thanks for the response.
If I edit this line, "case Array(canal, username, email) => (canal, username, age)", to "case Array(canal, username, email) => (canal, username, "")", it works. Is this solution right?
Sorry, the last part should be email and not age. I mixed them up.
No problem. The issue is that now I also want the age, but one record in this file doesn't have that data, and I get an error.
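For what it's worth, here is a minimal sketch of the padding idea discussed in the comments, extended so that the age column is kept and an empty string is substituted when a line only has three fields (the 4-tuple variant and the rddWithAge name are my assumptions, not part of the accepted answer):

val rddWithAge = sc.textFile("/user/ergorenova/socialmedia/allus/test")
  .map(_.split(",", -1) match {
    // Line with all four fields: keep them all
    case Array(canal, username, email, age) => (canal, username, email, age)
    // Line missing the trailing age field: pad it with an empty string
    case Array(canal, username, email) => (canal, username, email, "")
  })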
