
Let's say I am importing a flat file from HDFS into Spark using something like the following:

val data = sc.textFile("hdfs://name_of_file.tsv").map(_.split('\t'))

This will produce an RDD[Array[String]]. If I wanted tuples instead, I could do as referenced in this solution and map each element to a tuple.

val dataToTuple = data.map{ case Array(x,y) => (x,y) }

But what if my input data has, say, 100 columns? Is there a way in Scala, using some sort of wildcard, to say

val dataToTuple = data.map{ case Array(x,y, ... ) => (x,y, ...) }

without having to write out 100 variables to match on?

I tried doing something like

val dataToTuple = data.map{ case Array(_) => (_) }

but that didn't seem to make much sense.

  • Why would you want a tuple with 100 elements? Just use the array that split produces? Commented May 12, 2016 at 19:43
  • You can create a Row instead of a Tuple Commented May 12, 2016 at 19:47
  • If you really need that, you can use the Shapeless library: stackoverflow.com/a/19901310/1809978. But be aware that the maximum tuple size in Scala is limited to 22 (last time I checked), and I believe you still have to specify the type of each column. Besides, it might not be what you actually need. Commented May 12, 2016 at 19:48

1 Answer

If your data columns are homogeneous (like an Array of Strings), a tuple may not be the best way to improve type safety. All you can do is fix the size of your array using a sized list from the Shapeless library:

How to require typesafe constant-size array in scala?

This is the right approach if your columns are unnamed. For instance, your row might be a representation of a vector in Euclidean space.
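For comparison, here is a minimal plain-Scala sketch of the same idea without Shapeless. It is a swapped-in technique, not the sized list from the linked answer: the width is checked at runtime rather than by the compiler. The names `RowParsing` and `parseRow` are hypothetical, and the input is assumed to be one tab-separated line.

```scala
object RowParsing {
  // Hypothetical helper: split a TSV line and keep it only when it has
  // exactly `width` columns. This is a runtime check, not a compile-time one.
  def parseRow(line: String, width: Int): Option[Vector[String]] = {
    // A limit of -1 keeps trailing empty columns, which split drops by default.
    val cols = line.split("\t", -1)
    if (cols.length == width) Some(cols.toVector) else None
  }
}
```

Rows with the wrong number of columns come back as None, so they can be filtered out or reported before any further processing.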

Otherwise (named columns, possibly of different types), it's better to model the row with a case class, but be aware of the size restriction (tuples are limited to 22 elements, and so were case classes before Scala 2.11). This might help you quickly map an array (or parts of it) to an ADT: https://stackoverflow.com/a/19901310/1809978
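As a rough sketch of the case-class route, the sequence wildcard `_*` in an Array pattern lets you name only the columns you care about and ignore the rest. The `Record` type, its field names, and the choice of the first three columns are illustrative assumptions, not something taken from the question's data.

```scala
import scala.util.Try

object RowMapping {
  // Hypothetical record type; the fields are assumptions for illustration.
  final case class Record(id: String, name: String, score: Double)

  // Binds the first three columns and ignores any remaining ones via `_*`.
  // Returns None for rows that are too short or whose score is not numeric.
  def toRecord(cols: Array[String]): Option[Record] = cols match {
    case Array(id, name, score, _*) =>
      Try(score.toDouble).toOption.map(Record(id, name, _))
    case _ => None
  }
}
```

With the RDD from the question, something like `data.map(cols => RowMapping.toRecord(cols))` should apply it to every row, assuming `data` holds the split columns.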
