
I need to replicate the result from this array definition using an external file.

scala> val data = Seq(Array(Array(1, 2), Array(3)),Array(Array(1), Array(3, 2), Array(1, 2)),Array(Array(1, 2), Array(5)),Array(Array(6)))

data: Seq[Array[Array[Int]]] = List(Array(Array(1, 2), Array(3)), Array(Array(1), Array(3, 2), Array(1, 2)), Array(Array(1, 2), Array(5)), Array(Array(6)))

I tried creating a testdataI.txt file but can't make it work.

testdataI.txt ->

1,2
3
1
3,2
1,2
1,2
5
6

Here is the result when I do the conversion using io.Source:

import scala.io.Source

scala> val data = Seq(Source.fromFile("/tmp/testdataI.txt").getLines().map(_.split(",").map(_.trim.toInt)).toArray)

data: Seq[Array[Array[Int]]] = List(Array(Array(1, 2), Array(3), Array(1), Array(3, 2), Array(1, 2), Array(1, 2), Array(5), Array(6)))
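A small sketch of why the file version collapses into a single group. It uses a hypothetical `fileText` string in place of /tmp/testdataI.txt so it runs without the file. Each line becomes one `Array[Int]`, all eight of them land in one `Array[Array[Int]]`, and `Seq(...)` then wraps that whole array as its only element. Nothing in the lines themselves says where a new group starts.

```scala
object WhyOneGroup {
  // Hypothetical stand-in for the contents of /tmp/testdataI.txt
  val fileText: String = "1,2\n3\n1\n3,2\n1,2\n1,2\n5\n6"

  // Each line becomes one Array[Int]; all lines end up in a single Array[Array[Int]]
  val rows: Array[Array[Int]] =
    fileText.split("\n").map(_.split(",").map(_.trim.toInt))

  // Seq(rows) is a Seq with exactly ONE element: the whole rows array
  val data: Seq[Array[Array[Int]]] = Seq(rows)

  def main(args: Array[String]): Unit = {
    println(data.size)   // 1
    println(rows.length) // 8
  }
}
```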

The outcome should look like this (a sequence of multidimensional arrays):

data: Seq[Array[Array[Int]]] = List(Array(Array(1, 2), Array(3)), Array(Array(1), Array(3, 2), Array(1, 2)), Array(Array(1, 2), Array(5)), Array(Array(6)))

I found a lot of Multidimensional array information but nothing for this specific case.

Any help is really appreciated,

Fredy A Gomez

  • First of all it looks like your input is a mixture of 2D and 1D arrays - I think that's where the confusion is starting. Commented May 12, 2016 at 21:25
  • I've looked at your desired output, and I can't work out the rules for what elements go in what array. Can you explain? Commented May 12, 2016 at 21:45
  • BTW, this is the version without using an external file. It gives me the result I want, but I need to do the same by reading an external file: val data = Seq(Array(Array(1, 2), Array(3)),Array(Array(1), Array(3, 2), Array(1, 2)),Array(Array(1, 2), Array(5)),Array(Array(6))) Commented May 12, 2016 at 22:02
  • "You can setup your own rules or combinations as you want, " What are the rules, in English? I still can't see how to take your example input data and know when to start a new array. Commented May 12, 2016 at 22:08
  • Maybe, as Lee mentioned, the Seq creates a List (1D) of multidimensional arrays (2D) -> List[Array[Array[Int]]]. What should my input file and/or Scala code look like to accomplish this? Commented May 12, 2016 at 22:12
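One possible answer to "what should the input file look like", offered only as a sketch: put a blank line between groups so the file itself marks where a new inner `Array[Array[Int]]` starts. This layout is a hypothetical choice, not something the question's data already uses. The parser works on a string; with a real file you would pass it `Source.fromFile(path).mkString` instead of the inlined `fileText`.

```scala
object BlankLineGroups {
  // Hypothetical file contents: groups separated by one or more blank lines
  val fileText: String =
    """1,2
      |3
      |
      |1
      |3,2
      |1,2
      |
      |1,2
      |5
      |
      |6""".stripMargin

  // Blocks are split on blank lines, blocks into rows, rows into Ints.
  // trim also strips a trailing '\r' if the file has Windows line endings.
  def parse(text: String): Seq[Array[Array[Int]]] =
    text
      .split("""\n\s*\n""") // one string per group
      .toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(block => block.split("\n").map(_.split(",").map(_.trim.toInt)))

  // Array's default toString is not readable, so format nested arrays by hand
  def show(data: Seq[Array[Array[Int]]]): String =
    data
      .map(group => group.map(_.mkString("Array(", ", ", ")")).mkString("Array(", ", ", ")"))
      .mkString("List(", ", ", ")")

  def main(args: Array[String]): Unit =
    println(show(parse(fileText)))
}
```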

1 Answer


No idea why you want to structure the values like that, but here's how you can do it:

scala> import scala.io.Source
import scala.io.Source

scala> val take = List(2, 3, 2, 1)
take: List[Int] = List(2, 3, 2, 1)

scala> val data = Source.fromFile("/tmp/testdataI.txt").getLines().map(_.split(",").map(_.trim.toInt).toList).toList
data: List[List[Int]] = List(List(1, 2), List(3), List(1), List(3, 2), List(1, 2), List(1, 2), List(5), List(6))

scala> def awesomeGrouped(ungrouped: List[List[Int]], take: List[Int]): List[List[List[Int]]] = take match {
     |         case Nil => Nil
     |         case t :: ts => ungrouped.take(t) :: awesomeGrouped(ungrouped.drop(t), ts)
     |     }
awesomeGrouped: (ungrouped: List[List[Int]], take: List[Int])List[List[List[Int]]]

scala> def fixTypes(grouped: List[List[List[Int]]]) = grouped.map(_.map(_.toArray).toArray)
fixTypes: (grouped: List[List[List[Int]]])List[Array[Array[Int]]]

scala> fixTypes(awesomeGrouped(data, take))
res0: List[Array[Array[Int]]] = List(Array(Array(1, 2), Array(3)), Array(Array(1), Array(3, 2), Array(1, 2)), Array(Array(1, 2), Array(5)), Array(Array(6)))

The part that makes everyone uneasy is the take list (2, 3, 2, 1) you've chosen: the group sizes seem arbitrary, and nothing in the file itself says where one group ends and the next begins.

Note that I added the fixTypes function only to produce the exact types you asked for. Arrays are not very idiomatic Scala, though; are you sure you need them? If not, just remove the fixTypes definition and its invocation.
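The recursion in awesomeGrouped can also be written with `splitAt`, which makes it easy to check that the take list actually covers the file. A minimal sketch; the name `groupBySizes` and the `require` checks are illustrative additions, not part of the answer above:

```scala
object GroupBySizes {
  // Split `rows` into consecutive groups whose sizes are given by `sizes`.
  // Fails fast if the sizes don't add up to the number of rows, which is the
  // usual symptom of a take list that doesn't match the file.
  def groupBySizes[A](rows: List[A], sizes: List[Int]): List[List[A]] = {
    require(sizes.forall(_ > 0), s"group sizes must be positive: $sizes")
    require(sizes.sum == rows.length,
      s"sizes sum to ${sizes.sum} but there are ${rows.length} rows")

    // Walk the sizes left to right, peeling one group off the front each time
    val (reversedGroups, _) =
      sizes.foldLeft((List.empty[List[A]], rows)) { case ((acc, rest), n) =>
        val (group, remaining) = rest.splitAt(n)
        (group :: acc, remaining)
      }
    reversedGroups.reverse
  }

  def main(args: Array[String]): Unit = {
    val data = List(List(1, 2), List(3), List(1), List(3, 2), List(1, 2), List(1, 2), List(5), List(6))
    println(groupBySizes(data, List(2, 3, 2, 1)))
  }
}
```

The `require` checks trade the silent truncation of the take/drop version for an immediate error, which matters once the take list has to be maintained separately for a large file.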


Comments

The "sort" in awesomeSort seems mis-named as there isn't an ordering involved. awesomeGrouped maybe. And while this solution works, the need to specify a different take list depending on what exactly is in the file does indeed make me uneasy.
Thanks mlg. This structure is required as input for a sequential analysis algorithm called basket analysis. It requires date of purchase, items purchased and customer ID, so all these variables are organized in a structure like this: customer X (Date 1(item A, item B), Date 2(item C)), customer Y (Date 2(item C, item A)), customer Z .... And I need an input file because I have 15K customers. Your solution requires me to generate a new input file (the take list) that provides the rules for starting a new Array (a new customer). Thanks for your answer.
If I understand you correctly, you can't tell the difference from the file alone between a purchase of a single item on one date (like Date2(itemC) above) and another customer ID? Nasty format to have to deal with!
@TheArchetypalPaul good point about the naming; updated. FredyGomez: if you consider your question answered can you please consider accepting my answer?
@TheArchetypalPaul: agree with your analysis. I was kinda hoping that the cardinality distribution was constant 2-3-2-1 (something better modeled by a case class and a custom deserialisation mechanism), but after learning his use case it looks like my algorithm is gonna work for just his one-off example.
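Following the comments' idea of a case class with a custom deserialisation step, here is a sketch under an assumed file layout (hypothetical; the question's file does not look like this): one line per customer, the customer ID first, then one `;`-separated field per purchase date, with `,`-separated item codes inside each field, e.g. `X;1,2;3`. Dates themselves are left out for brevity, and the names `Customer`, `parseLine` and `toArrays` are illustrative.

```scala
object CustomerBaskets {
  // Hypothetical record: one customer and their baskets, in purchase order
  final case class Customer(id: String, baskets: Vector[Vector[Int]])

  // Parse a line like "X;1,2;3" into Customer("X", Vector(Vector(1, 2), Vector(3))).
  // Assumed layout: id, then one ';'-separated field per purchase date,
  // with ','-separated item codes inside each field.
  def parseLine(line: String): Customer = {
    val fields = line.trim.split(";").map(_.trim)
    val baskets = fields.drop(1).toVector.map(_.split(",").map(_.trim.toInt).toVector)
    Customer(fields.head, baskets)
  }

  // Convert back to the Seq[Array[Array[Int]]] shape the question asks for
  def toArrays(customers: Seq[Customer]): Seq[Array[Array[Int]]] =
    customers.map(_.baskets.map(_.toArray).toArray)

  def main(args: Array[String]): Unit = {
    // With a real file, something like:
    // Source.fromFile(path).getLines().filter(_.trim.nonEmpty).map(parseLine).toList
    val lines = List("X;1,2;3", "Y;1;3,2;1,2", "Z;1,2;5", "W;6")
    val customers = lines.map(parseLine)
    customers.foreach(println)
    println(toArrays(customers).size)
  }
}
```

Because each line carries its own customer ID, a single-item basket can no longer be confused with the start of a new customer, which is the ambiguity raised in the comments.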
