I have a .bin file that I want to partition and read as byte arrays. Using map() doesn't help, and when I measure the size of the whole data, it doesn't match the size of my file (it is bigger than the file's size).

While testing this, I ran into another problem: getNumPartitions() prints 1, but the function I pass to map() is called more than once. Also, when I sum the sizes of the partitions (measured with sys.getsizeof() inside the map() function), the result is bigger than my file's size.

  1. How do I read a .bin file as a byte array?
  2. How can I partition the file into fixed-size pieces?
  3. Can I make overlapping partitions and choose where the splits fall?
  • How did you load your .bin file? Commented Dec 28, 2015 at 7:50
  • @WoodChopper with textFile() Commented Dec 28, 2015 at 8:33

1 Answer

For fixed-size records, take a look at SparkContext.binaryRecords(path, recordLength): https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.binaryRecords
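
As a rough illustration, here is a minimal sketch of how binaryRecords might be used from PySpark. It assumes you already have a SparkContext (called `sc` here) and that the file length is a multiple of the record length. The helper names, `RECORD_LENGTH`, and the path in the usage note are hypothetical. The sketch counts bytes with len() rather than sys.getsizeof(), because sys.getsizeof() also includes Python's per-object overhead, which is probably one reason your total came out bigger than the file. Also note that textFile() is line-oriented, so map() runs once per line, not once per partition, which would explain several calls with a single partition.

```python
import sys

RECORD_LENGTH = 4  # hypothetical fixed record size in bytes


def load_fixed_records(sc, path, record_length=RECORD_LENGTH):
    """Return an RDD whose elements are byte strings of record_length bytes.

    `sc` is an existing SparkContext; `path` is wherever your .bin file lives.
    """
    return sc.binaryRecords(path, record_length)


def payload_size(record):
    """Number of data bytes in one record, without Python object overhead."""
    return len(record)


def total_payload_size(records_rdd):
    """Sum the data bytes over all records in the RDD."""
    return records_rdd.map(payload_size).sum()


if __name__ == "__main__":
    sample = b"\x00\x01\x02\x03"
    # len() counts only the data; sys.getsizeof() adds interpreter overhead.
    print(payload_size(sample), sys.getsizeof(sample))
```

With a live context you would call something like `total_payload_size(load_fixed_records(sc, "data.bin"))` and compare the result with the file size on disk. As far as I know, binaryRecords has no option for overlapping records, so overlap would need separate handling.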

2 Comments

Thanks a lot. Is it the same as textFile()? Can I replace textFile() with it without changing the rest of my code? What do I need to import? I use Spark 1.4.4.
@user3416282 a) I hope so, and b) no idea, I don't use Python.
