
I have a sequence file generated by Spark using the saveAsObjectFile function. The file content is just some int numbers. I want to read it locally with Java. Here is my code:

    Configuration conf = new Configuration();
    FileSystem fileSystem = null;
    SequenceFile.Reader in = null;
    try {
        fileSystem = FileSystem.get(conf);
        Path path = new Path("D:\\spark_sequence_file");
        in = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
        Writable key = (Writable)
                ReflectionUtils.newInstance(in.getKeyClass(), conf);
        BytesWritable value = new BytesWritable();
        while (in.next(key, value)) {
            byte[] val_byte = value.getBytes();
            int val = ByteBuffer.wrap(val_byte, 0, 4).getInt();
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeStream(in);
    }

But I can't read it correctly; I just get the same value repeated, and it is obviously wrong. Here is a snapshot of my output:

[screenshot of output]

The file header looks like this: [screenshot of file header]

Can anybody help me?
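For what it's worth, the ByteBuffer decoding step is correct in isolation when the bytes really are raw big-endian 4-byte ints; a minimal standalone sketch (class and method names are illustrative, not from the question):

```java
import java.nio.ByteBuffer;

public class ByteBufferIntDemo {
    // Decode consecutive 4-byte big-endian ints from a byte array --
    // the same decoding the question attempts on each BytesWritable payload
    static int[] decodeInts(byte[] raw, int count) {
        ByteBuffer buf = ByteBuffer.wrap(raw); // big-endian by default
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 0, 42, 0, 0, 1, 0};
        int[] vals = decodeInts(raw, 2);
        System.out.println(vals[0] + ", " + vals[1]); // 42, 256
    }
}
```

So the constant wrong value points at the payload itself not being raw ints, rather than at the decoding.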

  • What do you mean by "I can't read it correctly"? Is the shown header wrong, or right, or why is it shown? Commented Apr 4, 2018 at 7:25
  • If they are ints, maybe using an IntWritable could help, instead of ByteWritable? Commented Apr 4, 2018 at 7:27
  • @kutschkem, I mean I just get the same numbers, and they are wrong; I have updated my question. And IntWritable is wrong: the file is generated by Spark, and Spark uses BytesWritable. Commented Apr 4, 2018 at 7:39
  • Ok the question makes a lot more sense now. Commented Apr 4, 2018 at 8:15
  • Maybe I shouldn't use the saveAsObjectFile function; I am still testing. If I find the answer, I will tell you. @kutschkem Commented Apr 4, 2018 at 10:19

1 Answer


In Hadoop, keys are usually of type WritableComparable and values of type Writable. Keeping this basic concept in mind, I read the sequence file in the following way:

Configuration config = new Configuration();
Path path = new Path(PATH_TO_YOUR_FILE);
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
WritableComparable key = (WritableComparable)
        ReflectionUtils.newInstance(reader.getKeyClass(), config);
Writable value = (Writable)
        ReflectionUtils.newInstance(reader.getValueClass(), config);
while (reader.next(key, value)) {
    // do something with key and value
}
reader.close();

The data issue in your case is probably caused by using saveAsObjectFile() rather than saveAsSequenceFile(String path, scala.Option<Class<? extends org.apache.hadoop.io.compress.CompressionCodec>> codec).
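To see why saveAsObjectFile() produces unreadable ints: it stores each batch of records as a Java-serialized object graph inside the BytesWritable payload, so the first 4 bytes of the payload are the serialization stream header (0xACED0005), not a record. A self-contained sketch of that effect, using plain JDK serialization as a stand-in for the payload (class and helper names are illustrative, not Spark's actual internals):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;

public class ObjectPayloadDemo {
    // Approximates how an object-file payload is built:
    // plain Java serialization of a batch of records
    static byte[] serialize(Object records) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(records);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] payload)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(payload))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = serialize(new Integer[]{1, 2, 3});

        // The first 4 bytes are the Java serialization header 0xACED0005,
        // which is why reading them as an int gives a constant bogus value
        int bogus = ByteBuffer.wrap(payload, 0, 4).getInt();
        System.out.println(Integer.toHexString(bogus)); // aced0005

        // Deserializing the whole payload recovers the records
        Integer[] roundTrip = (Integer[]) deserialize(payload);
        System.out.println(roundTrip[0]); // 1
    }
}
```

A SequenceFile written with real IntWritable values avoids this extra serialization layer entirely, which is why switching the write side is the cleaner fix.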

Please try that method and see if the issue persists.
