
I'm trying to generate a PDF document from an uploaded ".docx" file using JODConverter. The call to the method that generates the PDF is something like this:

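import java.io.File;
import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");

// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();

try {
    // convert
    DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
    converter.convert(inputFile, outputFile);
} finally {
    // close the connection, even if the conversion fails
    connection.disconnect();
}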

I'm using Apache Commons FileUpload to handle uploading the .docx file, from which I can get an InputStream object. I'm aware that java.io.File is just an abstract reference to a file in the file system.

I want to avoid the disk write (saving the InputStream to disk) and the disk read (reading the saved file in JODConverter).

Is there any way I can get a File object referring to an input stream? Any other way to avoid disk IO will also do!

EDIT: I don't care if this ends up using a lot of system memory. The application is going to be hosted on a LAN with few or no concurrent users.
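Since memory is not a concern here, one way to keep the upload off the disk is to buffer it in a byte array. The following is a minimal JDK-only sketch, assuming the InputStream is the one obtained from the uploaded FileItem; the class and method names are illustrative, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class UploadBuffer {
    // Reads the whole stream into memory; acceptable for small documents and few users.
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the upload stream.
        InputStream upload = new ByteArrayInputStream("fake docx bytes".getBytes(StandardCharsets.UTF_8));
        byte[] data = readFully(upload);
        System.out.println(data.length); // prints 15
    }
}
```

The resulting byte array can be re-read as many times as needed by wrapping it in a new ByteArrayInputStream.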

  • Can you please tell me how you wrote the DocumentFormat(s) for XHTML (.html) and MS Word 2003 (.doc)? Commented Dec 12, 2013 at 17:09

3 Answers


File-based conversions are faster than stream-based ones (provided by StreamOpenOfficeDocumentConverter), but they require the OpenOffice.org service to be running locally and to have the correct permissions on the files.

To avoid writing to disk, try this method from the docs:

convert(java.io.InputStream inputStream, DocumentFormat inputFormat, java.io.OutputStream outputStream, DocumentFormat outputFormat) 
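With that overload the whole conversion can stay in memory: wrap the uploaded bytes in a ByteArrayInputStream and collect the PDF in a ByteArrayOutputStream. The sketch below is JDK-only so it compiles on its own. `StreamConverter` is a hypothetical interface that mirrors the shape of the signature above, with the DocumentFormat parameters simplified to strings, and the upper-casing converter is only a stand-in. In real code you would pass a JODConverter stream-capable converter (StreamOpenOfficeDocumentConverter, as mentioned above) and real DocumentFormat objects instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class InMemoryConversion {
    // Hypothetical stand-in for the stream overload of convert(...);
    // the real method takes DocumentFormat objects, not strings.
    interface StreamConverter {
        void convert(InputStream in, String inputFormat,
                     OutputStream out, String outputFormat) throws IOException;
    }

    // Runs a conversion entirely in memory: bytes in, bytes out, no temp files.
    static byte[] convertInMemory(StreamConverter converter, byte[] input,
                                  String inputFormat, String outputFormat) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(input)) {
            converter.convert(in, inputFormat, out, outputFormat);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Fake converter for demonstration only: copies the input, upper-cased.
        StreamConverter fake = (in, inFmt, out, outFmt) -> {
            int b;
            while ((b = in.read()) != -1) {
                out.write(Character.toUpperCase((char) b));
            }
        };
        byte[] pdf = convertInMemory(fake, "hello".getBytes(StandardCharsets.US_ASCII), "docx", "pdf");
        System.out.println(new String(pdf, StandardCharsets.US_ASCII)); // prints HELLO
    }
}
```

In a servlet, the returned byte array could then be written directly to the response's output stream, so neither the upload nor the PDF ever touches the disk.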

1 Comment

WOW! You, sir, made my day! Thank you! I wonder why they didn't detail this on their website.

There is no way to do it and make the code solid. For one, the .convert() method only takes two Files as arguments.

So, this would mean you'd have to extend File, which is possible in theory but very fragile, as you would have to delve into the library code, which can change at any time and make your extended class non-functional.

(Well, there is a way to avoid disk writes if you use a RAM-backed filesystem and read from and write to that filesystem, of course.)
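If the File-based API has to be used, a RAM-backed directory still gives you a real File without touching the physical disk. The sketch below assumes a Linux host where /dev/shm is a tmpfs mount (an assumption about your server, not something JODConverter requires) and falls back to the normal temp directory otherwise; the class name is illustrative. Note that OpenOffice.org still needs read permission on the resulting file.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RamBackedFiles {
    // Assumption: /dev/shm is a tmpfs (RAM-backed) mount, as on most Linux systems.
    private static final Path RAM_DIR = Paths.get("/dev/shm");

    // Copies the upload stream into a temp file, preferring the RAM-backed directory.
    public static File toTempFile(InputStream upload, String suffix) throws IOException {
        Path dir = (Files.isDirectory(RAM_DIR) && Files.isWritable(RAM_DIR))
                ? RAM_DIR
                : Paths.get(System.getProperty("java.io.tmpdir"));
        Path tmp = Files.createTempFile(dir, "upload-", suffix);
        // REPLACE_EXISTING is needed because createTempFile already created the file.
        Files.copy(upload, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile(); // usable as inputFile in converter.convert(inputFile, outputFile)
    }

    public static void main(String[] args) throws IOException {
        File f = toTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}), ".docx");
        System.out.println(f + " " + f.length());
        Files.delete(f.toPath()); // clean up once the conversion is done
    }
}
```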

2 Comments

Thanks for the reply, but I really need to do this :( If there is no other way, then I'm thinking of a RAM drive.
@viswa: Well, RAM drive it is, then. Believe me, you really don't want to do this. Your code will be fragile at best.

Chances are that commons fileupload has written the upload to the filesystem anyhow.

Check whether your FileItem is an instance of DiskFileItem. If it is, the write() implementation of DiskFileItem will try to move the file to the File object you pass. You are not causing any extra disk IO then, since the write has already happened.

4 Comments

No, Commons FileUpload gives you the option to get a stream. Check: link
I was referring to the write() call. The DiskFileItem implementation renames the temporary file to the target file (commons.apache.org/fileupload/apidocs/src-html/org/apache/…). My point is that chances are high that Commons FileUpload has already done disk IO to write the incoming stream to a temporary file.
Oh. Who would've thought :| I guess you're right. I'll look through it a little more and get back to you. Thanks for your input :)
Firstly, +1 for the idea. And check this: it's actually a little more complicated than I assumed. There's a configurable size threshold beyond which the file will be written to disk. But in my case it's going to be small documents, so it won't really matter.
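The threshold in question is the sizeThreshold of Commons FileUpload's DiskFileItemFactory: items below it are kept in memory (DiskFileItem.isInMemory() returns true), larger ones are spilled to a temporary file. A simplified JDK-only analogue of that behaviour is sketched below; it is not the library's code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Simplified, hypothetical analogue of threshold-based buffering:
// bytes stay in memory until more than `threshold` bytes have been written.
public class ThresholdBuffer extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File spillFile;       // non-null once the data has moved to disk
    private OutputStream current; // where writes currently go
    private long count;

    public ThresholdBuffer(int threshold) {
        this.threshold = threshold;
        this.current = memory;
    }

    // OutputStream's default write(byte[], int, int) funnels through here byte by byte.
    @Override
    public void write(int b) throws IOException {
        if (spillFile == null && count + 1 > threshold) {
            spill();
        }
        current.write(b);
        count++;
    }

    // Copies the buffered bytes to a temp file and redirects further writes there.
    private void spill() throws IOException {
        spillFile = File.createTempFile("upload-", ".tmp");
        spillFile.deleteOnExit();
        OutputStream fileOut = new FileOutputStream(spillFile);
        memory.writeTo(fileOut);
        memory = null;
        current = fileOut;
    }

    public boolean isInMemory() {
        return spillFile == null;
    }

    public File getSpillFile() {
        return spillFile;
    }

    @Override
    public void close() throws IOException {
        current.close();
    }

    public static void main(String[] args) throws IOException {
        try (ThresholdBuffer small = new ThresholdBuffer(10)) {
            small.write(new byte[5]);
            System.out.println(small.isInMemory()); // prints true
        }
        try (ThresholdBuffer large = new ThresholdBuffer(10)) {
            large.write(new byte[20]);
            System.out.println(large.isInMemory()); // prints false
        }
    }
}
```

For small documents, as in this case, setting the factory's threshold above the largest expected upload should keep every item in memory.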
