1

I want to parse the file names of multiple doc files (MS Office) using Java. How should I go about doing this?

I was able to find an API for extracting information from the document itself, but I can't find anything about working with the file name.

So say I have a doc file named XX_232312_22; I want to parse just the file name (i.e. extract the 232312 part).

EDIT: What if we need to parse more than one file, for instance all 1000 files in one directory?

8
  • Are you looking for new File("path/file.doc").getName()? Commented Jun 4, 2013 at 16:07
  • So you mean you want to get angelsoft2311 out of angelsoft2311-1? Am I right? Or do you have any other issues in getting the file name? Commented Jun 4, 2013 at 16:07
  • file.getName().replaceFirst("\\.\\w+$", ""); Commented Jun 4, 2013 at 16:07
  • Actually, the file names look like XX_232121_00, so I want to extract 232121 from the file name. Commented Jun 4, 2013 at 16:11
  • Do you have the filenames or just the directory names? Btw, if you have more than one question to ask, you should post separate questions. Commented Jun 4, 2013 at 16:32

2 Answers

1
String[] parts = filename.split("-");
String before = parts[0]; // part before the dash
String after = parts[1];  // part after the dash

You can look up String.split in the Java docs: http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29

EDIT:

OP changed the format of the filename to XX_filename_00.

It would then be

String[] parts = filename.split("_");
String first = parts[0];  // part before the first _
String middle = parts[1]; // part between the two _
String last = parts[2];   // part after the second _
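For the follow-up about a whole directory, here is a minimal sketch that combines the split on "_" with File.listFiles() and File.getName() (the latter was suggested in the comments). The class and method names (FileNameFields, middleField, middleFields) are made up for illustration, and it assumes every name of interest has exactly three underscore-separated parts, optionally followed by an extension such as .doc; anything else is skipped.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not from the question.
class FileNameFields {

    // Returns the middle field of a name shaped like XX_232312_22 or XX_232312_22.doc,
    // or null if the name does not have exactly three underscore-separated parts.
    static String middleField(String fileName) {
        // Drop a trailing extension such as ".doc" if there is one.
        int dot = fileName.lastIndexOf('.');
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;
        String[] parts = base.split("_");
        return (parts.length == 3) ? parts[1] : null;
    }

    // Collects the middle field of every matching regular file in a directory.
    static List<String> middleFields(File directory) {
        List<String> result = new ArrayList<>();
        File[] files = directory.listFiles(); // null if the path is not a readable directory
        if (files == null) {
            return result;
        }
        for (File file : files) {
            if (!file.isFile()) {
                continue; // skip subdirectories
            }
            String field = middleField(file.getName());
            if (field != null) {
                result.add(field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the directory is passed as the first argument, else the current one.
        File directory = new File(args.length > 0 ? args[0] : ".");
        for (String field : middleFields(directory)) {
            System.out.println(field);
        }
    }
}
```

Checking parts.length before indexing avoids an ArrayIndexOutOfBoundsException on names that do not follow the expected pattern.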


0

This should work for you.

fileName.split("-")[0]
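One caveat: split("-")[0] returns the whole name unchanged when there is no dash, and it throws an ArrayIndexOutOfBoundsException for a name that consists only of dashes, because split drops trailing empty strings. A small sketch of an equivalent that uses indexOf instead (the class and method names are hypothetical):

```java
// Hypothetical helper; equivalent in spirit to fileName.split("-")[0].
class DashPrefix {

    // Returns the text before the first '-', or the whole name when there is no dash.
    static String beforeDash(String fileName) {
        int dash = fileName.indexOf('-');
        return (dash >= 0) ? fileName.substring(0, dash) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(beforeDash("angelsoft2311-1"));     // angelsoft2311
        System.out.println(beforeDash("angelsoft2311-1.doc")); // angelsoft2311
        System.out.println(beforeDash("XX_232121_00"));        // XX_232121_00 (no dash, unchanged)
    }
}
```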

