I need to display a Word document on a web page. I'm using the docx4j library to convert .doc to HTML. This works fine, but the hyperlinks come out in the format below.

To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?] and type the text.

I'm able to convert it to

To search on google go to this link  (http://www.google.com) google and type the text.

using the below code

String myText = "To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?] and type the text.";
System.out.println(myText);
String firstReplace = myText.replaceAll("\\[", "").replaceAll("\\]", "").replaceAll("#\\?", "");
System.out.println(firstReplace);
String secondReplace = firstReplace.replaceAll("HYPER\\S+\\s+\"", "(");
System.out.println(secondReplace);
String finalReplace = secondReplace.replaceAll("/*\".", ")");
System.out.println("\n" + finalReplace);

Can someone please provide me a regex to convert the above string to

To search on google go to this link google (http://www.google.com) and type the text.

--EDIT--

There are some links which show up as

[#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google page[#?]

I should change them to

google page (http://www.google.com)

How do I do this?

2 Answers

You can use group references to move the word google, which comes after the parentheses, in front of them.

Replace the matches of the following regex:

'(\([^)]*\))\s?(\w+)'

with:

'$2 $1'

You can use str.replaceAll() for this.

Elaboration:

The first capture group, (\([^)]*\)), matches the part between parentheses; [^)]* is a negated character class that matches any run of characters other than a closing parenthesis.

The second group, (\w+), matches the word after that part; \w+ matches one or more word characters.
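As a rough illustration, here is how that swap might look in Java, applied to text where the URL is already wrapped in parentheses. The class and method names (SwapLinkAndLabel, swap) are made up for this sketch; note that each backslash in the regex has to be doubled inside a Java string literal.

```java
public class SwapLinkAndLabel {
    // Group 1: the parenthesised URL, e.g. "(http://www.google.com)".
    // Group 2: the single word that follows it.
    static String swap(String text) {
        return text.replaceAll("(\\([^)]*\\))\\s?(\\w+)", "$2 $1");
    }

    public static void main(String[] args) {
        String input = "go to this link (http://www.google.com) google and type the text.";
        // prints: go to this link google (http://www.google.com) and type the text.
        System.out.println(swap(input));
    }
}
```

Because \w+ stops at the first non-word character, only one word is moved. A two-word label such as "google page" would come out as "google (http://www.google.com) page", so this pattern does not cover the case added in the question's edit.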


3 Comments

can you please elaborate?
is there any way I can get "google.com" and replace it with "(google.com)" directly? I can't use this script given in the question since what I have is an HTML and replacing the " messes up my HTML
Thanks @Kasramvd. I've edited the question, please have a look.

Removing the [#?] markers as early as you do in your question means that you lose information you need for the later text adjustments. The basic template of your input is:

[#?] HYPERLINK *target* [#?] [#?] *clickable textual description of link* [#?]

So why don't you use those markers to your advantage?

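A regexp along these lines should do it (treat it as a sketch of the basic idea and adjust it to your exact input; it tolerates optional whitespace around the markers and drops a trailing slash from the URL):

mystring.replaceAll("\\[#\\?\\]\\s*HYPERLINK\\s+\"([^\"]*?)/?\"\\s*\\[#\\?\\]\\s*\\[#\\?\\]\\s*(.*?)\\s*\\[#\\?\\]", "$2 ($1)");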

The above is designed to give you "google page (http://www.google.com)". But I would also question why you want to display it like that. For an HTML web page you would normally want <a href="http://www.google.com">google page</a>. To get that, only the replacement string needs to change.
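A minimal, self-contained sketch of the marker-based approach, producing either the plain-text form or an HTML anchor. It assumes the input looks exactly like the sample string in the question (straight double quotes around the URL, [#?] as the marker); the class and method names (FieldCodeLinks, toPlainText, toAnchor) are invented for illustration.

```java
import java.util.regex.Pattern;

public class FieldCodeLinks {
    // The literal marker "[#?]", escaped for use inside a regex.
    private static final String MARK = "\\[#\\?\\]";

    // Group 1: the URL without its quotes or a trailing slash.
    // Group 2: the clickable link text between the last two markers.
    private static final Pattern LINK = Pattern.compile(
        MARK + "\\s*HYPERLINK\\s+\"([^\"]*?)/?\"\\s*"
        + MARK + "\\s*" + MARK + "\\s*(.*?)\\s*" + MARK);

    static String toPlainText(String text) {
        return LINK.matcher(text).replaceAll("$2 ($1)");
    }

    static String toAnchor(String text) {
        return LINK.matcher(text).replaceAll("<a href=\"$1\">$2</a>");
    }

    public static void main(String[] args) {
        String input = "To search on google go to this link [#?] HYPERLINK "
            + "\"http://www.google.com/\" [#?][#?] google[#?] and type the text.";
        // prints: To search on google go to this link google (http://www.google.com) and type the text.
        System.out.println(toPlainText(input));
        // prints: To search on google go to this link <a href="http://www.google.com">google</a> and type the text.
        System.out.println(toAnchor(input));
    }
}
```

The lazy quantifiers (*?) keep each match confined to a single link, so several links in the same string are converted independently. Since the link text is captured up to the closing marker rather than with \w+, multi-word labels such as "google page" are handled as well.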
