This is my first post to Stack Overflow, but I've used this amazing site before.

Anyway, I suck at regular expressions but I think I need them for what I need to do.

Short Question: I need to replace each space ' ' with '&nbsp;' inside every <code></code> element.

More details:

The motivation: my code sections were creating an extra line every other line because of the extra spaces (I'm assuming). By replacing the spaces with &nbsp;, I was able to format the code correctly.

However, this introduced a LOT of extra characters into my HTML. Not only is it inefficient, it also makes word-wrap: break-word; break the words in half rather than move the entire word down.

  • You can use code in your question by using the syntax. You can also escape specific characters so they appear in your question. The escape character is `\`. Commented Mar 9, 2012 at 0:55

3 Answers

Do it with CSS instead:

code {white-space: nowrap;} /* or */ code {white-space: pre;}

See the white-space CSS property docs.

2 Comments

I tried all of the different white-space css and none of them worked right. I'm not sure what's going on.
You probably need to switch the code element to block mode: code {display: block;} or use its parent element. See code HTML Element.

First of all, not regex, but DOM. In PHP that would be:

$doc = new DOMDocument();
$doc->loadHTML($source); // instance call; calling loadHTML() statically is an error on PHP 8
foreach($doc->getElementsByTagName('code') as $code) {
    foreach($code->childNodes as $node) {
       // assumes no elements, otherwise check nodeType == 3 
       // and recurse into elements
       $node->textContent = str_replace(" ","\xC2\xA0", $node->textContent);
    }
}

You can (and in DOM, need to) use the actual non-breaking space character rather than the entity that represents it.
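The comment in the loop above mentions checking the node type and recursing when a code element contains nested elements. A minimal sketch of that idea follows, assuming the HTML is already in $source; the helper name replaceSpacesInTextNodes and the sample markup are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: replace spaces with U+00A0 in every text node
// below $node, recursing into nested elements instead of flattening them.
function replaceSpacesInTextNodes(DOMNode $node): void
{
    // Copy the live child list into an array before editing anything.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {          // nodeType == 3
            $child->nodeValue = str_replace(' ', "\xC2\xA0", $child->nodeValue);
        } elseif ($child->nodeType === XML_ELEMENT_NODE) { // nodeType == 1
            replaceSpacesInTextNodes($child);
        }
    }
}

// Sample input (an assumption for this sketch).
$source = '<p>keep these spaces</p><code>if (<b>a b</b>) { c d }</code>';

$doc = new DOMDocument();
$doc->loadHTML($source);

foreach ($doc->getElementsByTagName('code') as $code) {
    replaceSpacesInTextNodes($code);
}

// Text inside <code> now uses non-breaking spaces; the <p> is untouched.
$codeText = $doc->getElementsByTagName('code')->item(0)->textContent;
$paraText = $doc->getElementsByTagName('p')->item(0)->textContent;
```

Because only text nodes are rewritten, nested markup such as the <b> element survives; assigning to the textContent of an element would replace its children with plain text.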

However, those extra lines could be better controlled with:

code {white-space: nowrap;}

or white-space: pre/white-space: pre-line.

The CSS solution has the advantage of copy-and-paste-ability. Otherwise, examples filled with &nbsp; will give "WTF!?" parse errors when someone pastes them into an editor.

Also make sure your CMS/markup converter (if you're using one) doesn't automatically insert <br>, which would double the lines in <pre>/white-space:pre elements.

5 Comments

Well, the part how you insert the &nbsp;'s with DOMDocument is actually missing ;)
DOMDocument uses UTF-8 encoding. "\xa0" is an invalid character for UTF-8 -> utf8 "\xA0" does not map to Unicode. Take care.
I don't know where to put this DOM code. I'm using the CodeIgniter Framework. I'm not sure it'll work easily with how I'm loading my views and stuff. Also, I tried the CSS changes and those didn't work right either. :-/ I really think I'm going to need a way to replace spaces with nbsp when it gets stored in the database but only between the code tags. This is the way the current system works and it renders perfectly if structured like that (I'm building a new system but transferring existing data).
@hakre ah, indeed. I've utf-8-ized it. I'd usually just paste literal character.
@Max Magee: this code assumes you've got HTML in $source variable. $doc = new DOMDocument(); $doc->loadHTML($source); and $source = $doc->saveHTML(); gives you 2-way conversion.
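To make the encoding point from the comments concrete, here is a small check, offered as a sketch and not as part of the answer: a lone "\xA0" byte is not valid UTF-8, while "\xC2\xA0" is the UTF-8 encoding of the non-breaking space. The variable names are made up for illustration.

```php
<?php
$latin1Nbsp = "\xA0";      // NBSP as a single ISO-8859-1 byte
$utf8Nbsp   = "\xC2\xA0";  // NBSP (U+00A0) encoded as UTF-8

// With the /u modifier, preg_match() returns false if the subject is not valid UTF-8.
$latin1IsValidUtf8 = preg_match('//u', $latin1Nbsp) !== false;
$utf8IsValidUtf8   = preg_match('//u', $utf8Nbsp) !== false;

// Decoding the &nbsp; entity as UTF-8 yields the same two bytes.
$decodedEntity = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

var_dump($latin1IsValidUtf8, $utf8IsValidUtf8, $decodedEntity === $utf8Nbsp);
```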

disclaimer: in no way do I think this is the solution you will necessarily arrive at; some of the other answers here already address what you should/could otherwise do to accomplish your task.

but let's just assume you DID want to do it with regex. Since I think we can make the assumption that with <code>stuff</code>, stuff won't contain nested code tags, you can accomplish your short question with it, but you still need a couple steps:

// sorry for the C#, but the intent should translate clearly.
string input = @"<div>whatever</div> id='tricky'><code>adsfasd   fasdfasdfvar data = "" 8 5.00000000 8.0 9.000000"";var re = /(\.0{0,2})(0*)/g; var match = re.exec(data);alert(data.replace(re, RegExp.1));</code><p>more stuff with stuff.</p>";
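// 1. grab the first <code>...</code> block, tags included (non-greedy, so it stops at the first </code>)
var code = Regex.Match(input, "<code>(.*?)</code>").Value;
// 2. replace every whitespace character in that block with &nbsp;
var munged = Regex.Replace(code, @"\s", "&nbsp;");
// 3. put the munged block back in place of the original (assumes a single <code> block)
var result = Regex.Replace(input, "<code>(.*?)</code>", munged);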

4 Comments

in php, preg_replace_callback would allow you to combine the first and last call to Regex
I tried my best to get regular expressions to work but I have no idea what I'm doing (insert science dog meme here). I need help with the PHP version but instead of the HTML code tags, I need to find the BBCode code tags [code] and [/code] and get the text between those tags so I can replace the spaces.
I ended up getting it with help from someone else [link]stackoverflow.com/questions/9640670/…
thanks for noticing it's the exact same approach though :). again, sorry bout the non-php.
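For the PHP version asked about in the comments, a rough sketch with preg_replace_callback might look like the following. It is not the code from the linked question; the helper name nbspBetween and the sample strings are assumptions for illustration. It replaces only literal spaces (as in the question), not all whitespace as the C# \s does.

```php
<?php
// Hypothetical helper: replace spaces with &nbsp; only between $open and $close.
function nbspBetween(string $text, string $open, string $close): string
{
    // preg_quote() escapes the delimiters so "[code]" is matched literally.
    // The s modifier lets "." span line breaks; .*? stops at the first closing tag.
    $pattern = '~(' . preg_quote($open, '~') . ')(.*?)(' . preg_quote($close, '~') . ')~s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1] = opening tag, $m[2] = contents, $m[3] = closing tag
            return $m[1] . str_replace(' ', '&nbsp;', $m[2]) . $m[3];
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

// HTML tags and BBCode tags go through the same helper.
$html   = nbspBetween('a b <code>c d  e</code> f g', '<code>', '</code>');
$bbcode = nbspBetween('a b [code]c d[/code] f [code]g h[/code]', '[code]', '[/code]');

echo $html, "\n", $bbcode, "\n";
```

The callback handles every block in one pass, which is what combining the first and last Regex calls amounts to. The sketch matches the tags exactly as given, so an opening tag with attributes or different letter case would need a looser pattern.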
