I'm using cURL to fetch a web page and present it to our users. Things worked well until I came across a website that uses a considerable amount of Ajax, whose responses are formatted like this:

33687|updatePanel|ctl00_SiteContentPlaceHolder_FormView1_upnlOTHER_NATL|
                                        <div id="ctl00_SiteContentPlaceHolder_FormView1_othernationalities">
                                            <h4>

                                                <span class="tooltip_text" onmousemove="widetip=false; tip=''; delayToolTip(event,tip,widetip,0,0);return false"
                                                    onmouseout="hideToolTip()">
                                                    <span id="ctl00_SiteContentPlaceHolder_FormView1_lblProvideOTHER_NATL">Provide the following information:</span></span>
                                            </h4>
|
266|scriptBlock|ScriptContentNoTags|
    document.getElementById('ctl00_SiteContentPlaceHolder_FormView1_dtlOTHER_NATL_ctl00_csvOTHER_NATL').dispose = function() {
        Array.remove(Page_Validators, document.getElementById('ctl00_SiteContentPlaceHolder_FormView1_dtlOTHER_NATL_ctl00_csvOTHER_NATL'));
    }

Each block of the response has 4 fields: fields 2 and 3 are just identifiers, field 4 is the actual "body", and field 1 is the length of the body. The problem is that we modify the body, so I need to update the length in field 1 to match; otherwise, a parsing error is thrown when this is inserted into the web page.
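A minimal PHP sketch of the record layout described above, assuming the length field counts bytes (which equals the character count only when the body is ASCII). `build_record` is a hypothetical helper name; it shows that field 1 has to be computed from the final body, so any edit to the body changes it.

```php
<?php
// Hypothetical helper: build one "length|type|id|body|" record.
// The length is always recomputed from the body that is passed in.
// Assumes an ASCII body, so strlen() (a byte count) matches the character count.
function build_record(string $type, string $id, string $body): string
{
    return strlen($body) . '|' . $type . '|' . $id . '|' . $body . '|';
}

$original = build_record('scriptBlock', 'ScriptContentNoTags', 'alert(1);');
$modified = build_record('scriptBlock', 'ScriptContentNoTags', 'alert("changed");');

echo $original, "\n"; // 9|scriptBlock|ScriptContentNoTags|alert(1);|
echo $modified, "\n"; // 17|scriptBlock|ScriptContentNoTags|alert("changed");|
```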

I'm trying to figure out a combination of shell commands (awk, sed, whatever) to:
a) read the saved file
b) run a regex on it to gather each individual block of information (using '(\d*?)\|(.*?)\|(.*?)\|(.*?)\|')
c) set the first capturing group to the length of the last capturing group
d) save all the regex matches to a new document or back to the original

Any input from "the collective" would be GREATLY appreciated.

1 Answer

It doesn't look like a single regex can solve this problem, because there is no way to use the first captured group inside a {n} quantifier to specify the length of the body. This is what would be ideal (but regex engines don't accept a backreference as a repetition count):

(\d*?)\|([^|]+)\|([^|]+)\|(.{\1})\|
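A different approach from the split described next, sketched in PHP: let the regex match only the header (length, type, id), then cut the body out with `substr()` using the captured length, which does what `{\1}` was meant to do. `parse_records` is a hypothetical helper, and it assumes the length counts bytes (ASCII bodies).

```php
<?php
// Hypothetical helper: parse "length|type|id|body|" records by length.
// The regex matches only the header; the body is taken with substr().
function parse_records(string $data): array
{
    $records = [];
    $pos = 0;
    $n = strlen($data);
    while ($pos < $n) {
        // \G anchors the match at $pos, i.e. right after the previous record.
        if (!preg_match('/\G(\d+)\|([^|]*)\|([^|]*)\|/', $data, $m, 0, $pos)) {
            throw new RuntimeException("Malformed header at offset $pos");
        }
        $len = (int)$m[1];
        $bodyStart = $pos + strlen($m[0]);
        $body = substr($data, $bodyStart, $len);
        // The body must be exactly $len bytes and followed by the closing '|'.
        if (strlen((string)$body) !== $len || ($data[$bodyStart + $len] ?? '') !== '|') {
            throw new RuntimeException("Length mismatch at offset $pos");
        }
        $records[] = ['type' => $m[2], 'id' => $m[3], 'body' => $body];
        $pos = $bodyStart + $len + 1; // skip the body and its closing '|'
    }
    return $records;
}

// A '|' inside the body is harmless because the length says where it ends.
$data = '5|updatePanel|panel1|a|b|c|' . '2|scriptBlock|ScriptContentNoTags|x;|';
foreach (parse_records($data) as $r) {
    echo $r['type'], ' ', $r['id'], ' => ', $r['body'], "\n";
}
```

Because the declared length drives the parse, this must run on the original response, before any body is modified.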

The length can't be ignored either: there is no escape character, so if the message body contains a |, a plain split can't tell it apart from a field delimiter. I suggest splitting on '|' and storing the content in a 2-dimensional array. Check every fourth item (the body) against the declared length; if it is too short, append a | and the next item, advance the read counter, and repeat. This has to be done on the original response, while its lengths are still accurate; the lengths can then be recomputed after the bodies are modified. In PHP:

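$items = explode('|', $file);   // $file holds the raw response text
$count = count($items);
$output = array();
$oi = 0;    // index of the current record
$ol = -1;   // index of the current field within that record (0-3)
for ($i = 0; $i < $count; ++$i) {
  $output[$oi][++$ol] = $items[$i];
  if ($ol == 3) {
    // Field 0 is the declared body length. If the body is shorter, it was
    // cut at a '|' that belongs to the body, so glue the next pieces back on.
    $target = (int)$output[$oi][0];
    while (strlen($output[$oi][3]) < $target && $i + 1 < $count) {
      $output[$oi][3] .= '|' . $items[++$i];
    }
    ++$oi;
    $ol = -1;
  }
}
// The response's trailing '|' leaves an empty item that starts an
// incomplete record of its own; drop it.
if ($ol != -1) {
  unset($output[$oi]);
}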
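Once the bodies in `$output` have been modified, the records can be written back with corrected lengths. A minimal sketch, assuming the same layout as the `$output` array built above ([0] = length, [1] = type, [2] = id, [3] = body) and byte-counted lengths; `rebuild` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: serialize records back to "length|type|id|body|".
// The stored length in [0] is ignored and recomputed from the body.
function rebuild(array $output): string
{
    $result = '';
    foreach ($output as $record) {
        $body = $record[3];
        $result .= strlen($body) . '|' . $record[1] . '|' . $record[2] . '|' . $body . '|';
    }
    return $result;
}

// Example: one parsed record whose body is modified afterwards.
$output = [
    ['5', 'updatePanel', 'panel1', 'a|b|c'],
];
$output[0][3] = str_replace('b', 'BBB', $output[0][3]); // body is now "a|BBB|c"
echo rebuild($output), "\n"; // 7|updatePanel|panel1|a|BBB|c|
```

The resulting string can then be saved to a new file or back over the original with `file_put_contents()`.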