I have a converter that turns .doc and .docx files into HTML. Its class file is as follows:

<?php
    class docxhtml {
        public $connectname;
        public $connectpass;

        public function __construct($format_res, $flname) {
            require_once('config.php');
            // Turn up error reporting
            error_reporting (E_ALL|E_STRICT);

            // Turn off WSDL caching
            ini_set ('soap.wsdl_cache_enabled', 0);

            // Define credentials for LD
            define ('USERNAME', $this->connectname);
            define ('PASSWORD', $this->connectpass);

            // SOAP WSDL endpoint
            define ('ENDPOINT', 'https://api.livedocx.com/2.1/mailmerge.asmx?wsdl');

            // Define timezone

            date_default_timezone_set('Europe/Berlin');

            // Instantiate SOAP object and log into LiveDocx

            $this->soap = new SoapClient(ENDPOINT);

            $this->soap->LogIn(
                array(
                    'username' => USERNAME,
                    'password' => PASSWORD
                )
            );

            // Upload template

            $this->data = file_get_contents('Original/'.$format_res);

            $this->soap->SetLocalTemplate(
                array(
                    'template' => base64_encode($this->data),
                    'format'   => 'docx'
                )
            );

            $this->result = $this->soap->RetrieveDocument(
                array(
                    'format' => 'html'
                )
            );

            $this->data = $this->result->RetrieveDocumentResult;



            file_put_contents('Recode/'.$flname.'.html', base64_decode($this->data));

        }
    }
?>

As you can see, this class writes the converted file to the Recode folder, from where a front-end PHP script offers it for download through a save dialog.

What I need guidance on: I want to turn the resulting HTML into clean, stripped HTML. I wrote the following code for that, and it works well:

<?php
$path = 'path to previous html output file from local machine';
$html = file_get_contents($path);
$dom = new DOMDocument();
//$dom->strictErrorChecking = false;
$dom->formatOutput = true;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
if (false === ($elements = $xpath->query("//*"))) die('Error');

foreach ($elements as $element) {
    for ($i = $element->attributes->length; --$i >= 0;) {
        $name = $element->attributes->item($i)->name;
        if (('img' === $element->nodeName && 'src' === $name)
            || ('a' === $element->nodeName && 'href' === $name)
        ) {
            continue;
        }

        $element->removeAttribute($name);
    }
}

echo $dom->saveHTML();

?>

Now I want to merge these two: before the class in the first file stores the data in the Recode folder, it should run this DOM code and save the cleaned output to that folder instead. Please guide me.

1 Answer

Try this solution:

<?php
class docxhtml
{
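    /** @var string Tag name of the element currently being checked */
    private $tag;
    /** @var string Attribute name currently being checked */
    private $attribute;

    // Declared explicitly: creating undeclared (dynamic) properties is deprecated since PHP 8.2.
    /** @var SoapClient LiveDocx SOAP client */
    public $soap;
    /** @var string Raw template bytes first, later the base64-encoded converted document */
    public $data;
    /** @var object Response returned by RetrieveDocument */
    public $result;

    public $connectname;
    public $connectpass;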

    public function __construct($format_res, $flname)
    {
        require_once('config.php');
        // Turn up error reporting
        error_reporting(E_ALL | E_STRICT);

        // Turn off WSDL caching
        ini_set('soap.wsdl_cache_enabled', 0);

        // Define credentials for LD
        define ('USERNAME', $this->connectname);
        define ('PASSWORD', $this->connectpass);

        // SOAP WSDL endpoint
        define ('ENDPOINT', 'https://api.livedocx.com/2.1/mailmerge.asmx?wsdl');

        // Define timezone
        date_default_timezone_set('Europe/Berlin');

        // Instantiate SOAP object and log into LiveDocx
        $this->soap = new SoapClient(ENDPOINT);

        $this->soap->LogIn(
            array('username' => USERNAME, 'password' => PASSWORD)
        );

        // Upload template
        $this->data = file_get_contents('Original/' . $format_res);

        $this->soap->SetLocalTemplate(
            array('template' => base64_encode($this->data), 'format' => 'docx')
        );

        $this->result = $this->soap->RetrieveDocument(
            array('format' => 'html')
        );

        $this->data = $this->result->RetrieveDocumentResult;

        $exceptions = array(
            'a'   => array('href'),
            'img' => array('src')
        );

        $this->stripAttributes($exceptions);

        file_put_contents('Recode/' . $flname . '.html', base64_decode($this->data));
    }

    public function stripAttributes(array $exceptions)
    {
        $dom = new DOMDocument();
        $dom->strictErrorChecking = false;
        $dom->formatOutput = true;
        $dom->loadHTML(base64_decode($this->data));

        $xpath = new DOMXPath($dom);
        if (false === ($elements = $xpath->query("//*"))) die('Xpath error!');

        /** @var $element DOMElement */
        foreach ($elements as $element) {
            for ($i = $element->attributes->length; --$i >= 0;) {
                $this->tag       = $element->nodeName;
                $this->attribute = $element->attributes->item($i)->nodeName;

                if ($this->checkAttrExceptions($exceptions)) continue;

                $element->removeAttribute($this->attribute);
            }
        }

        $this->data = base64_encode($dom->saveHTML());
    }

    public function checkAttrExceptions(array $exceptions)
    {
        foreach ($exceptions as $tag => $attributes) {
            if (empty($attributes) || !is_array($attributes)) {
                die('Attributes not set!');
            }

            foreach ($attributes as $attribute) {
                if ($tag === $this->tag && $attribute === $this->attribute) {
                    return true;
                }
            }
        }

        return false;
    }
}

If you need to add more attribute exceptions, just edit the $exceptions array. For example, if you don't want to strip the title attribute from "a" tags, modify the exceptions like this:

$exceptions = array(
    'a'   => array('href', 'title'),
    'img' => array('src')
);
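
To see the effect of the exceptions array without going through LiveDocx, here is a minimal standalone sketch of the same stripping logic. The function name stripAttributesFromHtml and the sample markup are made up for illustration and are not part of the class above; it assumes PHP 7 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper (not part of the docxhtml class): removes every
// attribute except those listed per tag name in $exceptions.
function stripAttributesFromHtml(string $html, array $exceptions): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them,
    // since converted documents are rarely perfectly valid HTML.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*') as $element) {
        // Attributes to keep for this tag; none if the tag is not listed.
        $keep = $exceptions[$element->nodeName] ?? [];

        // Walk backwards because removing an attribute shrinks the list.
        for ($i = $element->attributes->length - 1; $i >= 0; $i--) {
            $name = $element->attributes->item($i)->nodeName;
            if (!in_array($name, $keep, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    return $dom->saveHTML();
}

// Made-up sample input, only for demonstration.
$sample = '<p class="x"><a href="page.html" title="Go" style="color:red">link</a>'
        . '<img src="pic.png" width="10"></p>';

$clean = stripAttributesFromHtml($sample, [
    'a'   => ['href', 'title'],
    'img' => ['src'],
]);

echo $clean;
```

With these exceptions the link keeps href and title, the image keeps src, and class, style and width are dropped. Note that loadHTML wraps a fragment in a doctype plus html and body elements, so the output is a full document.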

2 Comments

Great, tested and works perfectly. Thanks a lot, man, you are a lifesaver :) That was really timely help for me.
Kindly look at stackoverflow.com/questions/23210247/… I have been trying to reach you for 2 days :-(
