I've got this PHP script for invalidating files in the Amazon CloudFront CDN, which I want to automate.

Part of it builds an XML document that lists the file paths to invalidate.

$xml = <<<EOD
<InvalidationBatch>
    <Path>/index.html</Path>
    <Path>/blog/index.html</Path>
    <CallerReference>{$distribution}{$epoch}</CallerReference>
</InvalidationBatch>
EOD;

I want to replace the hard-coded paths with XML-formatted output from a command like this:

find /srv/domain.com/wp-content/uploads/ -user www-data

This is to invalidate new image file uploads after they have been optimised using a cron script.

To further complicate matters, the path needs to only include from the wp-content directory onwards, so the XML would end up something like this:

$xml = <<<EOD
<InvalidationBatch>
    <Path>/wp-content/uploads/2014/02/ED_Wedluxe-CuveeRose-364x400.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINE_PROMOTION_1-165x213.jpg</Path>
    <Path>/wp-content/uploads/2014/02/ED_Wedluxe-CuveeRose-165x220.jpg</Path>
    <Path>/wp-content/uploads/2014/02/ED_Wedluxe-CuveeRose-371x495.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINE_PROMOTION_1-471x609.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINE_PROMOTION_1.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINES14-WEB_banner-794x4761-687x412.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINES14-WEB_banner-794x4761-300x180.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINES14-WEB_banner-794x4761.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINE_PROMOTION_1-150x150.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINES14-WEB_banner-794x4761-687x477.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINE_PROMOTION_1-110x142.jpg</Path>
    <Path>/wp-content/uploads/2014/02/ED_Wedluxe-CuveeRose-500x432.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINE_PROMOTION_1-624x432.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINES14-WEB_banner-794x4761-471x282.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINES14-WEB_banner-794x4761-150x150.jpg</Path>
    <Path>/wp-content/uploads/2014/02/VALENTINES14-WEB_banner-794x4761-364x400.jpg</Path>
    <Path>/wp-content/uploads/2014/02/ED_Wedluxe-CuveeRose-110x146.jpg</Path>
    <CallerReference>{$distribution}{$epoch}</CallerReference>
</InvalidationBatch>
EOD;

I was talking to some people on IRC, and someone suggested that I use something like this instead of executing a shell command through PHP:

<?php
$path     = isset($argv[1]) ? $argv[1] : './';
$owner    = isset($argv[2]) ? $argv[2] : 'www-data';
$iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path));


$paths = array();


foreach ($iterator as $result) {
   $path = $result->getPath() . '/' . $result->getFilename();
   if (posix_getpwuid(fileowner($path))['name'] == $owner) {
      $paths[] = $path;
   }
}

However, whatever I have tried does not work.

  • Just to clarify, you want to list all of the files in a directory and then stick them into an XML file? And if so, can the web user read from this directory? And if so to that, are you needing to also get the file owner? Commented Feb 21, 2014 at 22:18
  • @Quixrick When new image files are uploaded through the WordPress admin interface, they are owned by www-data, so I think that is the best way to determine which files are new uploads. Once they are optimised and the unoptimised files are invalidated in the CDN using this intended script, the cron job would change the owner and modes. Commented Feb 21, 2014 at 22:41
  • Okay, great; thanks. I am working on a solution for you. Commented Feb 21, 2014 at 22:44

1 Answer

Okay, here you go. You were very close; I basically just left what you had and added a little more to it.

<?php

$path = isset($argv[1]) ? $argv[1] : './';
$owner = isset($argv[2]) ? $argv[2] : 'www-data';

// INITIALISE THE ARRAY SO THE LATER foreach DOES NOT FAIL WHEN NOTHING MATCHES
$path_array = array();

$iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path));


foreach ($iterator as $result) {

    $path_info = $result->getPath().'/'.$result->getFilename();

    $owner_info_array = posix_getpwuid(fileowner($path_info));

    // CHECK TO SEE IF THE OWNER IS www-data AND THAT NO PART OF THE PATH STARTS WITH '/.'
    if (($owner_info_array['name'] == $owner) && (!preg_match('/\/\./', $path_info))){
        // STRIP THE SERVER PATH FROM THE START ONLY, LEAVING /wp-content/...
        $path_info = preg_replace('/^'.preg_quote('/srv/domain.com', '/').'/', '', $path_info);
        $path_array[] = $path_info;
    }

}


// $distribution AND $epoch ARE EXPECTED TO BE SET EARLIER IN YOUR INVALIDATION SCRIPT
$xml = "<InvalidationBatch>";

foreach ($path_array AS $item) {
    $xml .= "\n    <Path>".$item."</Path>";
}

$xml .= "\n    <CallerReference>{$distribution}{$epoch}</CallerReference>
</InvalidationBatch>";


print $xml;

You'll, of course, want to replace /srv/domain.com with whatever the path part is you want to strip out. Let me know if that works for you.
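If you would rather not rely on a regex for the prefix, the same idea can be wrapped in a small function. This is only a sketch: `build_invalidation_xml` is a made-up name, the caller reference is passed in as a plain string (the value in the usage line is a placeholder, not a real one), and escaping the paths with `htmlspecialchars` is my own addition so that a file name containing `&` or `<` does not break the XML.

```php
<?php
// Hypothetical helper, not part of the original script.
// Turns absolute file paths into an <InvalidationBatch> document.
function build_invalidation_xml(array $files, $docroot, $callerReference)
{
    $docroot = rtrim($docroot, '/');
    $xml = "<InvalidationBatch>";

    foreach ($files as $file) {
        // Strip the document root only when it really is the start of the path.
        if (strpos($file, $docroot . '/') === 0) {
            $file = substr($file, strlen($docroot));
        }
        // Escape XML special characters in the path (assumption: plain text paths).
        $xml .= "\n    <Path>" . htmlspecialchars($file, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</Path>";
    }

    $xml .= "\n    <CallerReference>"
          . htmlspecialchars($callerReference, ENT_XML1 | ENT_QUOTES, 'UTF-8')
          . "</CallerReference>";
    $xml .= "\n</InvalidationBatch>";

    return $xml;
}

// Usage with placeholder values.
echo build_invalidation_xml(
    array('/srv/domain.com/wp-content/uploads/2014/02/example.jpg'),
    '/srv/domain.com',
    'PLACEHOLDER-REFERENCE'
), "\n";
```

In your script you would pass `$path_array` (before stripping) as the first argument and `$distribution . $epoch` as the last one.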

4 Comments

Okay, there are a couple of things going on here. The easy one is to declare the $path_array variable by putting $path_array = array(); somewhere at the top of your script. Under the $owner = ... line would be good. Second, it looks like you are not getting any lines back from your directory search. Check to make sure that the web user has permission to read the directory and that there are files in there owned by www-data. Do a print "\n$path_info"; after you define it to make sure stuff is being set there. It looks like it's not finding any files in there.
Cheers. I have been experimenting with it and it's nearly working. I have put the output and notes at the end. pastebin.com/rsinvaKk Thanks for your help!
Okay, I'm taking a look now.
Okay, meet me over at collabedit.com/h85w7 I can address each of these issues.
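For anyone hitting the same "no files found" symptom, the checks suggested in the first comment can be sketched roughly like this. The function name `find_owned_files` is made up for illustration, and it assumes the posix extension is available (the script above already needs it). It looks the user up once with `posix_getpwnam`, skips the `.` and `..` entries, and reports an unreadable directory or unknown user instead of silently returning nothing.

```php
<?php
// Hypothetical diagnostic helper; the name and messages are illustrative only.
function find_owned_files($dir, $ownerName)
{
    if (!is_dir($dir) || !is_readable($dir)) {
        error_log("Cannot read directory: $dir");
        return array();
    }

    // Resolve the user name to a uid once instead of once per file.
    $pw = posix_getpwnam($ownerName);
    if ($pw === false) {
        error_log("Unknown user: $ownerName");
        return array();
    }

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::LEAVES_ONLY,
        RecursiveIteratorIterator::CATCH_GET_CHILD // skip unreadable subdirectories
    );

    $matches = array();
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getOwner() === $pw['uid']) {
            $matches[] = $file->getPathname();
        }
    }

    return $matches;
}

// Usage: print how many files were found so an empty result is obvious.
$files = find_owned_files('/srv/domain.com/wp-content/uploads', 'www-data');
printf("%d file(s) owned by www-data\n", count($files));
```

If the count is zero, run the script as the same user the cron job uses, since that user is the one that needs read access to the uploads directory.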
