Can someone suggest a script that creates an XML representation of all the files in a directory, including those in its subdirectories (on Windows), grouped by file type? For example, if the current directory is named mypics, then for all the JPGs:

<?xml version="1.0" encoding="utf-8"?>
<images xmlns="http://mydomain.com/images" version="1.0">
  <image>
      <big_url>myassets/pics/funnypics/big_pics/down.jpg</big_url>
  </image>
  <image>      
      <big_url>assets/pics/funnypics/big_pics/spider.jpg</big_url>
  </image>  
</images>

and then for the PDFs:

<?xml version="1.0" encoding="utf-8"?>
<pdfs xmlns="http://mydomain.com/pdf" version="1.0">
  <pdf>
      <big_url>myassets/pics/funnypics/big_pics/down.pdf</big_url>
  </pdf>
  <pdf>      
      <big_url>assets/pics/funnypics/big_pics/spider.pdf</big_url>
  </pdf>  
</pdfs>

Since the number of file types is unlimited, I can extend the suggested script to cover the rest myself.

3 Answers

2

This should get you started:

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
use XML::Simple;

my $dir = shift || '.';

my %files;
find \&by_extension, $dir;
print XMLout \%files;

sub by_extension {
    return if /^\./;                    # skip dotfiles
    return unless -f;                   # skip non-files
    return unless /\.([^.]+)$/;         # skip if no filename extension
    my $ext = lc $1;                    # ignore case
    $File::Find::name =~ s#^\Q$dir/##;  # trim starting directory name
    push @{$files{$ext . '_files'}{$ext}}, $File::Find::name;
}
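
If you would rather have the grouped file list as plain data before deciding how to print it, the same File::Find walk can fill a hash and return it. This is only a minimal sketch using core modules: the helper name collect_by_extension is made up for illustration, and converting backslashes to forward slashes is an assumption about how the paths should look in the XML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper: returns a hash ref mapping a lower-cased extension
# to a sorted list of paths relative to $dir, written with forward slashes.
sub collect_by_extension {
    my ($dir) = @_;
    my %files;
    find({
        no_chdir => 1,    # $_ holds the full path, same as $File::Find::name
        wanted   => sub {
            return unless -f $_;                      # plain files only
            my $rel = File::Spec->abs2rel($_, $dir);  # trim starting directory
            my ($name) = $rel =~ m{([^/\\]+)$};       # last path component
            return if $name =~ /^\./;                 # skip dotfiles
            return unless $name =~ /\.([^.]+)$/;      # skip if no extension
            my $ext = lc $1;                          # ignore case
            $rel =~ tr{\\}{/};                        # assumption: URL-style separators
            push @{ $files{$ext} }, $rel;
        },
    }, $dir);
    $_ = [ sort @$_ ] for values %files;              # stable order per extension
    return \%files;
}

# When run as a script, print a short summary per extension.
unless (caller) {
    my $files = collect_by_extension(shift || '.');
    printf "%-6s %d file(s)\n", $_, scalar @{ $files->{$_} }
        for sort keys %$files;
}
```

Each list in the returned hash can then be handed to whatever produces the XML for that file type.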

1

The following Perl code will do the trick (except, perhaps, for indenting the XML output):

package FilesToXml;
use strict;
use warnings;
use IO::File;
use File::Find;
use XML::Writer;

use vars qw(@ISA @EXPORT @EXPORT_OK);
require Exporter;
@ISA = qw(Exporter);
@EXPORT = qw(SetRequestedType GenerateXml);

my $group_name = "";
my $file_type = "";
my $ext = "";
my $writer = "";

sub SetRequestedType
{
    $group_name = shift;
    $file_type = shift;
    $ext = shift;
}

sub wanted
{
    # plain files only; match the extension literally and ignore case
    if (-f $_ && $File::Find::name =~ /\.\Q$ext\E$/i)
    {
        $writer->startTag($file_type);
        $writer->startTag('big_url');
        $writer->characters($File::Find::name);
        $writer->endTag();
        $writer->endTag();
    }    
}

sub GenerateXml
{
    my $filename = shift;
    my $directory = shift;

    my $output = IO::File->new($filename, 'w')
        or die "Cannot open $filename: $!";
    $writer = XML::Writer->new( OUTPUT => $output );

    $writer->xmlDecl( 'UTF-8' );
    $writer->startTag( $group_name, 'xmlns' => 'http://mydomain.com/'.$group_name, 
                        'version' => '1.0' );
    find(\&wanted, $directory);
    $writer->endTag();
    $writer->end();      # checks that every tag has been closed
    $output->close();
}

package main;

FilesToXml::SetRequestedType('docs', 'doc', 'docx');
FilesToXml::GenerateXml("output.xml", ".");

You basically need to call SetRequestedType with the name of the file group (used as the root element), the element name for a single file, and the file extension. Then call GenerateXml with the name of the XML output file and the directory to search under.

It works using ActivePerl on Windows. It may need some minor adjustments in other environments.
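
If the indentation matters and you want several file types handled in one run, the elements are simple enough to print by hand. What follows is a rough sketch with core modules only, not the XML::Writer approach above: the %TYPES table and the names xml_escape and render_group are hypothetical, the sample paths are placeholders, and the namespace is built from the group name the same way GenerateXml does it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: extension => [ root element, per-file element ].
my %TYPES = (
    jpg => [ 'images', 'image' ],
    pdf => [ 'pdfs',   'pdf'   ],
);

# Escape the characters that are special in XML character data.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build one indented document for one file type from a list of paths.
sub render_group {
    my ($group, $element, $paths) = @_;
    my $xml = qq{<?xml version="1.0" encoding="utf-8"?>\n}
            . qq{<$group xmlns="http://mydomain.com/$group" version="1.0">\n};
    for my $path (@$paths) {
        $xml .= "  <$element>\n"
              . "      <big_url>" . xml_escape($path) . "</big_url>\n"
              . "  </$element>\n";
    }
    return $xml . "</$group>\n";
}

# When run as a script, print one document per configured type.
unless (caller) {
    my %found = (    # placeholder data; in practice this comes from the directory walk
        jpg => [ 'funnypics/big_pics/down.jpg', 'funnypics/big_pics/spider.jpg' ],
        pdf => [ 'funnypics/big_pics/down.pdf' ],
    );
    for my $ext (sort keys %TYPES) {
        my ($group, $element) = @{ $TYPES{$ext} };
        print render_group($group, $element, $found{$ext} || []);
    }
}
```

Supporting another file type then means adding one line to the table rather than another pair of calls.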

1

Without knowing Perl and its methods for reading directories or handling XML, here is a bit of pseudocode you could use as a template:

strFileExtensionToMap="jpg"
strNodeName="image"
strCollectionName="images"
currentXMLNode=XML.CreateElement(strCollectionName)
StartFolder=Filesystem.GetFolder([however to get folder])
Call RecursiveMapContents(StartFolder)


RecursiveMapContents(folder){
    For each file in folder.Files
    {
        if (file.extension=strFileExtensionToMap)
        {
            xmlFile=XML.CreateElement(strNodeName)
            big_url=XML.CreateElement("big_url")
            big_url.text=file.path
            xmlFile.AppendChild(big_url)
            currentXMLNode.AppendChild(xmlFile)
        }
    }

    For each subFolder in folder.Folders
    {

        call RecursiveMapContents(subFolder)
    }
}

Of course, you could make the XML more generic by using file type as an attribute of a file element:

<file type="image"/>

You could also map the actual nested directory structure by using

<folder name="foldername" path="folderpath"> instead of <images>

Then you could include the current folderNode in your call to RecursiveMapContents, so that files and subfolders were nested within it, giving you:

<folder name="foldername" path="folderpath">
    <file type="image">
        <big_url>file path</big_url>
    </file>
    <file type="image">
        <big_url>file path</big_url>
    </file>
    <folder name="foldername" path="folderpath">
        <file type="image">
            <big_url>file path</big_url>
        </file>
        <file type="image">
            <big_url>file path</big_url>
        </file>
    </folder>
</folder>
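
For anyone who wants to try the nested layout in Perl, here is a rough sketch of that recursion using only core modules. The name folder_to_xml is made up for illustration, and using the lower-cased file extension as the type attribute (rather than a label such as "image") is an assumption on my part.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Escape text for XML content and double-quoted attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: returns the <folder> element for $dir as a string,
# with its files listed first and its subfolders nested after them.
sub folder_to_xml {
    my ($dir, $depth) = @_;
    $depth ||= 0;
    my $pad = '    ' x $depth;

    opendir my $dh, $dir or do { warn "Skipping $dir: $!\n"; return ''; };
    my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my $xml = sprintf qq{%s<folder name="%s" path="%s">\n},
        $pad, xml_escape(basename($dir)), xml_escape($dir);
    for my $entry (grep { -f "$dir/$_" } @entries) {
        my ($ext) = $entry =~ /\.([^.]+)$/;    # assumption: type = extension
        $ext = defined $ext ? lc $ext : '';
        $xml .= qq{$pad    <file type="} . xml_escape($ext) . qq{">\n}
              . "$pad        <big_url>" . xml_escape("$dir/$entry") . "</big_url>\n"
              . "$pad    </file>\n";
    }
    # Recurse into real subdirectories; symlinks are skipped to avoid loops.
    for my $entry (grep { -d "$dir/$_" && !-l "$dir/$_" } @entries) {
        $xml .= folder_to_xml("$dir/$entry", $depth + 1);
    }
    return $xml . "$pad</folder>\n";
}

print folder_to_xml(shift || '.') unless caller;
```

Passing the depth along plays the role of passing the current folderNode: each call emits its own folder element and everything inside it.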

I didn't include the namespaces, and I'll admit to being somewhat mystified as to why you'd want separate namespaces for images and PDFs. The point of a namespace is to give a set of elements unique names, so that someone else's image element isn't confused with yours should you want to work with their XML. A namespace says "this element, which we refer to by the shorthand image, is actually called thisnamespace:image". If you really need a namespace at all, then "http://mydomain.com" should be enough for all your element names. Unless you have two kinds of image element (one in pdfs, the other in images) that aren't equivalent, a single namespace is enough.

There's also a lot more you could do to make your XML more generic, and possibly less verbose. It's largely up to whoever designs the XML format to decide whether something like a file path should be an attribute of a file element or a child element (like your big_url). That depends on whether the data itself needs to be qualified: a path that carries its own type (e.g. filepath="this filepath" with type="filesystem|http") is better expressed as a child element.

Sorry it's not a Perl answer, but I hope it helps.
