3

I have a simple PowerShell script that walks a directory tree and lists the files in JSON format.

Each entry is of the form:

{id: filename, size: bytes }

It works fine for short listings but is very slow for large directories. I also want to write the output to a file (manifest.json).

I am much better at writing C#/.NET, where I would use Directory.EnumerateFiles().

But I thought I would see whether simple things like this are easier to get done in PowerShell.

However, this script really bogs down once I get to 10K entries.

$src = "G:\wwwroot\BaseMaps\BigBlueMarble"
$path = $src + "\*"
$excludes = @("*.json", "*.ps1")
$version = "1.1"
Write-Host "{" 
Write-Host "`"manifest-version`": `"$version`","
Write-Host "`"files`": [" 

$dirs = Get-Item -Path $path -Exclude $excludes 
$dirs | Get-ChildItem -Recurse -File | % { 
    $fpath = $_.FullName.Replace($src, "").Replace("\","/")
    $date = $_.LastWriteTime
    $size = $_.Length
    $id = $_.BaseName
    Write-Host "{`"id`": `"$id`", `"size`": `"$size`"},"
    } 
Write-Host "]"
Write-Host "}"
5
  • Get-ChildItem is slow. Better to stick with C#/.NET for this. See this answer to a similar question. Commented Jun 26, 2013 at 14:46
  • Have you measured the performance of each step to see where the bottleneck is? How many files are in a "large directory"? Do you have PowerShell 3 available (which includes ConvertTo-Json, which may be faster than string concatenation)? Commented Jun 26, 2013 at 16:16
  • I think the PowerShell version and OS version are significant here. What versions are you running? (See my comment on user2460798's post.) Commented Jun 27, 2013 at 2:54
  • I am running PowerShell 3.0 on Win7, on a 4-core Alienware Area-51. Things get slow at 5K file entries. I have some folders with 10K files, and one with up to 50K. Yes, that is a lot for NTFS, but I don't control that part of the architecture; indeed, the JSON index is what I am using to solve the issue (by loading it into a Couchbase DB). Commented Jun 27, 2013 at 15:20
  • A folder with 10K files takes about 8 hours; the 50K one runs over the weekend. So I am strongly leaning toward the DirectoryInfo class and using: public IEnumerable<FileInfo> EnumerateFiles() Commented Jun 27, 2013 at 15:26

3 Answers 3

2

Get-ChildItem may be slowish (though it appears to be about twice as fast in PowerShell 3 as it was in v2), but Write-Host is slowing you down a lot too. On a directory structure containing 27,000+ files, the following code ran in 16.15 seconds vs 21.08 seconds for your code. On a smaller directory containing about 2,400 files, it was 1.15s vs 1.22s.

gci $path -file -Recurse |
select @{name="fpath";expression={$_.fullname.replace($src,"").replace("\","/")}},lastwritetime,@{Name="size";Expression={$_.length}},@{Name="id";Expression={$_.basename}}|
select id,size|
ConvertTo-Json

The resulting JSON doesn't have the header yours does, but you should be able to handle that after the fact.
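One way the header could be handled is to build the whole manifest as a single object and serialize it in one go, which also covers writing manifest.json. This is only a sketch, assuming PowerShell 3.0 or later. The function name `New-Manifest` and its parameters are made up for illustration, and unlike the question's script it filters out .json and .ps1 files at every level rather than only at the top.

```powershell
# Hypothetical helper (not from the question): builds the manifest object
# and writes it to manifest.json inside the scanned folder.
function New-Manifest {
    param(
        [Parameter(Mandatory = $true)][string]$Root,
        [string]$Version = "1.1",
        [string[]]$ExcludeExtensions = @(".json", ".ps1")
    )

    # .NET file APIs need a real file-system path, not a PowerShell-relative one.
    $rootPath = (Resolve-Path -LiteralPath $Root).ProviderPath

    # One small object per file; @() keeps the result an array even for 0 or 1 files.
    $files = @(Get-ChildItem -LiteralPath $rootPath -Recurse -File |
        Where-Object { $ExcludeExtensions -notcontains $_.Extension } |
        ForEach-Object { [pscustomobject]@{ id = $_.BaseName; size = $_.Length } })

    $manifest = [pscustomobject]@{
        'manifest-version' = $Version
        files              = $files
    }

    # -Depth 3 leaves headroom so the nested file objects are not flattened to strings.
    $json = $manifest | ConvertTo-Json -Depth 3
    [System.IO.File]::WriteAllText((Join-Path $rootPath 'manifest.json'), $json)
}
```

With the question's variables this would be called as `New-Manifest -Root $src`. Sizes come out as JSON numbers rather than quoted strings, and there is no trailing comma after the last entry, so the file should parse as valid JSON.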


1

On my system:

$pf = "C:\Program Files" # has about 50,000 files
measure-command {$a=[io.Directory]::EnumerateFiles($pf,"*","AllDirectories")|%{$_}}

was about twice as fast as:

measure-command {$a=gci "C:\Program Files" -Recurse}

The point is that you can use .NET classes very easily from PowerShell, and they may perform better.

In this case the Get-ChildItem command has its own .NET class(es) to execute, and it also invokes the file system provider class(es), which no doubt call something in [io.directory]. So while the PowerShell provider concept is pretty cool, it does add runtime overhead.
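To make that concrete for the manifest in the question, here is a minimal sketch that skips the provider layer and asks `System.IO.DirectoryInfo` for the files directly. The function name `Get-FileEntries` is an assumption made for illustration, and whether this is actually faster than Get-ChildItem will depend on the PowerShell and OS version, so it is worth measuring on your own machine.

```powershell
# Hypothetical helper (not from the answer): streams one {id, size} object
# per file using .NET enumeration instead of Get-ChildItem.
function Get-FileEntries {
    param([Parameter(Mandatory = $true)][string]$Root)

    # Resolve to a full file-system path, since .NET does not follow
    # PowerShell's current location.
    $rootPath = (Resolve-Path -LiteralPath $Root).ProviderPath
    $dir = New-Object System.IO.DirectoryInfo -ArgumentList $rootPath

    # EnumerateFiles yields FileInfo objects lazily as the tree is walked.
    foreach ($fi in $dir.EnumerateFiles('*', [System.IO.SearchOption]::AllDirectories)) {
        [pscustomobject]@{
            id   = [System.IO.Path]::GetFileNameWithoutExtension($fi.Name)
            size = $fi.Length
        }
    }
}
```

It can be timed the same way as above, for example `Measure-Command { $entries = @(Get-FileEntries -Root $pf) }`. One caveat: with `AllDirectories`, a single subfolder you are not allowed to read throws an UnauthorizedAccessException and stops the whole enumeration, so a protected tree such as C:\Program Files may need a try/catch or a manual per-folder walk.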

2 Comments

Ironically, on my machine measure-command {$a=gci "C:\Program Files" -Recurse} is actually faster (1s versus 3s for EnumerateFiles). This is PowerShell 3 on Windows 8.
That is surprising. It blows my explanation out of the water. I'm running PS v3 in a Win7 VM on WS08, using VirtualBox. The gci version ranged from 8.6 to 9.4s (4 runs) and the other from 3.25 to 4.5s (5 runs).
1

Sometimes it might be better to just write utilities in C# and .NET. Using the very handy JSON.NET library, I put together a WPF application that lets me select a folder (one of them has 100K PNG files) and then creates the JSON "manifest" I attempted above in less than 2 seconds. Here is the non-UI worker part of the application. Thanks for the tips above; they were helpful.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Windows;
using Newtonsoft.Json;

namespace Manifest
{
    internal class Worker
    {
        private DateTime start;
        private ViewModel vm;
        private readonly BackgroundWorker worker = new BackgroundWorker();
        private ManifestObject manifest;

        public Worker()
        {
            vm = ViewModel.myself;
            manifest = new ManifestObject();
            manifest.version = "1.1";
            manifest.files = new List<FileData>();
            worker.DoWork += build;
            worker.RunWorkerCompleted += done;
            worker.RunWorkerAsync();
        }

        public void build(object sender, DoWorkEventArgs e)
        {

            vm.Status = "Working...";
            start = DateTime.Now;
            scan();
        }

        private void scan()
        {
            var top = new DirectoryInfo(vm.FolderPath);
            try
            {
                foreach (var fi in top.EnumerateFiles("*" + vm.FileType, SearchOption.TopDirectoryOnly))
                {
                    FileData fd = new FileData();
                    fd.size = fi.Length;
                    fd.id = fi.Name.Replace(vm.FileType, "");
                    manifest.files.Add(fd);
                    vm.FileCount++;
                }
            }
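            catch (UnauthorizedAccessException error)
            {
                // MessageBox.Show(string, string) treats the second argument as the
                // caption, not a format argument, so pass the message text directly.
                MessageBox.Show(error.Message);
            }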
        }

        private void done(object sender,RunWorkerCompletedEventArgs e)
        {
            var done = DateTime.Now;
            var elapsed = done - start;
            vm.ElapsedTime = elapsed.ToString();
            vm.Status = "Done Scanning...";
            write();
        }

        private void write()
        {
            File.WriteAllText(vm.FolderPath + @"\manifest.json", JsonConvert.SerializeObject(manifest, Formatting.Indented));
            vm.Status = "Done";
        }
    }
}
