2

I use this code to match two CSV files and get the columns I need. The code compares the Matricule, name, and first name fields, and when there is a match I retrieve the 'IGG' column.

But it is very slow (20 minutes for 18 lines).

Can someone help me with this?

Here is my code:

foreach ($item in $fileContentIMM) 
{
    try
    {
        $Matricule = $item.'Matricule'
        $name = $item.'Nom'
        $firstname = $item.'Prenom'

        # find matching rows in $fileContentMagic (exact match on all three fields)
        $objMatch = $fileContentMagic | where { $_.'Matricule' -eq $Matricule -and $_.'NOM' -eq $name -and $_.'PRENOM' -eq $firstname}


        ##### check if any match found 
        if ($objMatch -eq $null)
        {
            $item  | ForEach-Object {
                $filechecktrue += [pscustomobject]@{
                    'MATRICULE' = $item.'Matricule'
                    'IGG' = 'noSet'
                    'NAME'  = $item.'Nom'
                    'FIRSTNAME' = $item.'Prenom'
                    'SERVICE' = $item.'Service'
                    'Immeuble'= $item.'Immeuble' 
                    'Niveau' = $item.'Niveau'
                    'Loc.' = $item.'Loc.'
                    'PDT' = $item.'PDT'
                    'Occ.' = $item.'Occ.'
                    'Site' = $item.'Site'
                }
            }
        }
        else
        {
            $item  | ForEach-Object {
                $filechecktrue += [pscustomobject]@{
                    'MATRICULE' = $item.'Matricule'
                    'IGG' = ($objMatch.'IGG' -join '/')
                    'NAME'  = $item.'Nom'
                    'FIRSTNAME' = $item.'Prenom'
                    'SERVICE' = $item.'Service'
                    'Immeuble'= $item.'Immeuble' 
                    'Niveau' = $item.'Niveau'
                    'Loc.' = $item.'Loc.'
                    'PDT' = $item.'PDT'
                    'Occ.' = $item.'Occ.'
                    'Site' = $item.'Site'
                }
            }

        }
    }
    catch
    {
        "ERROR: Problem reading line - skipping :" | Out-File $LogFile -Append -Force
        $item.nom + $item.prenom + $item.service| Out-File $LogFile -Append -Force
    }
}
  • 20 minutes to find 18 lines out of how many? Have you looked at Compare-Object? Commented Mar 29, 2018 at 16:10
  • Are you sure this is the slow part? How large are the two CSV files? Have you measured, or used e.g. Write-Host "import done", to determine that it's not the reading of the files that is slow? Commented Mar 29, 2018 at 16:10
  • The file contentIMM contains 18 lines and filecontentMagic contains 45000. Commented Mar 30, 2018 at 8:20
  • Maybe, I'm going to check. Thanks. Commented Mar 30, 2018 at 8:20

3 Answers

2

I would read the file you're using for lookups and then create a HashTable for that. HashTables are very efficient for doing lookups.

Try something like this, assuming you don't have any duplicates in FileContentMagic:

# Use any character here which is guaranteed not to be present in the Matricule, Nom,
# or Prenom fields
$Delimiter = '|'

# Read the FileContent Magic into a HashTable for fast lookups
# The key is Matricule|Nom|Prenom
# The value is IGG joined with a forward slash
$FileContentMagic = @{}
Import-Csv -Path $FileContentMagicFileName | ForEach-Object {
    # Here we build our lookup key. The Trim() is just in case there's any leading or trailing
    # whitespace. You can leave it out if you know you don't need it.
    $Key = $_.Matricule.Trim(), $_.Nom.Trim(), $_.Prenom.Trim() -join $Delimiter

    # Since we only need the IGG value joined with a /, we'll just keep that
    $Value = $_.IGG -join '/'
    $FileContentMagic.Add($Key, $Value)
}

$FileContentIMM = Import-Csv -Path $FileContentIMMFileName

$FileCheckTrue = foreach ($item in $FileContentIMM) {
    # Inside a foreach statement the current row is $item; $_ is not set here
    $Key = $item.Matricule.Trim(), $item.Nom.Trim(), $item.Prenom.Trim() -join $Delimiter

    [PSCustomObject]@{
        'MATRICULE' = $item.'Matricule'
        'IGG'       = if ($FileContentMagic.ContainsKey($Key)) { $FileContentMagic[$Key] } else { 'noSet' }
        'NAME'      = $item.'Nom'
        'FIRSTNAME' = $item.'Prenom'
        'SERVICE'   = $item.'Service'
        'Immeuble'  = $item.'Immeuble' 
        'Niveau'    = $item.'Niveau'
        'Loc.'      = $item.'Loc.'
        'PDT'       = $item.'PDT'
        'Occ.'      = $item.'Occ.'
        'Site'      = $item.'Site'
    }
}

Also, any time you're using += to concatenate an array, you're introducing a significant performance penalty. It's worth it to avoid using it because each assignment creates a new array, copies the entire array over with the new item, and then discards the old array. It's very inefficient.
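
To make the point about += concrete, here is a minimal sketch that builds the same 1000 objects three ways. The loop bound `$count` and the variable names `$slow`, `$collected`, and `$list` are made up for the illustration and do not come from the script above.

```powershell
# Hypothetical loop bound, chosen only for illustration
$count = 1000

# Slow pattern: every += allocates a new array and copies all existing items
$slow = @()
foreach ($i in 1..$count) {
    $slow += [pscustomobject]@{ Id = $i }
}

# Faster: let PowerShell collect the loop's output into an array in one go
$collected = foreach ($i in 1..$count) {
    [pscustomobject]@{ Id = $i }
}

# Alternative: a generic List, which grows in place (Add returns nothing)
$list = New-Object 'System.Collections.Generic.List[object]'
foreach ($i in 1..$count) {
    $list.Add([pscustomobject]@{ Id = $i })
}
```

All three end up holding the same items. Wrapping each loop in Measure-Command should show the gap widening as `$count` grows.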

If $FileContentMagic contains duplicate keys, then you should change how the HashTable is loaded to:

$FileContentMagic = @{}
Import-Csv -Path $FileContentMagicFileName | ForEach-Object {
    $Key = $_.Matricule.Trim(), $_.Nom.Trim(), $_.Prenom.Trim() -join $Delimiter
    if (!$FileContentMagic.ContainsKey($Key)) {
        $Value = $_.IGG -join '/'
        $FileContentMagic.Add($Key, $Value)
    }
    else {
        $FileContentMagic[$Key] += '/' + ($_.IGG -join '/')
    }
}

2 Comments

Thanks, I tried this and it cut the run time to a third! But if I need more values than 'IGG', how can I do it?
Change to $Value = $_ and update the rest to access the igg-property from the returned object from the hashtable. This is a simple change you should be able to fix yourself. I wouldn't recommend running code you don't understand.
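
A minimal sketch of what that comment describes, assuming the whole row is stored as the hashtable value. The inline sample rows, the extra `Email` column, and the names `$magicRows`, `$immRows`, `$lookup`, and `$found` are all hypothetical and only stand in for the real CSV files.

```powershell
# Made-up sample rows standing in for the two CSV files
$magicRows = @'
Matricule,NOM,PRENOM,IGG,Email
100,Durand,Alice,J001,alice@example.com
200,Martin,Bob,J002,bob@example.com
'@ | ConvertFrom-Csv

$immRows = @'
Matricule,Nom,Prenom,Service
100,Durand,Alice,IT
300,Petit,Chloe,HR
'@ | ConvertFrom-Csv

$Delimiter = '|'

# Key -> whole row, so every column stays available after the lookup
$lookup = @{}
foreach ($row in $magicRows) {
    $key = $row.Matricule.Trim(), $row.NOM.Trim(), $row.PRENOM.Trim() -join $Delimiter
    $lookup[$key] = $row   # a later duplicate key overwrites the earlier row
}

$result = foreach ($item in $immRows) {
    $key = $item.Matricule.Trim(), $item.Nom.Trim(), $item.Prenom.Trim() -join $Delimiter
    $found = $lookup[$key]   # $null when the key is absent

    [pscustomobject]@{
        MATRICULE = $item.Matricule
        IGG       = if ($found) { $found.IGG } else { 'noSet' }
        EMAIL     = if ($found) { $found.Email } else { 'noSet' }
        SERVICE   = $item.Service
    }
}
```

Because the value is the full row, adding another output column only means reading one more property from `$found`.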
1

I would simplify this, but the changes shouldn't affect the processing time much. The only optimization I've made is changing $filechecktrue to an ArrayList, which is more memory-efficient than growing an array with +=.

Not sure if this is actually the slow part of your script. That would require $fileContentMagic to be a VERY large array.

$filechecktrue = New-Object System.Collections.ArrayList

foreach ($item in $fileContentIMM) 
{
    try
    {
        $Matricule = $item.'Matricule'
        $name = $item.'Nom'
        $firstname = $item.'Prenom'

        # find matching rows in $fileContentMagic (exact match on all three fields)
        $objMatch = $fileContentMagic | Where-Object { $_.'Matricule' -eq $Matricule -and $_.'NOM' -eq $name -and $_.'PRENOM' -eq $firstname}

        #Create results object with common properties
        $o = [pscustomobject]@{
            'MATRICULE' = $item.'Matricule'
            'IGG' = 'noSet'
            'NAME'  = $item.'Nom'
            'FIRSTNAME' = $item.'Prenom'
            'SERVICE' = $item.'Service'
            'Immeuble'= $item.'Immeuble' 
            'Niveau' = $item.'Niveau'
            'Loc.' = $item.'Loc.'
            'PDT' = $item.'PDT'
            'Occ.' = $item.'Occ.'
            'Site' = $item.'Site'
        }

        ##### check if any match found 
        if ($objMatch)
        {
            #if not null, set IGG value. No need for foreach as $item is already a "foreach-value".
            $o.IGG = ($objMatch.'IGG' -join '/')
        }

        #Add result to arraylist ([void] discards the index that Add() returns)
        [void]$filechecktrue.Add($o)
    }
    catch
    {
        "ERROR: Problem reading line - skipping :" | Out-File $LogFile -Append -Force
        $item.nom + $item.prenom + $item.service| Out-File $LogFile -Append -Force
    }
}

1 Comment

Yep, this didn't affect the processing time. Thanks anyway ;)
0

Your first foreach returns a single $item object on every iteration, so it's pointless to run ForEach-Object on $item again inside the code block (twice).

Try this (redundancy removed):

foreach ($item in $fileContentIMM) {
    try {
        # find matching rows in $fileContentMagic (exact match on all three fields)
        # the -and operators end each line so the expression continues on the next one
        $objMatch = $fileContentMagic | where { $_.'Matricule' -eq $item.'Matricule' -and
                                                $_.'NOM' -eq $item.'Nom' -and
                                                $_.'PRENOM' -eq $item.'Prenom' }


        ##### check if any match found 
        if ($objMatch -eq $null) {
            $IGG = 'noSet'
        } else {
            $IGG = ($objMatch.'IGG' -join '/')
        }
        $filechecktrue += [pscustomobject]@{
            'MATRICULE' = $item.'Matricule'
            'IGG' = $IGG
            'NAME'  = $item.'Nom'
            'FIRSTNAME' = $item.'Prenom'
            'SERVICE' = $item.'Service'
            'Immeuble'= $item.'Immeuble' 
            'Niveau' = $item.'Niveau'
            'Loc.' = $item.'Loc.'
            'PDT' = $item.'PDT'
            'Occ.' = $item.'Occ.'
            'Site' = $item.'Site'
        }
    } catch {
        "ERROR: Problem reading line - skipping :" | Out-File $LogFile -Append -Force
        $item.nom + $item.prenom + $item.service| Out-File $LogFile -Append -Force
    }
}

1 Comment

Yep, I only saved a few seconds, but you are right. Thanks :)
