
I have a file, DJ.bat, and I am reading its contents using PowerShell.

DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"
DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"

The file contains thousands of DJENGINE lines like these. I have shown just three here.

I have an array $data which holds thousands of file names. I have shown just two here:

TMSfuture_cost_2554_20220323065246.dat
TMSfuture_cost_5168_20220323074029.dat

I want to compare the contents of the array against the contents of the file being read. If a line contains one of the array items, I need to delete the whole block from:

   " DJEngine.... to .... tf.xml"

Since both of these array items match lines in the file, my expected output is:

DJENGINE -l "Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.tf.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC.stg.sql.ccaintranet.com;Database=ABCTMS_FC;Table=dbo.FC_Load" "\\ccaintranet.com\dfs-dc-01\Data\Retail\Actian11\MapDesigner\fc.tf.xml"

I tried using:

$pathOFDj='Y:\St\Retail\FTP\2022-03-23\DJ.bat'
foreach($d in $data){
  foreach($line in Get-Content $pathOFDj) {
    if($line -contains $d){

    }
    else{
      $newLine+=$line
    }
  }
}

echo $newLine

The block which I mentioned is not being removed.

1 Answer


Your current script has 3 major problems:

  • Wrong comparison operator - -contains is for testing collection containment, not substrings - you'd want something like -like or -match for string comparisons instead
  • Repeated positives - even if one string from $data is found in a specific line, the line will still be copied/included on the next iteration of the outer loop because it won't contain the remaining $data substrings
  • Quadratic time complexity - testing every single item in $data against every single line in the file gives your script a time complexity of O(N*M), where N is the number of $data items and M is the number of lines in the file. This means your code is going to get significantly slower as the input size increases. By structuring your code differently this can be improved somewhat, and by structuring your data differently it can be improved massively
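
To illustrate the first point, here is a minimal sketch of how the operators behave on a single line. The $line and $name values are shortened, hypothetical stand-ins, not the real contents of DJ.bat:

```powershell
# Hypothetical, shortened stand-in for one line of DJ.bat
$line = 'DJENGINE -sc "File=Y:\st\TMSfuture_cost_2554_20220323065246.dat"'
$name = 'TMSfuture_cost_2554_20220323065246.dat'

# -contains treats the left side as a collection and looks for an element
# equal to the right side; a single string acts as a one-element collection,
# so this is False because the whole line is not equal to the file name
$containsResult = $line -contains $name

# -like performs wildcard matching against the string itself
$likeResult = $line -like "*$name*"

# -match performs regex matching, so the literal text is escaped first
$matchResult = $line -match [regex]::Escape($name)

# -contains succeeds only when an entire element is equal to the value
$elementResult = @($name, 'other.dat') -contains $name

"contains: $containsResult, like: $likeResult, match: $matchResult, element: $elementResult"
```

Swapping -contains for -like or -match would address only the first problem; the other two remain as long as the nested loops stay in place.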

Instead of attempting to solve these problems point-by-point, I'm gonna show you how to prepare the $data array and parse the input file for better performance (and correctness of course).

This will consist of two steps:

  • Organize the $data items into a data structure that allows for constant-time lookups - something that can tell us, as quickly as possible, whether a specific string is part of the collection or not.
  • Parse the relevant file name out of each line in the file, use the extracted file name to test if the collection from the previous step contains it, and use that to filter out the relevant line

$pathOFDj='Y:\St\Retail\FTP\2022-03-23\DJ.bat'

# Read file names into array
$data = Get-Content path\to\listOfFileNames.txt

# Create a hashset - this will provide super-fast lookups
$fileNameSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::CurrentCultureIgnoreCase)
$data |ForEach-Object { [void]$fileNameSet.Add($_) }

# Now we can start parsing and filtering the file
Get-Content $pathOFDj |Where-Object {
  # attempt to extract file name, then use the extracted file name to test if it's one of the relevant file names
  -not($_ -match '-sc "File=[^"]+?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1]))
} |Set-Content path\to\modified_dj.bat

The statement -not($_ -match '-sc "File=[^"]+?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1])) will only evaluate to $false if a file name was successfully extracted and found to be contained in the hashset - otherwise, it'll evaluate to $true, and Where-Object will let the line pass through as expected.
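
If the filter does not seem to remove anything, it may help to run it against a few in-memory lines first. The following is a rough sketch under some assumptions: the $lines entries are shortened, hypothetical versions of the lines in the question, and the Trim() call is an extra guard against stray whitespace in the file names that is not part of the script above:

```powershell
# Shortened, hypothetical versions of the three DJ.bat lines from the question
$lines = @(
  'DJENGINE -l "Y:\st\a.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_2554_20220323065246.dat" -tc "Server=ABC" "fc.tf.xml"'
  'DJENGINE -l "Y:\st\b.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_5168_20220323074029.dat" -tc "Server=ABC" "fc.tf.xml"'
  'DJENGINE -l "Y:\st\c.err" -sc "File=Y:\st\Retail\FTP\2022-03-23\TMSfuture_cost_13272_20220323070111.dat" -tc "Server=ABC" "fc.tf.xml"'
)

# The two file names from the question
$data = @(
  'TMSfuture_cost_2554_20220323065246.dat'
  'TMSfuture_cost_5168_20220323074029.dat'
)

# Build the case-insensitive set; Trim() is an assumption-driven safeguard
$fileNameSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::CurrentCultureIgnoreCase)
$data | ForEach-Object { [void]$fileNameSet.Add($_.Trim()) }

# Keep only the lines whose extracted file name is NOT in the set
$kept = @($lines | Where-Object {
  -not ($_ -match '-sc "File=[^"]+?\\([^\\"]+)"' -and $fileNameSet.Contains($Matches[1]))
})

# Print the surviving lines
$kept
```

With this sample input only the TMSfuture_cost_13272 line should remain, which makes it easier to tell a regex problem apart from a problem with how $data was populated.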


8 Comments

I don't see $data being used. Where is it being computed?
@DikshitKarki Correct, I forgot to include the line that populates the set, I've just updated the answer :)
I checked the modified_dj.bat file and still see the file names that were meant to be deleted.
In my code, $data is an ArrayList, and I added all the file names using $data.add($l). I think the regex is not matching. The files are still showing in the modified_dj file.
@DikshitKarki Did you try the updated version (with the $data |ForEach-Object { ... } statement)? If I use the two file names you've posted as the contents of $data it works and only spits out the last line of the file