I have a first text file whose content looks like this: 12AB34.US. The second text file contains CD 34 EF. I want to check whether the content of the second file occurs in the first file.

I tried cutting the last 3 characters (.US) from the first file's content, then splitting what remains into 2-character groups (because the second file consists of 2-character groups). Then I tried this code, and it always returns "Not Found".

$String = Get-Content "C:\Users\te2.txt"
$Data = Get-Content "C:\Users\Fixed.txt"
$Split = $Data -split '(..)'

$Cut = $String.Substring(0,6)

$String_Split = $Cut -split '(..)'
$String_Split

$Check= $String_Split | %{$_ -match $Split}
if ($Check -contains $true) {
    Write-Host "Found"
} else {
    Write-Host "Not Found"
}
  • This is very unclear to me. The only characters in text2 that are also in text1 are the digits 34. Is that already enough for you to call it a 'match'? Commented Mar 5, 2019 at 10:16
  • Yes. But I am not sure which other function I could use to check whether the data in text2 exists in text1. @Theo Commented Mar 8, 2019 at 6:25

1 Answer


There are a number of problems with your current approach.

  1. The 2-char groups don't align:
    # strings split into groups of two
    '12'    'AB'    '34'            # first string
    'CD'    ' 3'    '4 '    'EF'    # second string
  2. When you test multiple strings with -match, you need to

    1. escape the input string to avoid matching on regex metacharacters (like .), and
    2. place the collection on the left-hand side of the operator and the pattern on the right:

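# -split '(..)' also emits empty strings; drop them so they don't match everything
$Needles = $String_Split | Where-Object { $_ }
# With a collection on the left, -match returns the matching elements, not Booleans
$Compare = $Needles | ForEach-Object { $Split -match [regex]::Escape($_) }
if ($Compare) {
    Write-Host "Found"
} else {
    Write-Host "Not Found"
}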
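To make both points concrete, here is a small sketch that uses the sample values from the question as literals instead of reading the files. The variable names are illustrative only. It shows what -split '(..)' actually returns once the empty strings are filtered out, and how -match behaves with a scalar versus a collection on the left-hand side.

```powershell
# Sample values from the question, used as literals instead of file contents
$first  = '12AB34'
$second = 'CD 34 EF'

# -split with a capture group also emits the empty strings between matches,
# so filter those out to keep only the 2-character groups
$firstGroups  = $first  -split '(..)' | Where-Object { $_ }
$secondGroups = $second -split '(..)' | Where-Object { $_ }

$firstGroups  -join ','    # 12,AB,34
$secondGroups -join ','    # CD, 3,4 ,EF

# Scalar on the left: -match returns a Boolean
$scalarResult = '34' -match [regex]::Escape('34')

# Collection on the left: -match returns the matching elements instead.
# No 2-character group of $second equals or contains '34', so this is empty.
$filtered = $secondGroups -match [regex]::Escape('34')
```

Because the digits 34 are spread across the groups ' 3' and '4 ', $filtered stays empty, which is the alignment problem from point 1.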

For a more general solution to find out if any substring of N chars of one string is also a substring of another, you could probably do something like this instead:

$a = '12AB34.US'
$b = 'CD 34 EF'

# we want to test all substrings of length 2
$n = 2

$possibleSubstrings = 0..($n - 1) | ForEach-Object {
    # grab substrings of length $n starting at every offset from 0 to $n - 1
    $a.Substring($_) -split "($('.' * $n))" | Where-Object Length -eq $n | ForEach-Object {
        # escape the substring for later use with `-match`
        [regex]::Escape($_)
    }
} | Sort-Object -Unique

# We can construct a single regex pattern for all possible substrings:
$pattern = $possibleSubstrings -join '|'

# And finally we test if it matches
if($b -match $pattern){
    Write-Host "Found!"
}
else {
    Write-Host "Not found!"
}

This approach will give you the correct answer, but it can become very slow on large inputs, at which point you may want to look at non-regex-based strategies such as Boyer-Moore.
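As one possible non-regex alternative, the sketch below collects every substring of the given length from the first string into a .NET HashSet and then slides a window of the same length across the second string. This is a plain set lookup, not Boyer-Moore. The function name Test-CommonSubstring is hypothetical, and the sketch assumes PowerShell 5 or later for the ::new() syntax. Unlike -match, which ignores case by default, the lookup here is case-sensitive.

```powershell
function Test-CommonSubstring {
    param(
        [string]$First,
        [string]$Second,
        [int]$Length = 2
    )

    # Collect every substring of the requested length from $First
    $seen = [System.Collections.Generic.HashSet[string]]::new()
    for ($i = 0; $i -le $First.Length - $Length; $i++) {
        [void]$seen.Add($First.Substring($i, $Length))
    }

    # Slide a window of the same length over $Second and look each one up
    for ($i = 0; $i -le $Second.Length - $Length; $i++) {
        if ($seen.Contains($Second.Substring($i, $Length))) {
            return $true
        }
    }
    return $false
}

Test-CommonSubstring -First '12AB34.US' -Second 'CD 34 EF'   # True ('34' is shared)
Test-CommonSubstring -First '12AB34.US' -Second 'CD EF GH'   # False
```

Each lookup is a hash probe, so the work grows roughly with the combined length of the two strings instead of with the size of a large alternation pattern.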

1 Comment

It always reports "Found", even when I change $b to C34DEF. I expected $b to be divided into 2-character groups before it is compared with $a.
