
I need to remove ONLY the TABs inside double-quoted fields in a TAB-delimited file. I do not know how to tell double quotes that wrap a field apart from double quotes inside the string itself. I don't want to accidentally remove the actual TAB field delimiters (of course).

I have a skeleton script where I believe I need to do some work with the $_ object/record but I don't know what the next step is to detect all fields to then scrub them of any TABs without removing actual Delimiters. I placed a comment in the CODE snippet stating where I believe the processing of the data fields should occur.

Raw data

_record_number  record_id   id_testing  Notes   IntakeComplete
111 6   5   "           We will not be testing because of covid 19" 1
222 6   5   "           We will not be "testing because of covid 19"    1
333 6   5   "           We will not be "testing" because of covid 19"   1

The easiest case to solve for is the _record_number 111. I had one record with one single double quote inside two double quotes within tabs. Really odd, and so I replicated that issue in _record_number 222.

The Code

#-Import TABs Inside Quotes Issue.tsv

#-Declare File Paths
$SourceFile = "TABs Inside Quotes Issue.tsv"
$ExportFile = "TABs Inside Quotes Issue_Updated.tsv"


#---------- Attempt to find Double Quoted Fields and Remove TABs ONLY within them and therefore leaving the TAB delimiters untouched

Import-Csv $SourceFile -Delimiter "`t" | 
ForEach-Object {
    #-Process Data Fields Here
    $_
} |
Export-Csv $ExportFile -Delimiter "`t" -UseQuotes Never -NoTypeInformation

Data Before and After Export-Csv Update

TL;DR

The issue I'm having with using the above code to fix my data is that I am fixing each issue for each field as it comes up. How do I apply a standard fix to ALL fields? Meaning, how do I write the script to check every column in the row, as opposed to coding in each column by name and then checking its data?

If you're curious why I am using -UseQuotes Never in Export-Csv: BCP requires all of the quotes to be removed because the BCP utility cannot handle quoted fields. With -UseQuotes Never, Export-Csv removes just about every quote, but it does so without first removing any delimiter characters from within the quoted fields. It also produces unexpected results at times for quoted fields that themselves contain quotes. Regardless, the issue causing me grief is the TABs within quoted fields in a file that is itself TAB-delimited.
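To illustrate the problem described above, here is a minimal sketch, assuming PowerShell 7+ (where -UseQuotes is available). The object and its property names are hypothetical, not taken from the real file. It shows that a TAB inside a value survives the export and turns into an extra column:

```powershell
# Hypothetical row: the Notes value contains an embedded TAB.
$row = [pscustomobject]@{ Id = 111; Notes = "not`ttesting"; Done = 1 }

# Export without quotes; $lines[0] is the header, $lines[1] is the data row.
$lines = $row | ConvertTo-Csv -Delimiter "`t" -UseQuotes Never

# Count the TAB-separated fields in the data row.
$fieldCount = ($lines[1] -split "`t").Count
"Header fields: 3, data fields: $fieldCount"
```

The data row ends up with one more field than the header, which is the kind of misalignment that breaks a bcp import.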

  • Could you please share the raw data from your CSV file formatted as code? Thanks in advance. Commented Jul 18, 2023 at 23:39
  • perhaps -replace '(?<=")\t+' might help but as Olaf noted, it's better to have the raw data for testing Commented Jul 18, 2023 at 23:51
  • The point of text delimiters (i.e. quotes) is that you can put a field delimiter inside it (i.e. tab) and it will be considered data, not a field delimiter. Is this the root cause for this? So something like this is meant to correctly define three columns of data. "abc"{tab}"def{tab}ghi"{tab}"kjl\"mno" The second column contains a tab, the third column contains a quote Commented Jul 19, 2023 at 2:03
  • @Olaf I pasted in the raw csv data. Commented Jul 19, 2023 at 3:10
  • Just as an aside if you are transferring SQL Server > SQL Server via BCP just use native format. Commented Jul 20, 2023 at 7:01

1 Answer


There are 2 alternatives you could use. The more robust one is to parse the TSV as you have it, then update the values in that column:

Import-Csv $SourceFile -Delimiter "`t" | ForEach-Object {
    # update the property value
    $_.Notes = $_.Notes.Trim()
    # output the updated object
    $_
} | Export-Csv ....
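To apply the same cleanup to every column rather than only Notes, one possible approach is to loop over each row's properties through PSObject.Properties. The following is a minimal sketch, assuming PowerShell 7+ and well-formed quoting. The sample rows, the {TAB} placeholder, and the choice to replace embedded TABs with a single space (instead of deleting them outright) are illustrative assumptions:

```powershell
# Hypothetical sample; {TAB} placeholders are swapped for real TABs so the layout stays visible.
$tsv = @(
    'record_number{TAB}Notes{TAB}IntakeComplete'
    '111{TAB}"{TAB}{TAB}No testing{TAB}because of covid 19"{TAB}1'
    '222{TAB}"Not ""testing"" today"{TAB}0'
) -replace '\{TAB\}', "`t"

$rows = $tsv | ConvertFrom-Csv -Delimiter "`t" | ForEach-Object {
    # Walk every property of the row instead of naming each column.
    foreach ($prop in $_.PSObject.Properties) {
        if ($prop.Value -is [string]) {
            # Collapse embedded TABs to one space, drop double quotes, trim the ends.
            $prop.Value = ($prop.Value -replace '\t+', ' ' -replace '"', '').Trim()
        }
    }
    # Output the updated object
    $_
}

# With the TABs and quotes gone, writing without quotes keeps the columns aligned.
$rows | ConvertTo-Csv -Delimiter "`t" -UseQuotes Never
```

Because the values are cleaned after parsing, the real TAB delimiters are never touched. This relies on the parser reading the quoted fields correctly, so rows with unbalanced quotes may still need separate handling.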

The second option is to read the content as plain text and trim tabs that are directly after the " and lastly parse the strings into objects:

(Get-Content $sourcefile -Raw) -replace '(?<=")\t+(?=.+")' |
    ConvertFrom-Csv -Delimiter "`t" |
    Export-Csv ...

See https://regex101.com/r/W1lJgx/1 for regex details.

I recommend not using -UseQuotes Never because, as another user noted in the comments, the quotes serve as text delimiters when the TSV is read back. In that case you wouldn't need ConvertFrom-Csv and could just write the text back with Set-Content:

(Get-Content $sourcefile -Raw) -replace '(?<=")\t+(?=.+")' |
    Set-Content $ExportFile
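
The pattern above removes only the TABs that immediately follow a double quote. If TABs can also appear in the middle of a quoted field, one possible variation is to match each quoted span and rewrite the TABs inside it. This is a sketch only: it assumes the quotes on a line are balanced (so a stray quote like the one in record 222 would not be handled), and the sample line and {TAB} placeholder are hypothetical:

```powershell
# Hypothetical sample line; {TAB} placeholders are swapped for real TABs.
$line = '111{TAB}6{TAB}"{TAB}No testing{TAB}because of covid 19"{TAB}1' -replace '\{TAB\}', "`t"

# Match each balanced "..." span and replace runs of TABs inside it with a space.
# TABs outside the quotes (the real delimiters) are never part of a match.
$clean = [regex]::Replace($line, '"[^"]*"', {
    param($match)
    $match.Value -replace '\t+', ' '
})

# The line still splits into the original four fields.
($clean -split "`t").Count
```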

3 Comments

Yes as I mentioned there are issues with using -UseQuotes Never however due to the bcp import utility not able to read or understand quoted fields I need them removed. However as mentioned I cannot simply just remove the quotes due to the potential of TABs being inside some of these quoted fields. This is why I posted to see if there was a way I could remove the TABs from within these fields prior to export which is when I need to remove all quotes around the fields in order to import the data using bcp.
I see you are using $_.Notes.Trim(). I am very aware of these functions in SQL, but being new to PowerShell it is good to see how to do these things prior to import. How about using a replace instead? If there are TABs within the string, those would also become an issue after removing the double quotes: $_.Notes.Replace("`t", ''). Or it appears the syntax can also look like $_.Notes -replace "`t",''
In the short term I have applied $_.Notes = $_.Notes -replace "`t",'' -replace '\"','' to remove all TABs within this one field, and while I was in there I'm also removing any double quotes within the field. Now when using -UseQuotes Never it correctly removes the wrapping quotes. Again, I MUST use -UseQuotes Never. If I do not, it wraps EVERY SINGLE FIELD WITH DOUBLE QUOTES. BCP hates wrapped fields, so this breaks everything if I do not do that. I plan to look further into your second suggestion. For now you saved me my job. Thank you!
