
I need to build a process that reads a CSV file into a SQL Server table. In the past I’ve used BULK INSERT, but the CSV files I’m working with will have commas within some fields (unpredictably), and my SQL Server version (2016) does not support handling those (I gather that in 2017, the FIELDQUOTE parameter addresses this).

This post talks about this very issue, and turned me on to the method of using PowerShell with Import-Csv. My dataset with each run will be very small, so the “Simple CSV Import Using PowerShell” approach should be fine. However, I’m getting hung up on data types.

When I use the method below, I get errors because it can’t convert string to decimal or datetime. However, if I set up my SQL table to contain only string-type fields, the data imports correctly. The header row is handled as expected: Import-Csv uses the first row as headers by default, so what I get in my destination table (when all the fields are set to varchar(n)) is rows 2 and beyond.

Is there a way for me to use this method but cast the fields to their proper types so they can go into a properly defined SQL table? If I had to, I could work with it as I have it working (all destination columns set to varchar(n)), but I’d prefer that the SQL columns were properly defined. My CSV header names are predictable, and I know what I want each column’s data type to be. Many will be left as strings, but some need to be decimal(19,4) or datetime. I think I’m reading in other places that I can pipe in a Where-Object command or something, but I’m a PowerShell novice and struggling to find the proper syntax.

$ConnectionString = "Data Source=<myServer>; Database=<myDatabase>;Trusted_Connection=True;"
. "E:\Out-DataTable.ps1"

$csvDataTable = Import-Csv -Path "<sourceCsvFile>" | Out-DataTable
$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy $ConnectionString
$bulkCopy.DestinationTableName = "<myDestinationTable>"
$bulkCopy.WriteToServer($csvDataTable)
  • Is there a specific format for dates in the CSV? Commented May 2, 2020 at 18:27
  • Please consider upvoting any answers you find helpful, or accepting an answer (via the checkbox) if it solves your problem. Commented May 3, 2020 at 15:18
  • The format for my date/time fields is 4/17/2020 20:32. Thanks for the reply. I will play with it and report back. Commented May 3, 2020 at 17:30

1 Answer


Regarding your main question: "Is there a way for me to use this method but set the fields to their proper formats so they can go into the properly defined sql table?" Yes.

You could cast the values to their desired types before creating the DataTable (example code below). Personally, I’d probably try to replace the CSV files with a format that encodes type information, such as JSON or Parquet.


UPDATE BASED ON OP'S QUESTIONS:

"Would I have to spell out each field in that loop, or just the ones that need to change from string to something else?" You'd have to map each field to the [PSCustomObject], but you only need to cast the fields whose data is intended to be non-string.

"For column headers that have spaces, do I surround them in brackets when referring to them in PS (such as [Created on Date])?" I adjusted the answer to illustrate one way you could handle column headers with spaces; see column 'e e'.

"I see your example converts 0.2-type numbers to double. My decimal data will be currency and will work with other data such that I would be setting it to decimal. Do you expect this will flow properly, or is there a different proper syntax for converting to decimal?" Based on the data type mappings reference below, .NET's decimal type maps to SQL Server's decimal type. See property f in the example data. https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-data-type-mappings



Assuming you have the data below in a CSV:

a,b,c,d,e e,f
string1,1,0.1,2020-05-02 14:29:06.780,e,1.99
string2,2,0.2,2020-05-03 01:01:02.000,e,3.59


You could cast the values to the desired types using an approach similar to the one in the foreach loop below. Obviously, you'd have to customize it to fit your actual data.

param(
    $datasource = 'localhost',
    $db = 'test',
    $destTable = 'table',
    $csvPath = "$PSScriptRoot\data.csv"
)

$ConnectionString = "Data Source=$datasource; Database=$db;Trusted_Connection=True;";
. "$PSScriptRoot\Out-DataTable.ps1"

$rawData = Import-Csv -Path $csvPath
$typedData = @()
foreach ($row in $rawData) {
    $tempObj = [PSCustomObject] @{
        a = $row.a
        b = [int] $row.b
        c = [double] $row.c
        d = [datetime] $row.d
        e_e = $row."e e"
        f = [decimal] $row.f
    }
    $typedData += $tempObj
}

$csvDataTable = $typedData | Out-DataTable
$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy $ConnectionString
$bulkCopy.DestinationTableName = $destTable
$bulkCopy.WriteToServer($csvDataTable)
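
As an aside, building $typedData with += re-creates the array on every iteration, which gets expensive as row counts grow. For larger files you could stream the rows through the pipeline instead (Out-DataTable accepts pipeline input); a sketch using the same hypothetical columns as above:

```powershell
# Stream rows straight into Out-DataTable instead of accumulating with +=
$csvDataTable = Import-Csv -Path $csvPath | ForEach-Object {
    [PSCustomObject] @{
        a   = $_.a
        b   = [int] $_.b
        c   = [double] $_.c
        d   = [datetime] $_.d
        e_e = $_.'e e'
        f   = [decimal] $_.f
    }
} | Out-DataTable
```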

Contents of Out-DataTable.ps1 (for future readers)

####################### 
function Get-Type { 
    param($type) 

    $types = @( 
        'System.Boolean', 
        'System.Byte[]', 
        'System.Byte', 
        'System.Char', 
        'System.Datetime', 
        'System.Decimal', 
        'System.Double', 
        'System.Guid', 
        'System.Int16', 
        'System.Int32', 
        'System.Int64', 
        'System.Single', 
        'System.UInt16', 
        'System.UInt32', 
        'System.UInt64') 

    if ( $types -contains $type ) { 
        Write-Output "$type" 
    } 
    else { 
        Write-Output 'System.String' 

    } 
} #Get-Type 

####################### 
<# 
.SYNOPSIS 
Creates a DataTable for an object 
.DESCRIPTION 
Creates a DataTable based on an objects properties. 
.INPUTS 
Object 
    Any object can be piped to Out-DataTable 
.OUTPUTS 
   System.Data.DataTable 
.EXAMPLE 
$dt = Get-psdrive| Out-DataTable 
This example creates a DataTable from the properties of Get-psdrive and assigns output to $dt variable 
.NOTES 
Adapted from script by Marc van Orsouw see link 
Version History 
v1.0  - Chad Miller - Initial Release 
v1.1  - Chad Miller - Fixed Issue with Properties 
v1.2  - Chad Miller - Added setting column datatype by property as suggested by emp0 
v1.3  - Chad Miller - Corrected issue with setting datatype on empty properties 
v1.4  - Chad Miller - Corrected issue with DBNull 
v1.5  - Chad Miller - Updated example 
v1.6  - Chad Miller - Added column datatype logic with default to string 
v1.7 - Chad Miller - Fixed issue with IsArray 
.LINK 
http://thepowershellguy.com/blogs/posh/archive/2007/01/21/powershell-gui-scripblock-monitor-script.aspx 
#> 
function Out-DataTable { 
    [CmdletBinding()] 
    param([Parameter(Position = 0, Mandatory = $true, ValueFromPipeline = $true)] [PSObject[]]$InputObject) 

    Begin { 
        $dt = new-object Data.datatable   
        $First = $true  
    } 
    Process { 
        foreach ($object in $InputObject) { 
            $DR = $DT.NewRow()   
            foreach ($property in $object.PsObject.get_properties()) {   
                if ($first) {   
                    $Col = new-object Data.DataColumn   
                    $Col.ColumnName = $property.Name.ToString()   
                    if ($property.value) { 
                        if ($property.value -isnot [System.DBNull]) { 
                            $Col.DataType = [System.Type]::GetType("$(Get-Type $property.TypeNameOfValue)") 
                        } 
                    } 
                    $DT.Columns.Add($Col) 
                }   
                if ($property.Gettype().IsArray) { 
                    $DR.Item($property.Name) = $property.value | ConvertTo-XML -AS String -NoTypeInformation -Depth 1 
                }   
                else { 
                    $DR.Item($property.Name) = $property.value 
                } 
            }   
            $DT.Rows.Add($DR)   
            $First = $false 
        } 
    }  

    End { 
        Write-Output @(, ($dt)) 
    } 

} #Out-DataTable

Comments

Note that using += to iteratively build arrays is inefficient, and will be expensive with progressively larger datasets.
Derek, would I have to spell out each field in that loop, or just the ones that need to change from string to something else (they start as strings by default with Import-Csv)? For column headers that have spaces, do I surround them in brackets when referring to them in PS (such as [Created on Date])? Finally, I see your example converts 0.2-type numbers to "double". My decimal data will be currency and will work with other data such that I would be setting it to decimal. Do you expect this will flow properly, or is there a different proper syntax for converting to decimal? Thanks!
@TimMiller, I updated the answer in response to the questions in your most recent comment.
Thanks for this. Several of my datetime fields will be blank for many of the rows, and this causes a problem. (Cannot convert value "" to type "System.DateTime". Error: "String was not recognized as a valid DateTime.") Received several responses like that. That will also be true for many of the decimal fields, but I did not receive any errors relating to converting to decimal. After several of those, I see many of these, which I assume are only a result of the first problem (Out-DataTable : Cannot bind argument to parameter 'InputObject' because it is null.).
@TimMiller, you could do something like this to handle that condition: d = if ($null -eq $row.d -or $row.d -eq '') { Get-Date -Year 0001 -Month 01 -Day 01 } else { [datetime] $row.d }. You'll probably want to adjust the arguments passed to Get-Date to suit your missing values. PowerShell 5.1 and under doesn't have a proper ternary operator, but this is basically the same.
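For future readers: another way to handle the blank-field condition from the comment above is to map empty strings to [DBNull]::Value, which SqlBulkCopy writes as NULL. A sketch, assuming the destination columns are nullable (only the d and f columns are shown; the other columns would be mapped as in the answer):

```powershell
# Map empty strings to DBNull so they load as NULL in SQL Server
$typedData = foreach ($row in $rawData) {
    [PSCustomObject] @{
        # ...other columns as in the answer...
        d = if ([string]::IsNullOrEmpty($row.d)) { [DBNull]::Value } else { [datetime] $row.d }
        f = if ([string]::IsNullOrEmpty($row.f)) { [DBNull]::Value } else { [decimal] $row.f }
    }
}
```

One caveat: Out-DataTable infers each column's type from the first row's value, so if the first row happens to hold DBNull in a column, that column falls back to the default (string) type.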
