
I need advice on how to upload a large CSV file (about one million lines) into Azure Table Storage with PowerShell. I am aware of Add-AzTableRow -table $using:table -partitionKey $partitionKey -rowKey $rowKey -property $tableProperty

https://github.com/paulomarquesc/AzureRmStorageTable/tree/master/docs

Based on tests with a smaller file of 10 000 lines, it takes about 10 minutes to upload when processing sequentially (which works out to roughly 16-20 hours for 1 000 000 lines).

I have also tried the smaller file (10 000 lines) with PowerShell 7 using ForEach-Object -Parallel, but the funny thing is that it takes about 3 times longer. Out of three tests, two uploaded the 10 000-line file in about 30 minutes and one in 47 minutes.
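
For context, the parallel attempt was roughly along these lines (a simplified sketch, not the exact script; the throttle limit and the property list are illustrative):

$CSVTable | ForEach-Object -Parallel {
    $tableProperty = @{
        'FirstName' = $_.FirstName
        'LastName'  = $_.LastName
        # ...remaining columns omitted for brevity
    }
    # Still one insert call per row, now issued from several runspaces
    Add-AzTableRow -table $using:table -partitionKey $_.ID -rowKey $_.UserName -property $tableProperty
} -ThrottleLimit 10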

Just for comparison, it took me less than an hour to upload the one million line file with Storage Explorer! So I was wondering what process it uses and whether it could be used from PowerShell.

I have reviewed the following article:

https://blog.kloud.com.au/2019/02/05/loading-and-querying-data-in-azure-table-storage-using-powershell/

The approach there should work, but it returns an error on: $table.CloudTable.ExecuteBatch($batchOperation)

So my question would be: Is there any way to load data into Azure table storage in parallel?

As requested, adding the code used.

Note: The code works just fine, but it takes time and I believe it could be faster. I am looking for suggestions on how to improve it.

# Get the storage account context, create the target table and grab its CloudTable object
$SaContext = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName).Context
$tableName = 'Table1000000'
New-AzStorageTable -Name $tableName -Context $SaContext
$table = (Get-AzStorageTable -Name $tableName -Context $SaContext).CloudTable

# Import the CSV and set up progress counters
$CSVTable = Import-Csv -Path C:\TEMP\DEMO_CSV\Test-1000000-20200513223059.csv
$i = 0
$totalcount = $CSVTable.count

foreach ($entry in $CSVTable) {
    $i++
    Write-Progress -Activity "Importing Data" -Status "Item: $i of $totalcount" -PercentComplete (($i/$totalcount)*100);
    $partitionKey = $entry.ID
    $rowKey = $entry.UserName
    $tableProperty = @{
        'SISID' = $entry.SISID
        'FirstName' = $entry.FirstName
        'LastName' = $entry.LastName
        'Password' = $entry.Password
        'StateID' = $entry.StateID
        'StudentNumber' = $entry.PersonalNumber
        'MiddleName' = $entry.MiddleName
        'Grade' = $entry.Grade
        'Status' = $entry.Status
        'BirthDate' = $entry.BirthDate
        'GradYear' = $entry.GradYear
    }
    # One insert (one service call) per CSV row
    Add-AzTableRow -table $table -partitionKey $partitionKey -rowKey $rowKey -property $tableProperty
}
  • Please edit your question and include the code you're using for the entity batch transaction. Also include the exact error you're getting. Commented May 15, 2020 at 6:40
  • If possible, share some sample data that you're trying to upload. Commented May 15, 2020 at 6:42
  • @GauravMantri-AIS There is no error, it works just fine! I am asking for advice on how to optimize it, as there should be a way. Commented May 15, 2020 at 7:35
  • What do you mean when you say "Which should be working, but it returns an error with: $table.CloudTable.ExecuteBatch($batchOperation)"? Commented May 15, 2020 at 7:46
  • @GauravMantri-AIS The author of the article presents it as a solution and claims it works, quoting: "With my Storage Account being in Central US and myself in Sydney Australia, loading the ~26k entries took 4 mins 27 seconds to insert." When I reproduce the exact same code, it fails with the error specified. I am not trying to troubleshoot this issue, but obviously there are other ways to upload data to Table Storage, so I am looking for advice from someone who might have faced such a challenge. Commented May 15, 2020 at 7:56

2 Answers


According to the script you provided, you use (Get-AzStorageTable -Name $tableName -Context $saContext).CloudTable to get a CloudTable object, whose type comes from the Microsoft.Azure.Cosmos.Table namespace. So if you want to execute batch operations against it, the batch has to be built as a Microsoft.Azure.Cosmos.Table.TableBatchOperation. Also note that all entities in a batch must have the same PartitionKey; entities with different PartitionKeys need to go into separate batches.

For example

# Build a batch; every entity in it shares the PartitionKey "Hury"
# ($table here is the output of Get-AzStorageTable, not its .CloudTable property)
[Microsoft.Azure.Cosmos.Table.TableBatchOperation] $batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation

$e = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity("Hury","test1")
$e.Properties.Add("Age","20")
$e1 = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity("Hury","test2")
$e1.Properties.Add("Age","25")
$batchOperation.InsertOrReplace($e)
$batchOperation.InsertOrReplace($e1)
$table.CloudTable.ExecuteBatchAsync($batchOperation)
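
Note that ExecuteBatchAsync returns a Task rather than the results directly. If you want the call to block until the batch completes, so that any failures surface as exceptions, you can wait on it, for example:

# Wait for the batch to finish and get its results
$task = $table.CloudTable.ExecuteBatchAsync($batchOperation)
$task.GetAwaiter().GetResult()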



1 Comment

Thank you very much for your suggestion. I will test it and get back to you. Thanks again for taking the time to answer! :)

First of all, all credit goes to Jim Xu!! Again, thanks for the support!

I would like to add a couple of comments from my side:

The article quoted appears to have been changed since then, and the script for uploading is now missing.

The issue was that I was using a different object type (the one from the quoted article): [Microsoft.WindowsAzure.Storage.Table.TableBatchOperation]

while I had to use [Microsoft.Azure.Cosmos.Table.TableBatchOperation]. This is why I was receiving an error when trying to execute the batch: $table.ExecuteBatch($batchOperation).
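
If you are not sure which type your table object is, you can check it before building the batch; with the current Az.Storage module it should come from the Microsoft.Azure.Cosmos.Table namespace:

# Inspect the actual .NET type behind the CloudTable property
(Get-AzStorageTable -Name $tableName -Context $SaContext).CloudTable.GetType().FullName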

Adding a working sample script for future reference.

N.B. The sample provided by Jim Xu works just fine, but there is a limit of 100 entities per batch, so I had to modify it a bit:

$SaContext = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName).Context
$table = (Get-AzStorageTable -Name $tableName -Context $SaContext).cloudtable 
$partitionKey = 'TEST'

[Microsoft.Azure.Cosmos.Table.TableBatchOperation] $batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation

$batchCount = 0 # Initialize Batch count

foreach ($rowKey in 1..300){
    $batchCount++
    $entity = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity -ArgumentList $partitionKey, $rowKey
    # Add a dummy property as a key/value pair
    $entity.Properties.add("Key","Value$rowKey") 
    # Add to batch collection
    $batchOperation.InsertOrReplace($entity) 

    # Maximum number of items per batch is 100
    # Execute batch, when collection = 100 items
    if ($batchCount -eq 100) {
        $table.ExecuteBatchAsync($batchOperation)
        # Re-initialize the batch collection and reset $batchCount
        [Microsoft.Azure.Cosmos.Table.TableBatchOperation] $batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
        $batchCount = 0 
    }
}

# If the last batch contains fewer than 100 items, execute it here
if ($batchOperation.Count -ne 0) {
    $table.ExecuteBatchAsync($batchOperation)
}
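
To apply the same pattern to the CSV from the question, one option is to group the rows by PartitionKey and flush a batch every 100 entities. This is only a sketch, reusing $table and $SaContext from the script above and assuming the column names and the ID/UserName keys from the question; only two properties are shown.

# Sketch: batch-insert the CSV from the question
$CSVTable = Import-Csv -Path C:\TEMP\DEMO_CSV\Test-1000000-20200513223059.csv

# Entities in one batch must share a PartitionKey, so group the rows first
foreach ($group in ($CSVTable | Group-Object -Property ID)) {

    [Microsoft.Azure.Cosmos.Table.TableBatchOperation] $batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation

    foreach ($entry in $group.Group) {
        $entity = New-Object Microsoft.Azure.Cosmos.Table.DynamicTableEntity -ArgumentList $entry.ID, $entry.UserName
        $entity.Properties.Add('FirstName', $entry.FirstName)
        $entity.Properties.Add('LastName', $entry.LastName)
        # ...add the remaining columns the same way
        $batchOperation.InsertOrReplace($entity)

        # Flush whenever the 100-entity batch limit is reached
        if ($batchOperation.Count -eq 100) {
            $table.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult() | Out-Null
            $batchOperation = New-Object -TypeName Microsoft.Azure.Cosmos.Table.TableBatchOperation
        }
    }

    # Flush whatever is left for this PartitionKey
    if ($batchOperation.Count -ne 0) {
        $table.ExecuteBatchAsync($batchOperation).GetAwaiter().GetResult() | Out-Null
    }
}

Keep in mind that a batch (entity group transaction) only helps when many rows share the same PartitionKey; if ID is unique per row, each group will contain a single entity and the batching gains nothing.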
