
I Googled for a way to create a table with Databricks and Azure SQL Server and load data into that same table. I found some sample code online that seems pretty straightforward, but apparently there is an issue somewhere. Here is my code.

CREATE TABLE MyTable
USING org.apache.spark.sql.jdbc 
OPTIONS (
  url "jdbc:sqlserver://server_name_here.database.windows.net:1433;database = db_name_here",
  user "u_name",
  password "p_wd",
  dbtable "MyTable"
);

Now, here is my error.

Error in SQL statement: SQLServerException: Invalid object name 'MyTable'.

My password, unfortunately, has spaces in it. That could perhaps be the problem, but I don't think so.

Basically, I would like this to recursively loop through files in a folder and its sub-folders, and load every file whose name matches a string pattern, like 'ABC*', into a table. The blocker is that I need the file name loaded into a field as well. So, I want to load data from MANY files into 4 fields of actual data, plus 1 field that captures the file name; the file name is the only way I can distinguish the different data sets. Is this possible? Or, is this an exercise in futility?

2 Answers


My suggestion is to use the Azure SQL Spark library, as also mentioned in the documentation:

https://docs.databricks.com/spark/latest/data-sources/sql-databases-azure.html#connect-to-spark-using-this-library

Bulk copy is what you want to use for good performance. Just load your files into a DataFrame and bulk copy them to Azure SQL:

https://docs.databricks.com/data/data-sources/sql-databases-azure.html#bulk-copy-to-azure-sql-database-or-sql-server
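
For reference, a minimal sketch of what that looks like, reusing the placeholder server, database, and credential names from your question. Note that the destination table must already exist in SQL Server, which is likely why you got 'Invalid object name' — the Spark JDBC source maps onto an existing remote table rather than creating one.

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Placeholder values copied from the question; adjust to your environment.
val bulkCopyConfig = Config(Map(
  "url"               -> "server_name_here.database.windows.net",
  "databaseName"      -> "db_name_here",
  "dbTable"           -> "dbo.MyTable",
  "user"              -> "u_name",
  "password"          -> "p_wd",
  "bulkCopyBatchSize" -> "2500",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))

// myDF is whatever DataFrame you loaded from your files (hypothetical name).
myDF.bulkCopyToSqlDB(bulkCopyConfig)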

To read files from subfolders, the answer is here:

How to import multiple csv files in a single load?
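
The short version, as a sketch (assuming the files sit under a mounted path like /mnt/rawdata/ with a consistent folder depth): each * wildcard in the path descends one folder level, so a single load can pick up every matching file across subfolders.

// Hypothetical layout: /mnt/rawdata/<year>/<month>/<day>/client/
// Each * matches one folder level; ABC* filters on the file-name pattern.
val allFiles = spark.read
  .format("csv")
  .option("sep", "|")
  .option("inferSchema", "true")
  .option("header", "false")
  .load("/mnt/rawdata/*/*/*/client/ABC*.gz")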


1 Comment

I started with Spark and switched to SQL, only because I couldn't get Spark to do what I needed. I'll take a second look at that option. Thanks!!

I finally, finally, finally got this working.

val myDFCsv = spark.read.format("csv")
  .option("sep", "|")            // pipe-delimited files
  .option("inferSchema", "true") // let Spark infer the column types
  .option("header", "false")     // the files have no header row
  .load("mnt/rawdata/2019/01/01/client/ABC*.gz") // wildcard loads every matching file

myDFCsv.show()
myDFCsv.count()
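
And to also get the file name into a field (the original requirement), Spark's built-in input_file_name() function can tag each row with the path of the file it came from — a small addition to the above:

import org.apache.spark.sql.functions.input_file_name

// Adds a fifth column holding the full path of the source file for each row.
val myDFCsvWithFile = myDFCsv.withColumn("file_name", input_file_name())
myDFCsvWithFile.show()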

Thanks for the point in the right direction, mauridb!!

1 Comment

My pleasure :) :) :)
