
I was trying to load data from an Oracle database using the Spark data source API.

Since I need to load the result of a query rather than a whole table, I used the code below, which I put together from some examples online:

Map<String, String> options = new HashMap<>();
options.put("driver", MYSQL_DRIVER);
options.put("user", MYSQL_USERNAME);
options.put("password", MYSQL_PWD);
options.put("url", MYSQL_CONNECTION_URL); 
options.put("dbtable", "(select emp_no, emp_id from employees) as employees_data");
options.put("lowerBound", "10001");
options.put("upperBound", "499999");
options.put("numPartitions", "10");

DataFrame jdbcDF = sqlContext.load("jdbc", options);

This gets an exception:

Exception in thread "main" java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended

I doubt that we can't give "as employees_data" for an Oracle query, so what am I doing wrong?

  • Your code has references like MYSQL_DRIVER; are you really connecting to an Oracle database, or a MySQL database? Commented Mar 1, 2016 at 18:31
  • Sorry, I forgot to change the variable names; the values inside them are for the Oracle driver: private static final String MYSQL_DRIVER = "oracle.jdbc.OracleDriver"; private static final String MYSQL_USERNAME = "qauser"; private static final String MYSQL_PWD = "qauser"; private static final String MYSQL_CONNECTION_URL = "jdbc:oracle:thin:@//"; Commented Mar 1, 2016 at 18:35

2 Answers


I doubt that we can't give "as employees_data" for an Oracle query

You may doubt it, but you can't use AS for a table alias in Oracle. You can use it for column aliases, where it is optional, but it is not allowed for table aliases. You can see that in the syntax diagram for the SELECT statement in the Oracle SQL reference.

Assuming Spark doesn't mind the alias itself, you can just remove the AS:

options.put("dbtable", "(select emp_no, emp_id from employees) employees_data");
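A minimal sketch of the whole option map with that fix applied. Besides dropping AS, it adds partitionColumn, because Spark's JDBC documentation says partitionColumn, lowerBound, upperBound and numPartitions must all be specified if any of them is, and the question's map sets only the last three. Using emp_no as the partition column is an assumption (it must be a numeric column), and the URL and credentials in main are hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class OracleJdbcOptions {

    // Builds the option map for Spark's "jdbc" data source.
    // The subquery alias has no AS, because Oracle rejects AS before a table alias.
    static Map<String, String> buildOptions(String driver, String url, String user, String password) {
        Map<String, String> options = new HashMap<>();
        options.put("driver", driver);
        options.put("url", url);
        options.put("user", user);
        options.put("password", password);
        options.put("dbtable", "(select emp_no, emp_id from employees) employees_data");
        // Assumption: emp_no is numeric and is a sensible column to split the read on.
        // partitionColumn must accompany lowerBound/upperBound/numPartitions.
        options.put("partitionColumn", "emp_no");
        options.put("lowerBound", "10001");
        options.put("upperBound", "499999");
        options.put("numPartitions", "10");
        return options;
    }

    public static void main(String[] args) {
        // Hypothetical connection details, for illustration only.
        Map<String, String> options = buildOptions(
                "oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@//host:1521/service", "user", "password");
        // With a Spark 1.4+ SQLContext the map would then be passed as:
        // DataFrame df = sqlContext.read().format("jdbc").options(options).load();
        System.out.println(options.get("dbtable"));
    }
}
```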

Comments

Worse, he's trying to alias the entire result set with those brackets! "(select emp_no, emp_id from employees) as employees_data"
@MichaelBroughton - I think that's OK according to the Spark docs: the dbtable entry says it can be anything that is valid in a FROM clause, and "instead of a full table you could also use a subquery in parentheses", so this should be fine on that basis. I'm not sure the alias is actually useful, though. I've never used this (or seen it before...).
If it is passing that string straight to Oracle, I get the same ORA-00933 error with "(select 1 from dual) as mydual".
@AlexPoole If I don't use the alias "as employees_data", Spark gives me Exception in thread "main" java.sql.SQLSyntaxErrorException: ORA-00903: invalid table name. I have tested this query using JdbcRDD and it works fine, but JdbcRDD is discouraged in favor of the data source API.
@prakash - what happens when you just remove the as?
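On whether the string reaches Oracle unchanged: as far as I know, Spark's JDBC source pastes the dbtable value verbatim into the FROM clause of the SQL it generates, starting with a schema probe of the form SELECT * FROM <dbtable> WHERE 1=0. The sketch below only rebuilds that string to show why the AS version fails; it does not call Spark, and the exact text may vary by Spark version and JDBC dialect.

```java
public class SchemaProbeQuery {

    // Approximates the schema-probing statement Spark's JDBC source sends:
    // the dbtable option is concatenated into the FROM clause as-is.
    static String schemaQuery(String dbtable) {
        return "SELECT * FROM " + dbtable + " WHERE 1=0";
    }

    public static void main(String[] args) {
        // Oracle rejects this with ORA-00933 because of AS before the table alias.
        System.out.println(schemaQuery("(select emp_no, emp_id from employees) as employees_data"));
        // This form is valid Oracle SQL.
        System.out.println(schemaQuery("(select emp_no, emp_id from employees) employees_data"));
    }
}
```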

Try this...

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.DataFrame;

Map<String, String> oracle_options = new HashMap<>();
oracle_options.put("driver", "oracle.jdbc.OracleDriver");
oracle_options.put("url", "jdbc:oracle:thin:username/password@//hostName/instanceName");
oracle_options.put("dbtable", "tableName");
// select() takes column names as strings, e.g. "col1", "col2"
DataFrame dataFrame = hContext.read().format("jdbc").options(oracle_options).load().select("col1", "col2");

where hContext is a HiveContext instance. If you also need a selection (a WHERE condition), use it as follows:

// "col1 > 100" is a hypothetical condition, written as a Spark SQL expression string
DataFrame dataFrame = hContext.read().format("jdbc").options(oracle_options).load().select("col1", "col2").where("col1 > 100");
