0

I am trying to return a table location (path) using a SQL query (in my Python code). Is it possible?

I'm using Hive and the data is stored in HDFS.

I found this snippet of code elsewhere:

SELECT SUBSTRING(physical_name, 1,
CHARINDEX(N'master.mdf',
LOWER(physical_name)) - 1) DataFileLocation
FROM master.sys.master_files
WHERE database_id = 1 AND FILE_ID = 1

But unfortunately I don't understand it, so I don't even know how to customise it to my problem.

I want to use this query in my unit tests in the following helper function:

    @classmethod
    def return_table_location(cls, schema, table) -> str:
        """Returns the location of the data files."""

        table_location_query = (***the query***)
        return table_location_query

Could anyone shed some light on this problem?

4
  • Do you need to know where a table is located within a database, or where the database itself is located? Commented Feb 28, 2020 at 14:09
  • I need to know the path where the table is stored in a file system (hdfs specifically) Commented Feb 28, 2020 at 14:10
  • I believe you're going to have to expand on your question a bit. HDFS doesn't store tables in the file system. It stores data files in the file system, and then a database appliance, like Hive for instance, imposes a table schema (on read) over the data that's stored in the files. How you locate the files is going to be determined by which database appliance you're working with. Commented Feb 28, 2020 at 14:22
  • Thanks Eric, I edited my question to include the information that I'm using Hive Commented Feb 28, 2020 at 14:28

2 Answers

1

Try with:

SELECT physical_name
FROM master.sys.master_files
WHERE Name = 'master'

Replace 'master' with the name of your database; the result is its physical path.

EDIT: Or this

SELECT 
name as DB_Name,
physical_name as FullPathName, 
SUBSTRING(physical_name, 1, CHARINDEX(name + '.mdf', LOWER(physical_name)) - 
1) as PathName
FROM master.sys.master_files
WHERE Name = 'master'

This returns the DB name, the full path (including the data file name), and the path (excluding the data file name). If you need other columns, run select * from master.sys.master_files to see everything you can include.
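For readers unfamiliar with the T-SQL string functions, the PathName expression can be mirrored in Python. This is only an illustration of the string logic, not a way to query SQL Server; the function name `path_before_data_file` and the sample path are hypothetical.

```python
def path_before_data_file(physical_name: str, db_name: str) -> str:
    """Mimic SUBSTRING(physical_name, 1, CHARINDEX(name + '.mdf', LOWER(physical_name)) - 1)."""
    marker = (db_name + ".mdf").lower()
    # CHARINDEX is 1-based and returns 0 when nothing is found;
    # str.find is 0-based and returns -1 instead.
    idx = physical_name.lower().find(marker)
    if idx == -1:
        raise ValueError(f"{marker!r} not found in {physical_name!r}")
    # Everything before the file name, i.e. the directory part.
    return physical_name[:idx]


# Hypothetical Windows-style path, used only for illustration.
print(path_before_data_file(r"C:\SQLData\master.mdf", "master"))
```

Note that this only applies to SQL Server's catalog views; Hive has no master.sys.master_files.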


3 Comments

Thanks for your answer. This fails for me now. I'm getting: SQL Error [40000] [42000]: Error while compiling statement: FAILED: ParseException line 6:15 cannot recognize input near '.' 'master_files' 'WHERE' in table source
I'm using Hadoop technologies, would that make it different?
I don't know Hadoop, but it may well be different. Edit your post and paste the snippet where you call this query.
0

The answer lay in understanding that finding a table location is database appliance-specific.

For Hive the answer is:

DESCRIBE FORMATTED schema_name.table_name
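To plug this into the helper from the question, here is a minimal sketch. It assumes a DB-API style cursor (for example from PyHive, which is not mentioned above) and assumes that the output contains a row whose first column is "Location:" (possibly padded with spaces) with the path in the second column; check this against what your driver actually returns. The names `extract_location` and `sample_rows` are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def extract_location(rows: Iterable[Sequence[Optional[str]]]) -> Optional[str]:
    """Return the value of the 'Location:' row, or None if there is none."""
    for row in rows:
        if not row or row[0] is None:
            continue
        # Assumption: the label column looks like 'Location:' plus padding.
        if row[0].strip().rstrip(":").lower() == "location":
            return row[1].strip() if len(row) > 1 and row[1] else None
    return None


def return_table_location(cursor, schema: str, table: str) -> Optional[str]:
    """Run DESCRIBE FORMATTED through a DB-API cursor and pick out the path."""
    # Identifiers cannot be bound as query parameters, so only pass
    # trusted schema and table names here.
    cursor.execute(f"DESCRIBE FORMATTED {schema}.{table}")
    return extract_location(cursor.fetchall())


# Made-up rows shaped like the assumed output, for illustration only.
sample_rows = [
    ("# Detailed Table Information", None, None),
    ("Database:           ", "my_schema", None),
    ("Location:           ", "hdfs://namenode/warehouse/my_schema.db/my_table", None),
]
print(extract_location(sample_rows))
```

Keeping the parsing separate from the query means the unit test can feed in canned rows without a live Hive connection.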

The topic can be closed, thanks for leading me to the correct solution.

1 Comment

Actually, only you can close the topic. :) There's a grey check mark to the left of your answer that only you (as the person who asked the question) can see. When you click on it, it will mark this as the accepted answer to your question, which "closes" the question. When answering your own question, though, there's a waiting period. I don't remember how long it is, but you'll get a pop up telling you if not enough time has gone by when you try to click it. Glad you got your question sorted!
