0

I am an absolute beginner when it comes to working with REST APIs with python. We have received a share-point URL which has multiple folders and multiples files inside those folders in the 'document' section. I have been provided an 'app_id' and a 'secret_token'.

I am trying to access the .csv file and read them as a dataframe and perform operations. The code for operation is ready after I downloaded the .csv and did it locally but I need help in terms of how to connect share-point using python so that I don't have to download such heavy files ever again.

I know there had been multiple queries already on this over stack-overflow but none helped to get to where I want.

I did the following and I am unsure of what to do next:

import json
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.runtime.http.request_options import RequestOptions

site_url = "https://<company-name>.sharepoint.com"
ctx = ClientContext(site_url).with_credentials(UserCredential("{app_id}", "{secret_token}"))

Above for site_url, should I use the whole URL or is it fine till ####.com?

This is what I have so far, next I want to read files from respective folders and convert them into a dataframe? The files will always be in .csv format

The example hierarchy of the folders are as follows:

Documents --> Folder A, Folder B

Folder A --> a1.csv, a2.csv

Folder B --> b1.csv, b2.csv

I should be able to move to whichever folder I want and read the files based on my requirement.

Thanks for the help.

1 Answer 1

1

This works for me, using a Sharepoint App Identity with an associated client Id and client Secret.

First, I demonstrate authenticating and reading a specific file, then getting a list of files from a folder and reading the first one.

import pandas as pd
import json
import io

from office365.sharepoint.client_context import ClientCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

#Authentication (shown for a 'modern teams site', but I think should work for a company.sharepoint.com site:
site="https://<myteams.companyname.com>/sites/<site name>/<sub-site name>"
#Read credentials from a json configuration file:
spo_conf = json.load(open(r"conf\spo.conf", "r")) 

client_credentials = ClientCredential(spo_conf["RMAppID"]["clientId"],spo_conf["RMAppID"]["clientSecret"])
ctx = ClientContext(site).with_credentials(client_credentials)


#Read a specific CSV file into a dataframe:
folder_relative_url = "/sites/<site name>/<sub site>/<Library Name>/<Folder Name>"
filename = "MyFileName.csv"
response = File.open_binary(ctx, "/".join([folder_relative_url, filename]))
df = pd.read_csv(io.BytesIO(response.content))


#Get a list of file objects from a folder and read one into a DataFrame:

def getFolderContents(relativeUrl):
    contents = []
    library = ctx.web.get_list(relativeUrl)
    all_items = library.items.filter("FSObjType eq 0").expand(["File"]).get().execute_query()
    for item in all_items:  # type: ListItem
        cur_file = item.file
        contents.append(cur_file)
    return contents

fldrContents = getFolderContents('/sites/<site name>/<sub site>/<Library Name>')
response2 = File.open_binary(ctx, fldrContents[0].serverRelativeUrl)
df2 = pd.read_csv(io.BytesIO(response2.content))

Some References:

Related SO thread.

Office365 library github site.

Getting a list of contents in a doc library folder.


Additional notes following up on comments:

The site path doesn't not include the full url for the site home page (ending in .aspx) - it just ends with the name for the site (or sub-site, if relevant to your case).

You don't need to use a configuration file to store your authentication credentials for the Sharepoint application identity - you could just replace spo_conf["RMAppID"]["clientId"] with the value for the Sharepoint-generated client Id and do similarly for the client Secret. But this is a simple example of what the text of a JSON file could look like:

{
    "MyAppName":{
        "clientId": "my-client-id",
        "clientSecret": "my-client-secret",
        "title":"name_for_application"
    }
} 
Sign up to request clarification or add additional context in comments.

3 Comments

Hi - May I know how to set up json configuration file in this case. Also, for site_url do I take it all the way until the ".aspx", i.e. the complete URL?
Thanks @BioData41 for he response, I tried now and it is not throwing any error but when I print response it gives me 403 code which is to say forbidden error. Even when I try to read the data frame later it gives the same 403 forbidden.
@trojanhorse, are you sure that the person who created the Application Identity set up the correct permissions for it to be able to read from the sites web collection?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.