I'm developing a small website in Flask that reads data from a CSV file and outputs it to a table on the frontend using jQuery.
The user selects an ID from a drop-down on the front-end, and a back-end function then uses that ID as a filter on the table to return data. The returned data is usually just a single column from the dataframe.
The usual approach, from my understanding, would be to load the CSV data into a SQLite DB on startup and query it with SQL from Python at runtime.
However, in my case, the table is 15 MB (214K rows) and will never grow past that point. All the data stays as-is for the duration of the app's lifecycle.
As such, would it be easier and less hassle to just load the dataframe into memory and filter on it when requests come in? Is that scalable, or am I just kicking the can down the road?
Example:
import os

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
dir_path = os.path.abspath(os.path.dirname(__file__))

# Read the CSV into a module-level dataframe once at startup
with app.app_context():
    print("Loading CSV on startup")
    query_df = pd.read_csv(os.path.join(dir_path, 'query_file.csv'))

@app.route('/getData', methods=["POST"])
def get_data():
    # The posted JSON body is assumed to be the selected ID itself
    id = request.get_json()
    print("Getting rows....")
    data_list = sorted(set(query_df[query_df['ID'] == id]['Name'].tolist()))
    return jsonify({'items': data_list, 'ID': id})
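For what it's worth, a variant I've considered is pre-grouping the dataframe by ID at startup so each request becomes a dict lookup instead of a scan over all 214K rows. This is only a rough sketch reusing app, query_df, and the column names from the example above; the /getDataIndexed route name is just a placeholder:

# Build an ID -> sorted unique Names lookup once at startup,
# so the per-request work is a single dict lookup.
name_lookup = {
    key: sorted(group['Name'].unique().tolist())
    for key, group in query_df.groupby('ID')
}

@app.route('/getDataIndexed', methods=["POST"])
def get_data_indexed():
    id = request.get_json()
    return jsonify({'items': name_lookup.get(id, []), 'ID': id})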
The in-memory approach may be a tad naive on my end, but I could not find a straight answer for my particular use case.
The SQLite alternative would boil down to a query like select distinct Name from my_table where ID = 'some_id_value' order by Name. I suspect this would prove faster than processing a dataframe.
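If I did go the SQLite route, I imagine it would look something like the sketch below (assuming an in-memory database is acceptable and the same ID/Name columns; my_table and the index name are just placeholders):

import sqlite3

import pandas as pd

# Mirror the CSV into an in-memory SQLite table once at startup.
conn = sqlite3.connect(':memory:', check_same_thread=False)
pd.read_csv('query_file.csv').to_sql('my_table', conn, index=False)
conn.execute('CREATE INDEX idx_my_table_id ON my_table (ID)')  # speeds up the WHERE ID = ? filter

def names_for_id(selected_id):
    # Equivalent of: select distinct Name from my_table where ID = ? order by Name
    rows = conn.execute(
        'SELECT DISTINCT Name FROM my_table WHERE ID = ? ORDER BY Name',
        (selected_id,),
    ).fetchall()
    return [name for (name,) in rows]

check_same_thread=False lets Flask's worker threads share the one read-only connection; a file-backed database with a connection per request would be the more conventional setup, but for a fixed 15 MB table the in-memory version seems like the closest comparison to the dataframe approach.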