I am trying to write a program that reads data from a text file and returns the pair of employees who have worked together for the longest time. I decided to store the data in .CSV format, since that is still plain text, just separated by commas.

Example:

EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13
4,C,2016-02-14,NULL
5,D,2014-03-15,2015-11-09

I have learned how to read .CSV files, but I am not sure what the best approach is for the next step (comparing the values, and so on). For now, I decided that this is the cleanest option:

import csv

with open('data.csv', 'r') as f:
  reader = csv.reader(f)
  your_list = list(reader)

print(your_list)

I would just like some advice on whether comparing indexes of the list is the best way to go. I was also thinking about dictionaries, but I am not sure, which is why I am asking here :) SQL is not an option, even though this would be easy with it. Sorry if this is a bad question, but I am currently learning Python and this is quite an important task for me. Thanks!

3
  • I'd personally create an Employee class and use each line to create an Employee object. This reduces complexity because it's much easier to refer to, say, an employee's id as employee.id instead of an index of a list or a key in a dictionary, and Employee's methods could deal with the logic afterwards. This of course requires you to learn at least some object oriented programming basics in Python, however. Commented May 29, 2019 at 15:32
  • 3
    I highly recommend using pandas. It makes manipulating the data table a lot easier. Pandas will render the csv file as a dataframe and you can specify which column is the index. Commented May 29, 2019 at 15:33
  • 2
    I second the pandas vote, the library is set up specifically to handle this kind of problem. I've linked the read_csv method Commented May 29, 2019 at 15:34
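
To make the class suggestion from the first comment concrete, here is a minimal sketch using only the standard library. The name `Assignment`, its fields, and the `from_row` helper are hypothetical, chosen for illustration (each CSV row describes one stint on a project rather than a whole employee), and treating `NULL` as `None` is an assumption about the data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Assignment:
    """One CSV row: an employee's stint on a project (hypothetical helper)."""
    emp_id: str
    project_id: str
    date_from: date
    date_to: Optional[date]  # None when the CSV field is NULL (assumption)

    @classmethod
    def from_row(cls, row):
        # row is a list of four strings, as produced by csv.reader
        emp_id, project_id, start, end = (field.strip() for field in row)
        return cls(
            emp_id,
            project_id,
            date.fromisoformat(start),
            None if end == "NULL" else date.fromisoformat(end),
        )


first = Assignment.from_row(["1", "A", "2014-11-01", "2015-05-01"])
ongoing = Assignment.from_row(["4", "C", "2016-02-14", "NULL"])
print(first.emp_id, first.date_from)  # fields are read by name, not by index
print(ongoing.date_to)
```

The point is only that `first.emp_id` reads more clearly than `row[0]`; the comparison logic could later live in methods or in plain functions that take two `Assignment` objects.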

2 Answers

1

As I understand from what you wrote, I think what you need is something like this:

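import csv

# Read the CSV file; each row arrives as a list of fields split on ","
with open('data.csv', newline='') as f:
    csv_file = csv.reader(f, delimiter=",")
    next(csv_file, None)  # skip the header row
    for item in csv_file:
        print(item)  # do your work here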

You may also want to look at pandas if you have a large amount of data; it will be more efficient to work with in that case.
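
As one possible shape for the "do your work" step, and since the question mentions dictionaries, here is a small sketch that reads rows by column name with `csv.DictReader` and groups employee IDs per project. `SAMPLE` is an inline stand-in for a few rows of `data.csv` so the snippet runs on its own, and `by_project` is just an illustrative name.

```python
import csv
import io
from collections import defaultdict

# Inline stand-in for data.csv (a subset of the question's rows).
SAMPLE = """EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,C,2014-01-07,2016-03-07
5,C,2014-10-01,2015-12-01
4,C,2016-02-14,NULL
"""

# Map each ProjectID to the list of EmpIDs that appear on it.
by_project = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    by_project[row["ProjectID"]].append(row["EmpID"])

print(dict(by_project))  # {'A': ['1'], 'C': ['2', '5', '4']}
```

Grouping by project first means only employees within the same group ever need to be compared with each other.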


1

You can use the datetime module to compute the time elapsed for each row. Build a list of the rows in the CSV file, then sort it by elapsed time. For the first 8 rows of the CSV file (stopping before the row with NULL, since that end date is undefined):

1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13

You can use this:

from datetime import datetime

my_list = []
with open('file.txt', 'r') as file:
    for line in file:
        fields = line.strip().split(',')
        # %m is the month; %M would parse the field as minutes
        dt1 = datetime.strptime(fields[2], '%Y-%m-%d')
        dt2 = datetime.strptime(fields[3], '%Y-%m-%d')
        my_list.append(fields[:2] + [dt2 - dt1])

# Sort once, after all rows have been read, by elapsed time
my_list.sort(key=lambda x: x[2])
print(my_list)

output:

[['3', 'B', datetime.timedelta(days=120)], ['1', 'A', datetime.timedelta(days=181)], ['2', 'B', datetime.timedelta(days=304)], ['5', 'C', datetime.timedelta(days=426)], ['2', 'C', datetime.timedelta(days=790)], ['3', 'B', datetime.timedelta(days=823)], ['1', 'A', datetime.timedelta(days=975)], ['2', 'C', datetime.timedelta(days=1290)]]
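
The list above ranks individual rows by their length, while the question asks for the pair of employees with the most time together. One way that could be approached is sketched below, under several assumptions that are not in the original post: "together" means on the same project on the same day, `NULL` means the employee is still on the project and is replaced by a `today` argument, and `DateTo` itself is not counted as a worked day. `SAMPLE`, `parse_day` and `longest_pair` are hypothetical names. Days are stored in sets, so overlapping rows for the same employee and project are not counted twice.

```python
import csv
import io
from collections import defaultdict
from datetime import date
from itertools import combinations

# The question's data, inlined so the sketch runs without data.csv.
SAMPLE = """EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13
4,C,2016-02-14,NULL
5,D,2014-03-15,2015-11-09
"""


def parse_day(text, today):
    # ASSUMPTION: NULL means "still on the project", so it maps to `today`.
    return today if text == "NULL" else date.fromisoformat(text)


def longest_pair(csv_text, today):
    """Return ((emp_a, emp_b), shared_days) for the pair with most overlap."""
    worked = defaultdict(set)  # EmpID -> {(ProjectID, day ordinal), ...}
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = parse_day(row["DateFrom"], today)
        end = parse_day(row["DateTo"], today)
        # Half-open range: DateFrom counts as a day, DateTo does not.
        for day in range(start.toordinal(), end.toordinal()):
            worked[row["EmpID"]].add((row["ProjectID"], day))

    # Shared time is the number of project-days present in both sets.
    shared = {
        (a, b): len(worked[a] & worked[b])
        for a, b in combinations(sorted(worked), 2)
    }
    return max(shared.items(), key=lambda item: item[1], default=None)


print(longest_pair(SAMPLE, today=date(2019, 5, 29)))
```

With the sample data and that cutoff date, this reports employees 2 and 4, who share 1070 days on project C. Storing one set entry per day is simple rather than fast; for large files, comparing interval endpoints directly would likely scale better.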
