
I have a CSV file with the following data:

Date,Profit/Losses
Jan-10,867884
Feb-10,984655
Mar-10,322013
Apr-10,-69417
May-10,310503
Jun-10,522857
Jul-10,1033096
Aug-10,604885
Sep-10,-216386
Oct-10,477532
Nov-10,893810
Dec-10,-80353

I imported the file in Python like so:

with open(csvpath, 'r', errors='ignore') as fileHandle:
    lines = fileHandle.read()

I need to loop through these lines and extract just the months ("Jan", "Feb", etc.) into a separate list. I also need to skip the first line, Date,Profit/Losses, which is the header.

Here's the code I have written so far:

months = []
for line in lines:
    months.append(line.split("-"))

When I print the months list, though, it has split out every single character in the file! Where am I going wrong here?

  • The csv module is your friend. Alternatively, pandas would be a huge help. Commented Feb 6, 2019 at 3:56
  • When you read the whole file with read(), you no longer have lines. Your lines variable is one string, so for line in lines iterates over individual characters. See a proposed solution below. Commented Feb 6, 2019 at 4:11
  • @DYZ, that makes a whole lotta sense! Thanks again! Commented Feb 6, 2019 at 4:12
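
To illustrate the point made in the comments above, here is a minimal sketch. It assumes the file contents have already been read into one string; the variable text below is a stand-in for the result of fileHandle.read(), using only the first rows of the data in the question.

```python
# Stand-in for the string returned by fileHandle.read() (first rows only).
text = "Date,Profit/Losses\nJan-10,867884\nFeb-10,984655\n"

# Iterating over a str yields one character at a time, not one line.
chars = [ch for ch in text][:4]

# splitlines() breaks the same string into rows and drops the newlines.
rows = text.splitlines()

# Skip the header row, then keep the part before the first "-".
months = [row.split("-")[0] for row in rows[1:]]

print(chars)   # ['D', 'a', 't', 'e']
print(rows)    # ['Date,Profit/Losses', 'Jan-10,867884', 'Feb-10,984655']
print(months)  # ['Jan', 'Feb']
```

So if read() has already been called, calling splitlines() on the result is one way to get back to a list of rows.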

4 Answers

You can almost always minimize the pain by using specialized tools, such as the csv module and a list comprehension:

import csv
with open("yourfile.csv") as infile:
    reader = csv.reader(infile) # Create a new reader
    next(reader) # Skip the first row
    months = [row[0].split("-")[0] for row in reader]

1 Comment

Alternatively, you could also use a DictReader(), e.g. reader = csv.DictReader(infile); months = [row['Date'].split('-')[0] for row in reader], not much different other than it handles the column header for you.
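
A runnable sketch of the DictReader variant described in this comment. To keep it self-contained, it reads from an in-memory io.StringIO object (an assumption made for illustration) rather than from the real file; with a file on disk you would pass the open file handle instead.

```python
import csv
import io

# In-memory stand-in for the CSV file (first rows of the question's data).
sample = io.StringIO(
    "Date,Profit/Losses\n"
    "Jan-10,867884\n"
    "Feb-10,984655\n"
    "Mar-10,322013\n"
)

# DictReader consumes the header row and uses it as the keys of each row dict.
reader = csv.DictReader(sample)
months = [row["Date"].split("-")[0] for row in reader]

print(months)  # ['Jan', 'Feb', 'Mar']
```

Because the header is consumed automatically, no explicit next(reader) call is needed here.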

One way to fix this is to call fileHandle.readlines() instead of fileHandle.read(), which gives you a list of lines rather than a single string:

lines = fileHandle.readlines()
# print(lines)
# ['Date,Profit/Losses\n', 'Jan-10,867884\n', 'Feb-10,984655\n', 'Mar-10,322013\n',
#  'Apr-10,-69417\n', 'May-10,310503\n', 'Jun-10,522857\n', 'Jul-10,1033096\n', 'Aug-10,604885\n',
#  'Sep-10,-216386\n', 'Oct-10,477532\n', 'Nov-10,893810\n', 'Dec-10,-80353\n']

months = []
for line in lines[1:]:
    # Start from the second item in the list to skip the header row
    months.append(line.split("-")[0])


Try this if you really want to do it the hard way:

months = []
for line in lines[1:]:
    months.append(line.split("-")[0])

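lines[1:] skips the first row, and line.split("-")[0] pulls out just the month, which is then appended to your months list. This assumes lines is a list of rows (for example, from readlines()) rather than one long string.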
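
As a small worked example of what split("-")[0] does to a single row, here is a sketch using one line from the question's data. The row contains a second "-" because the amount is negative, so the split yields three pieces, but index 0 is still the month. The maxsplit variant at the end is an optional alternative, not something the answer above uses.

```python
# One row from the question's data, as readlines() would return it.
line = "Apr-10,-69417\n"

# Splitting on every "-" also splits at the minus sign of the amount.
parts = line.split("-")
print(parts)  # ['Apr', '10,', '69417\n']

# The month is always the first piece.
month = parts[0]
print(month)  # Apr

# Optional: limit the split to the first "-" so the rest stays intact.
month_only, rest = line.split("-", 1)
print(rest.strip())  # 10,-69417
```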

However, as suggested by AChampion, you should really look into the csv module or the pandas package.

2 Comments

Tried this, it doesn't work! It endlessly prints each letter in the whole file and doesn't stop. I had to Ctrl + C to interrupt the terminal.
Oh, try lines = fileHandle.readlines(); that will make sure that each row of your table is read in separately.

This should deliver the desired result (assuming the file is named data.csv and is in the same directory):

result = []

with open('data.csv', 'r', encoding='UTF-8') as data:
    next(data)  # skip the header line
    for record in data:
        # keep only the text before the first "-", i.e. the month
        result.append(record.split('-')[0])
