1

This is my (abbreviated) text file (formatting might be lost in the post):

date    close   volume          open    high    low
12:21   82.94   "14,748,136"    83.37   83.4    82.73

When I read the .txt file into Python it becomes a list which I then split. How would I take the list and reorder into column vectors? Any help would be much appreciated.

3
  • I bet pandas has some sort of utility to do this: pandas.pydata.org/pandas-docs/stable/index.html Commented Nov 7, 2013 at 1:07
  • You should set the Display Name in your profile rather than including a signoff in your posts: stackoverflow.com/users/edit/2593632 Commented Nov 7, 2013 at 1:11
  • It looks like every single answer has made a different guess on what part of this you're having problems with, and how your data generalizes. Which means you probably need to read the answers and provide enough code, data, and explanation to eliminate the guesswork. Commented Nov 7, 2013 at 1:36

4 Answers

4

If you have a list of rows and you just want to turn it into a list of columns, you can simply do

transposed_list = list(zip(*original_list_of_rows))

(in Python 3, zip returns an iterator, hence the list() call), but it's not clear whether you actually have a list of rows.
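
As a minimal sketch of the transpose, assuming each line has already been split into a list of fields (the two rows below are copied from the question's sample with the quotes already stripped, which is an assumption about how the file was parsed; `by_name` is just an illustrative name):

```python
# Rows as they might look after parsing; values copied from the question's sample.
rows = [
    ["date", "close", "volume", "open", "high", "low"],
    ["12:21", "82.94", "14,748,136", "83.37", "83.4", "82.73"],
]

# zip(*rows) groups the i-th field of every row together;
# list() materializes the result, since zip is lazy in Python 3.
columns = list(zip(*rows))

# Optional: map each header to the list of values beneath it.
by_name = {col[0]: list(col[1:]) for col in columns}

print(columns[1])         # ('close', '82.94')
print(by_name["volume"])  # ['14,748,136']
```

Note that zip stops at the shortest row, so a ragged file would silently lose trailing fields.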

0

Presumably, given that you have quotes around at least one of the values, it's possible for spaces to appear within a value. So, you can't just split().

You can parse it as a funky dialect of CSV, where the delimiter is a space, and initial whitespace is skipped:

import csv

with open('textfile', newline='') as f:
    rows = list(csv.reader(f, delimiter=' ', skipinitialspace=True))

That will automatically handle the quotes for you and everything.
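
To see what the reader produces, here is a small self-contained sketch that feeds the question's two sample lines through the same settings. The `io.StringIO` object merely stands in for the real file, and the final conversion of the volume to an integer is an extra step that assumes commas are thousands separators.

```python
import csv
import io

# Stand-in for the real file; contents copied from the question's sample.
sample = (
    'date    close   volume          open    high    low\n'
    '12:21   82.94   "14,748,136"    83.37   83.4    82.73\n'
)

with io.StringIO(sample, newline='') as f:
    # A space is the delimiter; runs of extra spaces after it are skipped.
    rows = list(csv.reader(f, delimiter=' ', skipinitialspace=True))

header, first = rows
print(first)

# The quoted field arrives as one string with the quotes removed.
volume = int(first[2].replace(',', ''))
print(volume)
```

From here, `list(zip(*rows))` would give the columns.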

However, in at least some cases, columnar data like this can have values that aren't separated at all, like this:

date    close   volume          open    high    low
12:21   82.94   "14,748,136"    83.37   83.4    82.73
12:22   93213.12"15,222,139"    93201.1493333.3390213.94

If so, then you can only parse it by slicing the lines at the appropriate column positions. If you're lucky, you can use the headers for this; otherwise, you'll need to specify them manually. I'll assume you're unlucky, so:

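# Start offset of each column in the sample above; None means "to the end of the line".
columns = 0, 8, 16, 32, 40, 48, None
def columnize(line):
    # Slice between consecutive offsets and trim trailing padding (and the newline).
    return [line[columns[i]:columns[i+1]].rstrip() for i in range(len(columns)-1)]
with open('textfile') as f:
    # list() forces the lazy map to run while the file is still open (Python 3).
    rows = list(map(columnize, f))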
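
For the lucky case, where the header words line up with the left edge of each column, the offsets can be derived from the header instead of typed by hand. The following is only a sketch under that assumption, using the sample lines as rendered in this post; the `starts` parameter is a hypothetical addition to `columnize`.

```python
import re

# Lines copied from the sample above, as rendered in this post.
header = 'date    close   volume          open    high    low'
line = '12:22   93213.12"15,222,139"    93201.1493333.3390213.94'

# Each column begins where a header word begins; None means "to end of line".
starts = [m.start() for m in re.finditer(r'\S+', header)] + [None]

def columnize(line, starts=starts):
    """Slice a line between consecutive start offsets and trim the padding."""
    return [line[a:b].strip() for a, b in zip(starts, starts[1:])]

print(starts)
print(columnize(line))
```

Unlike the csv approach, slicing leaves the quotes around the volume field in place, so they would need to be stripped separately.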

6 Comments

Is there a good book/resource explaining this aspect of Python programming?
Python for Data Analysis by Wes McKinney: shop.oreilly.com/product/0636920023784.do
@ChaseCB: Which aspect? Using files as iterators, list comprehensions, parsing simple formats, …?
If you're asking about the general idea of starting with an iterable (file or otherwise) and modifying it step by step (with comprehensions, map calls, etc.) until you get to the end, I think Generator Tricks does a great job. And he's even got some parsing in there.
All of that... I'm looking for something that covers reading/parsing/manipulating different file formats/data with some data cleaning in mind, if that makes sense.
0
[[x for i, x in enumerate(text.split()) if i % colNumber == j]
 for j in range(colNumber)]

This requires that you already know the number of columns and that the text file is formatted as a table, with no whitespace inside any value. For example:

text='''a   b   c
1   2   4
1   2   4
1   2   4
'''

colNumber=3
# Token i belongs to column i % colNumber, so columns come out in file order.
table=[[x for i, x in enumerate(text.split()) if i % colNumber == j]
       for j in range(colNumber)]

print(table)

Result:

[['a', '1', '1', '1'], ['b', '2', '2', '2'], ['c', '4', '4', '4']]
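
The same regrouping can also be sketched with extended slicing, which takes every `col_number`-th token starting at offset `j` and so yields the columns in file order. The names `tokens` and `col_number` are illustrative, and the approach still assumes that no value contains whitespace.

```python
text = '''a   b   c
1   2   4
1   2   4
1   2   4
'''

col_number = 3

# Flatten the table into one list of tokens, row by row.
tokens = text.split()

# tokens[j::col_number] picks tokens j, j + 3, j + 6, ... which is column j.
table = [tokens[j::col_number] for j in range(col_number)]

print(table)
# [['a', '1', '1', '1'], ['b', '2', '2', '2'], ['c', '4', '4', '4']]
```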

3 Comments

@user2593632 Hey, if you think someone has solved your problem, do not forget to upvote the answer or mark it as the accepted one.
I'm new, so unfortunately I don't have that privilege yet.
@ChaseCB: Yes you do. You can't upvote any and all helpful answers until you get more rep, but anyone who can ask a question can accept a single answer to that question. (And if you don't learn to do so, you'll probably never get the rep you need to do anything else.)
0

You can use a pandas DataFrame (this assumes the file is tab-separated):

import pandas as pd

# header=None treats every line as data, so the columns are numbered 0, 1, 2, ...
df = pd.read_csv('text.txt', sep='\t', header=None)
print(df)

Then you can rename the columns.
