2

I'm a complete newbie to Python, and this is probably really easy, but I can't get my head around it at all.

I have a text file with a number of rows following this format:

 nothing doing    nothing[0]    doing[0] 
 hello world      hello[0]        world[2]

There are only spaces between the strings, no markers.

I'd like to extract these strings into an Excel file in the following format, so that each 'set' of strings is in a separate column.

           |        1      |       2        |       3
    ------------------------------------------------------
      1    | nothing doing |   nothing[0]   |  doing[0] 
    ------------------------------------------------------
      2    | hello world   |   hello[0]     |  world[2]

I've been looking at answers on here, but they don't quite answer this question.

6
  • 2
    Is the text file exactly like that? Are there tabs in between, like nothing doing\tnothing[0]\tdoing[0]? How do you differentiate between the first column, which contains a space, and the other two columns? Commented Jan 23, 2014 at 18:50
  • The text file is exactly like this. There are spaces between each set of strings. No markers. Commented Jan 23, 2014 at 19:22
  • 1
    Your desired output file doesn't seem to have any commas (or any other fixed delimiter, like semicolons or tabs), but does seem to have vertical alignments. IOW, it doesn't look much like a csv file. Is that exactly the format you want? If so, you can remove csv from the question, because neither the input nor the output are csv. Commented Jan 23, 2014 at 20:03
  • I just want them separated with commas, or in separate columns when opened in Excel. Commented Jan 23, 2014 at 22:13
  • You can create Excel spreadsheets directly in Python using the python-excel package. I can post an answer on how, if you'd like. Commented Jan 24, 2014 at 17:13

4 Answers

3

Alright, here's how you'd write to an actual Excel file. Note that my method of splitting isn't as sophisticated as the others because this is mostly about writing to Excel. You'll need the xlwt package from the python-excel project to do this.

>>> data = []
>>> with open("data.txt") as f:
...     for line in f:
...         # split on double spaces, then strip leftover spaces and the trailing newline
...         data.append([word.strip() for word in line.split("  ") if word.strip()])
...
>>> print(data)
[['nothing doing', 'nothing[0]', 'doing[0]'], ['hello world', 'hello[0]', 'world[2]']]
>>>
>>> import xlwt
>>> wb = xlwt.Workbook()
>>> sheet = wb.add_sheet("New Sheet")
>>> for row_index, row in enumerate(data):
...     for col_index, value in enumerate(row):
...         sheet.write(row_index, col_index, value)
...
>>> wb.save("newSheet.xls")
>>>

This produces a workbook with one sheet called "New Sheet" that looks like this:

[Image: sample output]

Hopefully this helps.


13 Comments

You mention the print data; there are over 600 rows to print!
Then don't print it! :P I included that here so you can get a better idea of what I'm doing. It's not necessary for this to work.
Removed the print because that was just stupid of me :P But again, this just puts the whole row in one Excel cell :/
@user3220585 Did you make any changes aside from removing print?
No changes at all, other than renaming the text file to my own.
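
If every row ends up in a single cell, the separators in the real file may not be runs of two or more spaces. A minimal diagnostic sketch, assuming only the standard library (the helper name `field_count_summary` and the third sample line are made up for illustration), counts how many fields each line splits into and shows the raw separators with `repr`:

```python
import re
from collections import Counter


def field_count_summary(lines):
    """Count how many lines split into 1, 2, 3, ... fields.

    Fields are assumed to be separated by two or more whitespace
    characters (spaces or tabs); that is an assumption about the file.
    """
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        counts[len(re.split(r"\s{2,}", stripped))] += 1
    return dict(counts)


sample = [
    " nothing doing    nothing[0]    doing[0] \n",
    " hello world      hello[0]        world[2]\n",
    "single spaced line here\n",  # invented line with no wide gaps
]
summary = field_count_summary(sample)
print(summary)
print(repr(sample[0]))  # repr shows tabs as \t, so hidden separators become visible
```

If most lines report a single field, the split rule does not match the file, and the `repr` output should show what actually sits between the columns.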
0

You could use numpy to read the text file and csv to write it out as a CSV file. The csv module, among other things, lets you choose the delimiter you prefer.

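import csv

import numpy

# Note: loadtxt splits on any whitespace by default, so a multi-word
# field such as "nothing doing" is read as two separate columns.
data = numpy.loadtxt('txtfile.txt', dtype=str)

# newline='' is what the csv module expects for output files in Python 3
with open('csvfile.csv', 'w', newline='') as fobj:
    csvwriter = csv.writer(fobj, delimiter=',')
    for row in data:
        csvwriter.writerow(row)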
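
To illustrate the delimiter choice on its own, here is a small sketch that writes the same rows with a comma and with a semicolon. It uses an in-memory buffer instead of a file, the rows are typed in by hand from the question's sample, and the helper name `rows_to_text` is hypothetical:

```python
import csv
import io

rows = [
    ["nothing doing", "nothing[0]", "doing[0]"],
    ["hello world", "hello[0]", "world[2]"],
]


def rows_to_text(rows, delimiter):
    """Serialize rows to CSV text using the given single-character delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


comma_text = rows_to_text(rows, ",")
semicolon_text = rows_to_text(rows, ";")
print(comma_text)
print(semicolon_text)
```

Which delimiter Excel treats as the column separator when opening the file can depend on regional settings, so a semicolon is sometimes the more convenient choice.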

2 Comments

The numpy library. I've never heard of this. I presume it needs to be downloaded?
Whether it is already installed or you need to install it depends on the Python distribution you use. Python(x,y) includes numpy as far as I know.
0

Sometimes people who use mostly Excel get confused about the difference between how Excel displays its sheets and the csv representation in a file. Here, even though @martineau gave you exactly what you showed you wanted, I think what you're actually going to want is something more like:

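import csv
import re

# newline="" is what the csv module expects for output files in Python 3
with open("infile.txt") as fp_in, open("outfile.csv", "w", newline="") as fp_out:
    writer = csv.writer(fp_out)
    for line in fp_in:
        # split wherever two or more whitespace characters occur in a row
        row = re.split(r"\s\s+", line.strip())
        writer.writerow(row)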

which will turn

$ cat infile.txt 
nothing doing    nothing[0]    doing[0] 
hello world      hello[0]        world[2]

into

$ cat outfile.csv 
nothing doing,nothing[0],doing[0]
hello world,hello[0],world[2]

2 Comments

As long as there is guaranteed to be more than one space between columns.
@BrianSchlenker: if that's not guaranteed, we'd have to come up with another rule to separate column from column, and that would require knowing more about the values themselves.
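
One such rule, sketched under an assumption taken from the sample rows and not guaranteed for the real file: the last two columns never contain spaces. In that case you can split off the last two whitespace-separated tokens from the right and treat whatever remains as the first column. The function name `split_from_right` is hypothetical:

```python
def split_from_right(line):
    """Return [first, second, third]; only the first column may contain spaces."""
    # rsplit with maxsplit=2 makes at most two splits, starting from the right
    parts = line.strip().rsplit(None, 2)
    if len(parts) != 3:
        raise ValueError("expected at least three tokens: %r" % line)
    return parts


rows = [
    split_from_right(text)
    for text in (
        "nothing doing nothing[0] doing[0]",  # single spaces only
        "hello world      hello[0]        world[2]",
    )
]
print(rows)
```

This works even when columns are separated by a single space, but it breaks as soon as the second or third column contains a space of its own.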
0

The following assumes that each "column" is separated by two or more space characters in a row and that they will never contain a comma in their content.

import csv
import re

splitting_pattern = re.compile(r" {2,}")  # two or more spaces in a row
input_filepath = 'text_file_strings.txt'
output_filepath = 'output.csv'

# newline='' is what the csv module expects for output files in Python 3
with open(input_filepath, 'rt') as inf, open(output_filepath, 'w', newline='') as outf:
    writer = csv.writer(outf, dialect='excel')
    writer.writerow([''] + list(range(1, 4)))  # header row
    for i, line in enumerate(inf, 1):
        # replace each run of spaces with a comma, then split on the commas
        line = splitting_pattern.sub(',', line.strip())
        writer.writerow([i] + line.split(','))

Contents of the output.csv file created:

,1,2,3
1,nothing doing,nothing[0],doing[0]
2,hello world,hello[0],world[2]
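
The no-comma assumption comes from joining the fields with commas and then splitting on commas again. A sketch of an alternative, which is not the answer's code: split on the pattern directly and hand the resulting list to `csv.writer`, which quotes any field that contains a comma. The second sample line, with a comma in it, is invented for illustration, and an in-memory buffer stands in for the output file:

```python
import csv
import io
import re

splitting_pattern = re.compile(r" {2,}")  # two or more spaces in a row

lines = [
    "nothing doing    nothing[0]    doing[0]",
    "hello, world      hello[0]        world[2]",  # invented row containing a comma
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for i, line in enumerate(lines, 1):
    # split straight into fields; csv.writer handles any quoting
    writer.writerow([i] + splitting_pattern.split(line.strip()))

output = buffer.getvalue()
print(output)

# Reading the text back recovers the original fields, comma included.
parsed = list(csv.reader(io.StringIO(output)))
print(parsed)
```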

2 Comments

This is currently exporting each row into one cell, and not all of the row is being displayed?
With the additional information you've added to your question, my updated answer should correct those problems.
