Data processing with Python

Question

I am new to Python, so please excuse my question. In my line of work I have to deal with tabular data stored in text files. The values are separated by either a comma or a semicolon. A simplified example of such a file might look like this:

City;Car model;Color;Registration number
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev;Toyota;Blue;3423
London;Fiat;Red;4545

My goal is to have a script that tells me how many Mercedes there are in Moscow (in this example, two) and saves the matching rows to a new text file, Moscow.txt, with the following contents:

Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345

I will be very thankful for your help.

DSM · Accepted Answer · 2013-03-26 15:22:11Z


I would recommend looking into the pandas library. You can do all sorts of neat manipulations of tabular data. First read it in:

>>> import pandas as pd
>>> df = pd.read_csv("cars.ssv", sep=";")
>>> df
     City Car model Color  Registration number
0  Moscow  Mercedes   Red                 1234
1  Moscow  Mercedes   Red                 2345
2    Kiev    Toyota  Blue                 3423
3  London      Fiat   Red                 4545
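If you want to try this without creating cars.ssv first, read_csv also accepts any file-like object. The sketch below inlines the question's sample rows and wraps them in io.StringIO. The SAMPLE string is only a stand-in for the real file.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so the example runs without cars.ssv
SAMPLE = """City;Car model;Color;Registration number
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev;Toyota;Blue;3423
London;Fiat;Red;4545
"""

# read_csv takes a path or any file-like object; StringIO stands in for the file
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")
print(df)
```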

Select rows with boolean indexing:

>>> moscmerc = df[(df["City"] == "Moscow") & (df["Car model"] == "Mercedes")]
>>> moscmerc
     City Car model Color  Registration number
0  Moscow  Mercedes   Red                 1234
1  Moscow  Mercedes   Red                 2345
>>> len(moscmerc)
2
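If you only need the count, you can sum the boolean mask instead of building the subset. DataFrame.query is an alternative spelling of the same filter. Both are shown below on the question's sample data, inlined via StringIO as an assumption for self-containment.

```python
import io

import pandas as pd

SAMPLE = """City;Car model;Color;Registration number
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev;Toyota;Blue;3423
London;Fiat;Red;4545
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than == in Python.
mask = (df["City"] == "Moscow") & (df["Car model"] == "Mercedes")

# True counts as 1, so summing the mask counts the matching rows directly
count = int(mask.sum())
print(count)  # 2

# The same filter as a query string; backticks quote a column name
# containing a space (supported in pandas 0.25 and later)
moscmerc = df.query('City == "Moscow" and `Car model` == "Mercedes"')
print(len(moscmerc))  # 2
```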

Write it out (the !cat line is an IPython shell escape that just displays the file):

>>> moscmerc.to_csv("moscmerc.ssv", sep=";", header=False, index=False)
>>> !cat moscmerc.ssv
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
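Applied to the file name from the question, a minimal sketch might look like this. It writes Moscow.txt into a temporary directory so that nothing is left behind. In practice you would just pass "Moscow.txt". The temporary directory and the read-back check are illustrative additions.

```python
import io
import os
import tempfile

import pandas as pd

SAMPLE = """City;Car model;Color;Registration number
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev;Toyota;Blue;3423
London;Fiat;Red;4545
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")
moscmerc = df[(df["City"] == "Moscow") & (df["Car model"] == "Mercedes")]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "Moscow.txt")
    # header=False drops the column names; index=False drops the 0, 1, ... row labels,
    # leaving only the data rows in the original semicolon-separated layout
    moscmerc.to_csv(path, sep=";", header=False, index=False)
    # Read the file back to check what was written
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```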

You can also work on multiple groups at once:

>>> df.groupby(["City", "Car model"]).size()
City    Car model
Kiev    Toyota       1
London  Fiat         1
Moscow  Mercedes     2
dtype: int64
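Going one step beyond the question, you might want one file per city rather than just Moscow.txt. Iterating over a groupby yields (key, sub-DataFrame) pairs, so each group can be written with the same to_csv call. The sketch below assumes that "one file per city, named after the city" is what you want. It writes into a temporary directory.

```python
import io
import os
import tempfile

import pandas as pd

SAMPLE = """City;Car model;Color;Registration number
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev;Toyota;Blue;3423
London;Fiat;Red;4545
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# Row counts per (City, Car model) pair, as a dict keyed by tuples
counts = df.groupby(["City", "Car model"]).size().to_dict()
print(counts[("Moscow", "Mercedes")])  # 2

with tempfile.TemporaryDirectory() as tmpdir:
    # Grouping by a single column name yields the city string as the key
    for city, group in df.groupby("City"):
        group.to_csv(os.path.join(tmpdir, f"{city}.txt"), sep=";", header=False, index=False)
    written = sorted(os.listdir(tmpdir))
    with open(os.path.join(tmpdir, "Moscow.txt"), encoding="utf-8") as f:
        moscow_lines = f.read().splitlines()

print(written)  # ['Kiev.txt', 'London.txt', 'Moscow.txt']
```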

Update: @Anthon pointed out that the above only handles a semicolon separator. If a file uses commas throughout, you can simply pass sep="," instead of sep=";", so that case is trivial. The more interesting case is a delimiter that is inconsistent within a file, but that is easily handled too, because sep accepts a regular expression:

>>> !cat cars_with_both.txt
City;Car model,Color;Registration number
Moscow,Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev,Toyota;Blue,3423
London;Fiat,Red;4545
>>> df = pd.read_csv("cars_with_both.txt", sep="[;,]", engine="python")
>>> df
     City Car model Color  Registration number
0  Moscow  Mercedes   Red                 1234
1  Moscow  Mercedes   Red                 2345
2    Kiev    Toyota  Blue                 3423
3  London      Fiat   Red                 4545
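Here is a self-contained version of the mixed-delimiter read, using the same rows as cars_with_both.txt inlined through StringIO. Regex separators are only supported by pandas' Python parsing engine. Passing engine="python" explicitly avoids the ParserWarning pandas emits when it falls back from the default C engine. One caveat: the pandas documentation notes that regex delimiters are prone to ignoring quoted data, so a quoted field containing ";" or "," would still be split.

```python
import io

import pandas as pd

# Same content as cars_with_both.txt above
MIXED = """City;Car model,Color;Registration number
Moscow,Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev,Toyota;Blue,3423
London;Fiat,Red;4545
"""

# The character class [;,] matches either delimiter at every field boundary
df = pd.read_csv(io.StringIO(MIXED), sep="[;,]", engine="python")
print(df)
```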

Update #2: and now the text is in Russian -- of course it is. :^) Still, if everything is correctly encoded, and your terminal is properly configured, that should work too:

>>> df = pd.read_csv("russian_cars.csv", sep="[;,]", engine="python")
>>> df
     City Car model    Color  Registration number
0  Москва  Mercedes  красный                 1234
1  Москва  Mercedes  красный                 2345
2    Киев    Toyota    синий                 3423
3  Лондон      Fiat  красный                 4545
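"Correctly encoded" is where Cyrillic files usually go wrong. The sketch below writes a small Russian sample (a hypothetical file modeled on the table above) as UTF-8. It then reads the file back with an explicit encoding argument. pandas already assumes UTF-8 by default. The encoding parameter is where you would name a different one, for example "cp1251" for a file saved in the Windows Cyrillic code page.

```python
import os
import tempfile

import pandas as pd

# Hypothetical Russian-language sample with mixed delimiters
RUSSIAN = """City;Car model;Color;Registration number
Москва;Mercedes;красный;1234
Москва,Mercedes;красный;2345
Киев;Toyota;синий;3423
Лондон;Fiat,красный;4545
"""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "russian_cars.csv")
    # open() without encoding= uses a platform-dependent default, so name it explicitly
    with open(path, "w", encoding="utf-8") as f:
        f.write(RUSSIAN)
    # Swap "utf-8" for the file's real encoding if it was saved differently
    df = pd.read_csv(path, sep="[;,]", engine="python", encoding="utf-8")

print((df["City"] == "Москва").sum())  # 2
```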

edited Mar 26, 2013 at 15:22

answered Mar 26, 2013 at 14:38

DSM


8 Comments

Anthon Over a year ago

The example had only ';' but the OP stated the separator could be either ';' or ','. I think your example breaks on the first ',' that is used as separator.

DSM Over a year ago

@Anthon: I interpreted that as merely saying that it could be ';' or ',', but not both within the same file. You could be right, though. Edited to show how to handle that case.

Anthon Over a year ago

+1 @DSM. That is cool; I briefly looked at pandas today but did not immediately see that you could do that. Of course, now you should think about updating the file extension to .socsv

DSM Over a year ago

@Anthon: yeah, you can use regex delimiters. Frankly, that alone means that I wind up using it these days in places where I used to use the csv module.

user2211803 Over a year ago

Thank you! I couldn't even hope for such a quick response!
