3

I am new to python so please excuse me for my question. In my line of work I have to work with tabular data represented in text files. The values are separated by either a coma or semi colon. The simplified example of such file might look as following:

City;Car model;Color;Registration number
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev;Toyota;Blue;3423
London;Fiat;Red;4545

My goal is to have a script which can tell me how many Mercedes are in Moscow (in our case there are two) and save a new text file Moscow.txt with following

Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345

I will be very thankful for your help.

0

1 Answer 1

6

I would recommend looking into the pandas library. You can do all sorts of neat manipulations of tabular data. First read it in:

>>> import pandas as pd
>>> df = pd.read_csv("cars.ssv", sep=";")
>>> df
     City Car model Color  Registration number
0  Moscow  Mercedes   Red                 1234
1  Moscow  Mercedes   Red                 2345
2    Kiev    Toyota  Blue                 3423
3  London      Fiat   Red                 4545

Index it in different ways:

>>> moscmerc = df[(df["City"] == "Moscow") & (df["Car model"] == "Mercedes")]
>>> moscmerc
     City Car model Color  Registration number
0  Moscow  Mercedes   Red                 1234
1  Moscow  Mercedes   Red                 2345
>>> len(moscmerc)
2

Write it out:

>>> moscmerc.to_csv("moscmerc.ssv", sep=";", header=None, index=None)
>>> !cat moscmerc.ssv
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345

You can also work on multiple groups at once:

>>> df.groupby(["City", "Car model"]).size()
City    Car model
Kiev    Toyota       1
London  Fiat         1
Moscow  Mercedes     2
Dtype: int64

Update: @Anthon pointed out that the above only handles the case of a semicolon separator. If a file has a comma throughout, then you can just use , instead of ;, so that's trivial. The more interesting case is if the delimiter is inconsistent within the file, but that's easily handled too:

>>> !cat cars_with_both.txt
City;Car model,Color;Registration number
Moscow,Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev,Toyota;Blue,3423
London;Fiat,Red;4545
>>> df = pd.read_csv("cars_with_both.txt", sep="[;,]")
>>> df
     City Car model Color  Registration number
0  Moscow  Mercedes   Red                 1234
1  Moscow  Mercedes   Red                 2345
2    Kiev    Toyota  Blue                 3423
3  London      Fiat   Red                 4545

Update #2: and now the text is in Russian -- of course it is. :^) Still, if everything is correctly encoded, and your terminal is properly configured, that should work too:

>>> df = pd.read_csv("russian_cars.csv", sep="[;,]")
>>> df
     City Car model    Color  Registration number
0  Москва  Mercedes  красный                 1234
1  Москва  Mercedes  красный                 2345
2    Киев    Toyota    синий                 3423
3  Лондон      Fiat  красный                 4545
Sign up to request clarification or add additional context in comments.

8 Comments

The example had only ';' but the OP stated the separator could be either ';' or ','. I think your example breaks on the first ',' that is used as separator.
@Anthon: I interpreted that as merely saying that it could be ; or ,, not both within the same file, but you could be right. Edited to show how to handle that case.
+1 @DSM. That is cool, I briefly looked at pandas today but did not immediately see that you could do that. Of course now you should think about updating the file extension to .socsv
@Anthon: yeah, you can use regex delimiters. Frankly, that alone means that I wind up using it these days in places where I used to use the csv module.
Thank you! I couldnt even hope for such quick responce!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.