python pandas read_csv skiprows does not work

Question

I am reading a large CSV file in pieces, because calling pd.read_csv on the whole file usually raises an error and shuts down the kernel in IPython Notebook.

However, skiprows does not seem to work in my case. I have updated pandas to the newest version, 0.20.1, but skiprows still does not work.

In the following code, I would like to skip the first 2 rows and read only rows 3 to 6, but skiprows in pd.read_csv fails to skip the first 2 rows.

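import pandas as pd

def read(path, header):
    # Read 6 rows from the top of the file.
    df = pd.read_csv(path, nrows=6, engine='python')
    # Skip the first 2 lines, then read 6 rows counted from that offset.
    df1 = pd.read_csv(path, skiprows=2, nrows=6, engine='python')
    df.columns = header

    print(df.shape)
    print(df1.shape)
    return df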

The result is:

(6, 26)
(6, 26)

which suggests that skiprows does not work at all. I have searched online but did not find anyone with the same problem. I am wondering whether I have missed something important that causes this.

Thanks in advance.

Added information:

The first 7 rows of my CSV file:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) ウイザードリイ・外伝４ (管理：4366),4988606101009,998,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) スーパードラッケン (管理：3701),4906571521028,298,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(FC) サンダーバード  (管理：9347),4988110900051,498,1,17302,2511,2161,16899,16904,16908,,,,,shopping,game_and_toy,video_game,retro_game,nes,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(FC) ガンサイト (管理：8853),4988602564624,198,1,17302,2511,2161,16899,16904,16908,,,,,shopping,game_and_toy,video_game,retro_game,nes,software,,,,"

"


  20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) プリンセスメーカー (管理：4201),4904880133802,298,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

The file is very dirty: a redundant line containing only a quote character (") occurs every other row.

Hi, I would like to read lines 3 to 6 by skipping the first 2 lines.
– Winds, Commented May 16, 2017 at 16:31
@EdChum Hello, I added the information to the question. Thanks!
– Winds, Commented May 16, 2017 at 16:49
The problem was clarified by @Kyle and the question is closed. Thanks!
– Winds, Commented May 16, 2017 at 16:54

Kyle · Accepted Answer · 2017-05-16 16:35:57Z

1

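nrows is counted from the starting offset set by skiprows, not from the beginning of the file. With skiprows=2, nrows=6 still returns 6 rows, which is why both shapes are (6, 26). To read lines 3 to 6, you want nrows=4.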
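
To make the offset behaviour concrete, here is a minimal sketch on a made-up 10-line CSV held in memory (the data and variable names are assumptions for illustration, not the asker's file). It passes header=None so that no line is consumed as column names; with the default header handling, as in the question's code, the first line that is not skipped becomes the header instead.

```python
import io
import pandas as pd

# Hypothetical 10-line file with no header line: "1,10", "2,20", ..., "10,100".
csv_text = "\n".join("{0},{1}".format(i, i * 10) for i in range(1, 11))

# Skip the first 2 lines, then read 6 rows: this covers file lines 3 to 8.
df_six = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=2, nrows=6)

# Skip the first 2 lines, then read 4 rows: this covers file lines 3 to 6.
df_four = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=2, nrows=4)

print(df_six.shape)         # (6, 2)
print(df_four.shape)        # (4, 2)
print(df_four[0].tolist())  # [3, 4, 5, 6]
```

The shape alone cannot show whether rows were skipped, since nrows fixes the row count; inspecting the first column shows that the read really starts at line 3.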

3 Comments

Winds Over a year ago

Hello Kyle, in actual practice, "skiprows=3000001, nrows=6000000" did not skip either.

Kyle Over a year ago

@LeighTsai the actual number of rows to skip, or rows to read, doesn't matter. You just need to remember that read_csv() will return the number of rows specified in nrows, regardless of what skiprows is.

Winds Over a year ago

@Kyle Thanks for clarifying this! Now I see how it works.
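
If the goal is to read a range given as first and last line numbers, the two arguments can be derived from that range. The sketch below uses a hypothetical helper, read_line_range, and made-up in-memory data; it assumes every record occupies exactly one physical line, which may not hold for a file with quoted fields that span several lines, like the sample in the question.

```python
import io
import pandas as pd

def read_line_range(path_or_buffer, first_line, last_line):
    """Read 1-based file lines first_line..last_line, inclusive (hypothetical helper)."""
    return pd.read_csv(
        path_or_buffer,
        header=None,                        # treat every line as data
        skiprows=first_line - 1,            # lines to drop before reading
        nrows=last_line - first_line + 1,   # row count, measured from the offset
    )

# Hypothetical 20-line, one-column file: "1", "2", ..., "20".
csv_text = "\n".join(str(i) for i in range(1, 21))

block = read_line_range(io.StringIO(csv_text), 3, 6)
tail = read_line_range(io.StringIO(csv_text), 19, 25)

print(block[0].tolist())  # [3, 4, 5, 6]
print(tail[0].tolist())   # [19, 20]
```

The second call shows that nrows is an upper bound: when fewer rows remain after the offset, read_csv returns only what is left.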
