
I am reading a large CSV file in pieces, because calling pd.read_csv on the whole file usually raises an error and shuts down the kernel in IPython Notebook.

However, skiprows does not work in my case. I have updated pandas to the newest version, 0.20.1, but skiprows still does not work.

In the following code, I would like to skip the first 2 rows and read only rows 3 to 6, but skiprows in pd.read_csv fails to skip the first 2 rows.

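import pandas as pd

def read(path, header):
    # Baseline: the first 6 data rows after the header line.
    df = pd.read_csv(path, nrows=6, engine='python')
    # Attempt to skip the first 2 lines; nrows still asks for 6 rows.
    df1 = pd.read_csv(path, skiprows=2, nrows=6, engine='python')
    df.columns = header

    print(df.shape)
    print(df1.shape)
    return df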

and the result turns out to be

(6, 26)
(6, 26)

which suggests that skiprows does not work at all. I have googled but did not find anyone with the same problem. I am wondering whether I have missed something important that causes this.

Thanks in advance.


Added information:

The first 7 rows of my CSV file:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) ウイザードリイ・外伝4 (管理:4366),4988606101009,998,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) スーパードラッケン (管理:3701),4906571521028,298,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(FC) サンダーバード  (管理:9347),4988110900051,498,1,17302,2511,2161,16899,16904,16908,,,,,shopping,game_and_toy,video_game,retro_game,nes,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(FC) ガンサイト (管理:8853),4988602564624,198,1,17302,2511,2161,16899,16904,16908,,,,,shopping,game_and_toy,video_game,retro_game,nes,software,,,,"

"


  20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) プリンセスメーカー (管理:4201),4904880133802,298,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

The file is very dirty: a redundant line containing only a " character occurs every other row.

  • What does your file look like? Commented May 16, 2017 at 16:30
  • Hi, I would like to read lines 3 to 6 by skipping the first 2 lines. Commented May 16, 2017 at 16:31
  • Please add a sample of your CSV file. Commented May 16, 2017 at 16:31
  • @EdChum Hello, I have added the information to the question. Thanks! Commented May 16, 2017 at 16:49
  • The problem was clarified by @Kyle and the question is closed. Thanks! Commented May 16, 2017 at 16:54

1 Answer


nrows is counted from the starting offset (the first line after the skipped rows), not from the beginning of the file. You want nrows=4.
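As a rough illustration of the point above, here is a minimal sketch on a small in-memory CSV. The data, the column names `a` and `b`, and the use of `io.StringIO` (Python 3) are assumptions made for the example and are not taken from the question's file. It also shows that an integer skiprows skips the header line too, so `header=None` is used to keep line 3 as data.

```python
import io
import pandas as pd

# Hypothetical file: one header line plus 8 data rows, built in memory.
csv_text = "a,b\n" + "".join("{0},{1}\n".format(i, i * 10) for i in range(1, 9))

# No skipping: the header line, then the first 6 data rows.
df = pd.read_csv(io.StringIO(csv_text), nrows=6)

# skiprows=2 drops the first 2 physical lines ("a,b" and "1,10"), so line 3
# ("2,20") is consumed as the header and nrows=6 reads the 6 lines after it.
df1 = pd.read_csv(io.StringIO(csv_text), skiprows=2, nrows=6)

# header=None keeps line 3 as data; nrows=4 then reads lines 3 to 6 only.
df2 = pd.read_csv(io.StringIO(csv_text), skiprows=2, nrows=4, header=None)

print(df.shape, df1.shape, df2.shape)  # (6, 2) (6, 2) (4, 2)
```

Both of the first two calls return 6 rows, which matches the identical shapes in the question. The rows were skipped, but the shape alone cannot show it.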


3 Comments

Hello Kyle, in actual practice, "skiprows=3000001, nrows=6000000" did not skip either.
@LeighTsai the actual number of rows to skip, or rows to read, doesn't matter. You just need to remember that read_csv() will return the number of rows specified in nrows, regardless of what skiprows is.
@Kyle Thanks for clarifying this! Now I see how it works.
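To make the "nrows is a count, not an end position" point concrete, the sketch below wraps the arithmetic in a small helper. The name `read_rows` and its `first`/`last` parameters are hypothetical and are not part of pandas. It assumes the file has a single header line and that each record sits on one physical line, which may not hold for a file with embedded newlines such as the one in the question.

```python
import io
import pandas as pd

def read_rows(source, first, last, **kwargs):
    """Read data rows first..last (1-based, inclusive), keeping the header line."""
    return pd.read_csv(
        source,
        skiprows=range(1, first),  # skip data lines before `first`; line 0 (header) is kept
        nrows=last - first + 1,    # number of rows to read, not the last row's position
        **kwargs
    )

# Hypothetical file: one header line plus 10 data rows, built in memory.
csv_text = "a,b\n" + "".join("{0},{1}\n".format(i, i * 10) for i in range(1, 11))

window = read_rows(io.StringIO(csv_text), first=4, last=7)
print(window["a"].tolist())  # [4, 5, 6, 7]
```

For very large offsets, the `chunksize` parameter of read_csv may be a simpler way to walk through the file piece by piece, since it avoids re-reading the skipped lines on every call.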
