I have a CSV file that I need to split into n files so that no split file exceeds 100 MB. I need to do this in a Windows batch script. I tried the approach below, but it takes a very long time because the unsplit file is several gigabytes in size.

@echo off
setlocal enableextensions enabledelayedexpansion
rem base name of the output files (was never set in the original attempt)
set filename=output
set count=1
set maxbytesize=100000000
set size=1
type NUL > output_1.csv

FOR /F "tokens=*" %%i in (myfile.csv) do (
    rem look up the current size of the output file on every input line
    FOR /F "usebackq" %%A in ('!filename!_!count!.csv') do (
        set size=%%~zA
    )
    if !size! LSS !maxbytesize! (
        echo %%i>>!filename!_!count!.csv
    ) else (
        set /a count+=1
        echo %%i>>!filename!_!count!.csv
    )
)

Please let me know if there is a better way to achieve this. I can't switch to any other scripting language because my server runs Windows.

  • Yes, you are essentially checking whether the file is larger than 100000000 bytes on every iteration, which is probably why this takes so long. Are you opposed to using PowerShell or VB in this case? If so, would you be willing to download third-party software such as 7-Zip, which would make this easier? Commented Mar 8, 2013 at 15:10
  • The main bottleneck is the for command itself (you can check this by running an empty FOR /F "tokens=*" %%i in (myfile.csv) do () loop), so there is nothing you can do about it. I'd recommend using a higher-level language. Commented Feb 3, 2016 at 7:08
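
The empty-loop check suggested in the last comment could be timed with a minimal sketch like the one below. It assumes the input is myfile.csv in the current directory, and it uses rem as a do-nothing loop body, which is a choice made for this sketch rather than something from the comment.

```batch
@echo off
rem Time one FOR /F pass over the file that does no work per line.
rem "rem" serves as a do-nothing loop body (an assumption of this sketch).
echo start: %time%
for /f "tokens=*" %%i in (myfile.csv) do rem
echo end:   %time%
```

If the two timestamps are already far apart, most of the run time is spent in the loop itself and not in the size check or the echo.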

1 Answer

This should do the trick, assuming your lines are roughly the same size.

Its advantage is that it needs only two passes over the file: one to count the lines and one to write them out.

@rem echo off

@rem usage: batchsplit.bat <file-to-split> <size-limit>
@rem it will generate files named <file-to-split>.part_NNN

setlocal EnableDelayedExpansion

set FILE_TO_SPLIT=%1
set SIZE_LIMIT=%2

@rem pass 1: get the file size in bytes and the number of lines
for /f %%s in ('dir /b %FILE_TO_SPLIT%') do set SIZE=%%~Zs
for /f %%c in ('type "%FILE_TO_SPLIT%"^|find "" /v /c') do set LINE_COUNT=%%c

@rem note: set /a is limited to 32-bit signed integers, so this fails for files of 2 GB or more
set /a AVG_LINE_SIZE=%SIZE%/%LINE_COUNT%
set /a LINES_PER_PART=%SIZE_LIMIT%/%AVG_LINE_SIZE%

set "cmd=findstr /R /N "^^" %FILE_TO_SPLIT%"

@rem pass 2: findstr /N prefixes every line with "<line-number>:"
@rem tokens=1* keeps the rest of the line in %%b even when it contains colons
for /f "tokens=1* delims=:" %%a in ('!cmd!') do @(
    set /a ccc = %%a / %LINES_PER_PART%
    echo(%%b>> %FILE_TO_SPLIT%.part_!ccc!
)

Save it as batchsplit.bat and run it with:

batchsplit.bat myfile.csv 100000000
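
To make the sizing arithmetic concrete, here is a small sketch that repeats the script's two set /a divisions with made-up numbers. The 1,500,000,000-byte file size and the 10,000,000-line count are hypothetical and do not come from the question.

```batch
@echo off
rem Hypothetical inputs: a 1,500,000,000-byte file with 10,000,000 lines.
set SIZE=1500000000
set LINE_COUNT=10000000
set SIZE_LIMIT=100000000

rem Same integer divisions as in batchsplit.bat.
set /a AVG_LINE_SIZE=%SIZE%/%LINE_COUNT%
set /a LINES_PER_PART=%SIZE_LIMIT%/%AVG_LINE_SIZE%

rem Prints 150 and 666666 for the numbers above.
echo average line size: %AVG_LINE_SIZE% bytes
echo lines per part:    %LINES_PER_PART%
```

Because the split is based on an average, a part whose lines are longer than average can end up slightly over the limit. Passing a somewhat smaller size limit leaves some headroom.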

2 Comments

set SIZE=%~Z1 will work as well. Anyway, your solution is quite nice (especially the method of determining the line count of the file), but it is slow too. The main bottleneck is the for command.
Thanks @Fr0sT. Good point about setting the size. I know 'for' is slow, but that was the requirement.
