
Here is my situation:

I have a CSV file with 6 columns and no header, like this:

5002200,25081,0000002797,6,,2014/06/05
5001111,25081,0000002790,,,2014/06/05
5004901,00081,0000002799,5,,2014/06/05 
5004901,00081,0000002796,5,,2014/06/05


After sorting, the output I want looks like this:

5001111,25081,0000002790,,,2014/06/05
5002200,25081,0000002797,6,,2014/06/05  
5004901,00081,0000002796,5,,2014/06/05 
5004901,00081,0000002799,5,,2014/06/05 


@echo off
if not exist %1 goto :EOF
setlocal
for /F "tokens=1-6 delims=," %%a in (%1) do set "a[%%b,%%c,%%a,%%d,%%e,%%f]=[]"
break > %1
for /F "tokens=2-7 delims=[,]=" %%a in ('set a[') do echo %%c,%%a,%%b,%%d,%%e,%%f>> %1
endlocal

The problem is that the empty values go missing. Any ideas?

My algorithm sorts by the 1st and 3rd columns and then writes each column back in its original position. But if a line has an empty value (as in the 4th or 5th column), that value is lost.

The first column always has a length of 7.
Only the 4th or 5th column can be empty.

2
  • Why do you split the line into tokens if you don't use the individual tokens? Use the whole line instead. Commented Apr 11, 2016 at 7:50
  • I am new to cmd scripting. Would you please show me? Thanks. Commented Apr 11, 2016 at 8:04

3 Answers

2
sort /+8 infilename >outfilename

would appear to do what you want. Perhaps if you were to explain clearly what your sorting algorithm is, we'd be able to construct a more suitable system.


@ECHO Off
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q36542742.txt"
SET "outfile=%destdir%\outfile.txt"
SET "tempfile=%destdir%\tempfile.txt"
DEL "%tempfile%" >NUL 2>NUL  
(
REM first step - number each line, number to %%a, line to %%b
FOR /f "skip=1tokens=1*delims=[]" %%a IN ('find /n /v "" "%filename1%"') DO (
 REM tokenise line - required parts to  %%p, %%q
 FOR /f "tokens=1,3delims=," %%p IN ("%%b") DO (
  REM construct sort-record
  CALL :process %%p%%q %%a "%%b"
 )
)
FOR /f "tokens=1*delims= " %%a IN ('sort "%tempfile%"') DO ECHO(%%b
)>"%outfile%"

DEL "%tempfile%" >NUL 2>NUL  

GOTO :EOF

:: First parameter: primary sort-criterion (fixed-length)
:: Second : secondary sort-criterion (leading-zero-suppressed numeric)
:: Third : quoted data
:process
SET /a $line=1000000000+%2
>>"%tempfile%" ECHO(%1%$line% %~3
GOTO :EOF

You would need to change the settings of sourcedir and destdir to suit your circumstances.

I used a file named q36542742.txt containing your data for my testing.

Produces the file defined as %outfile%

tempfile can be set to whatever takes your fancy.

First, send the file through find, matching every line (that is, every line that does not contain the empty string) and numbering each one. Each line will thus become

[number]originallinedata

and by tokenising on [] using the fact that each line begins with a numeric, %%a will be set as the line-number and %%b as the line-data.

Reprocess the line-data, using , to tokenise and picking tokens 1 and 3. Both fields are fixed-length, and the second token must not be empty.

Process the line through the procedure :process providing the parameters concatenated_column1_column3 line_number originaldataline

Within :process, add 1000000000 to the line-number in %2, then send

concatenated_column1_column3_modified_line_numberSpaceoriginaldataline

So the line sent would be

500220000000027971000000001 5002200,25081,0000002797,6,,2014/06/05

The line-portion before the space is fixed-length.

When done, sort the tempfile and report the part after the first space.
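As a minimal sketch of how one sort record is assembled, the snippet below rebuilds the example record shown above by hand. The variable names key, lineno, data and padded are hypothetical stand-ins for the values the script passes to :process; they do not appear in the script itself.

```batch
@echo off
setlocal
rem Hypothetical stand-ins for the three parameters passed to :process
set "key=50022000000002797"
set "lineno=1"
set "data=5002200,25081,0000002797,6,,2014/06/05"

rem Adding 1000000000 gives every line number the same width (10 digits),
rem so a plain text sort also keeps equal keys in their original order.
set /a padded=1000000000+lineno

rem Fixed-length sort key, one space, then the untouched original line
echo(%key%%padded% %data%
```

Because the part before the space always has the same length, sort can compare the records as plain text, and the first space reliably marks where the original line starts.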


1 Comment

Also, the first column does not always have a length of 8.
0

With the UnxUtils commands, a single line is enough if the input file and the output file are different:

gawk -F"," "{print $1,$2,$3,$4,$5,$6}" input.csv|sort -gk1,3|sed "s/ /,/g";"s/$/\r/">output.csv

If the output should overwrite the input file, for example so that a .csv file can be sorted by dragging it onto the batch file:

sed -i "s/,/ /g" "%~1"
sort -gk1,3 "%~1" -o"%~1"
sed -i "s/ /,/g";"s/$/\r/" "%~1"
exit /b

Each column is kept as in the original.

Comments

0

The following script is capable of what you are requesting (let us call it sort-csv.bat):

@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem Define constants:
set "INFILE=%~1"
set "OUTFILE=%~2"
set "TEMPFILE=%TEMP%\%~n1_interim_to_sort%~x1"
set /A MAXWIDTH=10

if not exist "!INFILE!" exit /B 1
if not defined OUTFILE set "OUTFILE=%~dpn1_sorted%~x1"
set "PADZEROS="
for /L %%$ in (1,1,%MAXWIDTH%) do set "PADZEROS=!PADZEROS!0"
> "!TEMPFILE!" (
    for /F "delims=" %%# in ('findstr /N /R "^^" "!INFILE!"') do (
        set "LINE=%%#" & set "LINE=!LINE:*:=!"
        for /F "delims=:" %%a in ("%%#") do set "LNUM=!PADZEROS!%%a"
        for /F "tokens=1,3 delims=," %%A in (""!LINE:^,^=","!"") do (
            set "ITEM1=!PADZEROS!%%~A" & set "ITEM1=!ITEM1:~-%MAXWIDTH%!"
            set "ITEM2=!PADZEROS!%%~B" & set "ITEM2=!ITEM2:~-%MAXWIDTH%!"
            echo(!ITEM1!;!ITEM2!;!LNUM:~-%MAXWIDTH%!_!LINE!
        )
    )
)
> "!OUTFILE!" (
    for /F "tokens=1,* delims=_" %%I in ('sort "!TEMPFILE!"') do (
        echo(%%J
    )
)
> nul 2>&1 del "!TEMPFILE!"

endlocal
exit /B

To use this batch file, provide the input and output paths/files as command line arguments:

sort-csv.bat "input-file.csv" "output-file.csv"

The main idea behind this is to replace every single separator , by "," and to enclose every line within "", so each item becomes enclosed within ""; for instance, a line like 1,2,,4 becomes "1","2","","4". This avoids adjacent separators ,, and therefore a for /F loop with , as the delimiter can be used to get the items; the ~ modifier of the for /F variable is used to remove the surrounding "".
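The difference can be sketched with one of the sample lines from the question. This is only an illustration, assuming delayed expansion is enabled as in the script; the labels "plain" and "quoted" are made up for the demonstration.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem Sample line from the question, with empty 4th and 5th columns
set "LINE=5001111,25081,0000002790,,,2014/06/05"

rem Plain tokenising: for /F treats the run ",,," as a single delimiter,
rem so the date moves up into the 4th token.
for /F "tokens=1-6 delims=," %%A in ("!LINE!") do echo plain : 4th=[%%D]

rem Quoted tokenising (same substitution as in the script): every item is
rem wrapped in "", so no two commas are adjacent; ~ strips the quotes again.
for /F "tokens=1-6 delims=," %%A in (""!LINE:^,^=","!"") do echo quoted: 4th=[%%~D] 6th=[%%~F]
```

The first loop should report the date as the 4th token, which is the loss described in the question, while the second should report an empty 4th item and the date in 6th place.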

For sorting, a temporary file is used, which contains the original lines prefixed with the (semicolon-separated) columns to be used for sorting and the original line number in a leading-zero-padded manner. So your input file becomes this:

0005002200;0000002797;0000000001_5002200,25081,0000002797,6,,2014/06/05
0005001111;0000002790;0000000002_5001111,25081,0000002790,,,2014/06/05
0005004901;0000002799;0000000003_5004901,00081,0000002799,5,,2014/06/05
0005004901;0000002796;0000000004_5004901,00081,0000002796,5,,2014/06/05

This file is then fed into the sort command, whose output is captured by another for /F loop, which cuts off the prefix, that is everything up to the _ character.
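The prefix removal can be tried on a single record in isolation. The following is a small sketch; RECORD is a hypothetical variable holding one line of the temporary file listed above.

```batch
@echo off
setlocal
rem One record of the temporary file, copied from the listing above
set "RECORD=0005001111;0000002790;0000000002_5001111,25081,0000002790,,,2014/06/05"

rem tokens=1,* with delims=_ puts the sort prefix into %%I and everything
rem after the first underscore into %%J, which is the original line.
for /F "tokens=1,* delims=_" %%I in ("%RECORD%") do echo(%%J
```

Since the prefix consists only of digits and semicolons, the first underscore always marks the boundary between the prefix and the data.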

9 Comments

What do you mean? does the script produce no output file? it works for me, using your sample data; note that I fixed the script file name (it is called sort-csv.bat instead of sort_csv.bat)...
I fixed it. But could you please edit the code so that it takes a single file as input and writes the sorted output back to that same file?
To overwrite the original file, just give the same file as input and output file; for instance: sort-csv.bat "datafile.csv" "datafile.csv"; if you want the script to accept a single argument only, simply replace the line set "OUTFILE=%~2" (line #6) by set "OUTFILE=%~1", or even better, replace line #11 if not defined OUTFILE set "OUTFILE=%~dpn1_sorted%~x1" by if not defined OUTFILE set "OUTFILE=%~f1" (so you could still provide a second argument as the output file optionally)...
After some testing, I found that your script has a bug when some rows have the same content in the 1st column but different content in the 3rd column: one of those rows goes missing.
@Kason, please explain what happens; with the sample data you provided, the script works fine; so could you please give sample data which cause the script to fail?