How to reorder the columns and its respective data of a csv file using shell script

Hi I am trying to reorder the columns of a csv file using below shell script

#!/bin/bash

 

#Reordering the  columns

 

TEMP_FILE=$(mktemp)

 

mlr --csv reorder -f "\"Trade ID\",\"Trade Date and timestamp\",\"CounterParty full Name\",\"CIS Code (Internal)\",\"Trade Description\",\"USI\",\"UTI\",\"UTI Leg 2\",\"CFG LEI\",\"Counterparty LEI\",\"Is CFG Reporting Party?\",\"UPI\",\"Contract type\",\"Execution venue\",\"Execution Venue LEI\",\"Counterparty purchasing protection\",\"Counterparty selling protection\",\"Reference Entity\",\"Start Date\",\"Maturity Date\",\"Price\",\"Cap Strike\",\"Floor Strike\",\"Notional Amount\",\"Notional Currency\",\"Upfront Payment Amount\",\"Upfront Payment Currency\",\"Payment frequency of the reporting counterparty.\",\"Payment frequency of the non-reporting counterparty\",\"Currency 1\",\"Currency 2\",\"Notional amount 1\",\"Notional amount 2\",\"Exchange rate\",\"Delivery type\",\"Settlement or expiration date\",\"Day count convention\",\"Notional amount (leg 1)\",\"Notional currency (leg 1)\",\"Notional amount (leg 2)\",\"Notional currency (leg 2)\",\"Pay Leg Type (CFG)\",\"Receive Leg Type (CFG)\",\"Direction\",\"Option type\",\"Fixed rate\",\"Fixed rate 2\",\"Fixed rate day count fraction\",\"Floating rate payment frequency\",\"Floating rate reset frequency\",\"Floating rate index name/rate period\",\"Floating rate Index 2\",\"Buyer\",\"Seller\",\"Quantity unit\",\"Quantity frequency\",\"Total quantity\",\"Total Remaining Quantity\",\"Notional\",\"Settlement method\",\"Price unit\",\"Price currency\",\"Buyer pay index\",\"Buyer pay averaging method\",\"Seller pay index\",\"Seller pay averaging method\",\"Grade\",\"Option style\",\"Option premium\",\"Hours from through\",\"Hours from through time zone\",\"Days of week\",\"Load type\",\"Indication of collateralization\",\"MTM\",\"Product Class\",\"Deal Code\",\"Settlement Currency\",\"Valuation Fixing Date\"" Wallstreet_Source_File.csv > $TEMP_FILE

 

mv $TEMP_FILE Wallstreet_Source_File.csv

Am getting error like mlr CSV header/data length mismatch 79 != 28 at filename Wallstreet_Source_File.csv row2

In my csv file i have data for only 28 column all the other 51 are empty is that a reason for this error, if there is any other way to reorder the columns it would be helpful

header/data length mismatch

There has to be a better way of generating that -f argument. I would try setting up an array (maybe via a config file) and expanding it within the mlr command line.

However, you seem to have 79 columns declared for the header, so maybe one (or more) of the data columns has only 28 columns. Check that with some variant of:

awk -F',' 'NF != 79 { printf ("Line %6d has %3d cols\n", NR, NF); }' Wallstreet_Source_File.csv

That is raw (it does not understand fields with embedded comma, quotes or newline), but there might be few enough exceptions to highlight the issue.

Posting a header line with few data lines would be useful for us to test ideas.

If the problem is just consistently short data lines, it would be trivial to add the required number of missing separators.

Look for one-off errors. 79 fields means 78 separators.

This the exmaple data am not able to upload the document

|USI|CFG LEI|Counterparty LEI|UPI|Product Class|Execution venue|MTM|Currency 1|Currency 2|Notional amount 1|Notional amount 2|Exchange rate|Delivery type|Settlement or expiration date|Execution Venue LEI|Indication of collateralization|Trade ID|CounterParty full Name|Start Date|Deal Code|Settlement Currency|Valuation Fixing Date|Trade Date and timestamp|CIS Code (Internal)|Trade Description|UTI|Is CFG Reporting Party?|Contract type|UTI Leg 2 |Counterparty purchasing protection |Counterparty selling protection |Reference Entity |Maturity Date |Price |Cap Strike |Floor Strike |Notional Amount |Notional Currency |Upfront Payment Amount |Upfront Payment Currency |Payment frequency of the reporting counterparty. |Payment frequency of the non-reporting counterparty |Day count convention |Notional amount (leg 1) |Notional currency (leg 1) |Notional amount (leg 2) |Notional currency (leg 2) |Pay Leg Type (CFG) |Receive Leg Type (CFG) |Direction |Option type |Fixed rate |Fixed rate 2 |Fixed rate day count fraction |Floating rate payment frequency |Floating rate reset frequency |Floating rate index name/rate period |Floating rate Index 2 |Buyer |Seller |Quantity unit |Quantity frequency |Total quantity |Total Remaining Quantity |Notional |Settlement method |Price unit |Price currency |Buyer pay index |Buyer pay averaging method |Seller pay index |Seller pay averaging method |Grade |Option style |Option premium |Hours from through |Hours from through time zone |Days of week |Load type
||DRMSV1Q0EKMEXLAU1P80|549300UGRJZFKLBDNB67||FXF|OffFacility|-6573.66|EUR|USD|817785|923933.49|1.1298|Physical|4/13/2028||Uncollateralized|2.02304E+12|AG NET LEASE RLTY FND IV Q LP|4/12/2023|B|||12-Apr-23|AA077C7|FX FORWARD TRANSACTION|DRMSV1Q0EKMEXLAU1P80WSSFX202304120000436044681000001|Y|FX FORWARD TRANSACTION||||||||||||||||||||||||||||||||||||||||||||||||||||
||DRMSV1Q0EKMEXLAU1P80|549300UGRJZFKLBDNB67||FXF|OffFacility|-6573.66|EUR|USD|817785|923933.49|1.1298|Physical|4/13/2028||Uncollateralized|2.02304E+12|AG NET LEASE RLTY FND IV Q LP|4/12/2023|B|||12-Apr-23|AA077C7|FX FORWARD TRANSACTION|DRMSV1Q0EKMEXLAU1P80WSSFX202304120000436044681000001|Y|FX FORWARD TRANSACTION||||||||||||||||||||||||||||||||||||||||||||||||||||
$ awk -F'|' '{ printf ("Line %6d has %3d cols\n", NR, NF); }' foozz.data
Line      1 has  80 cols
Line      2 has  81 cols
Line      3 has  81 cols

The vertical bar | is a field separator: it would not normally occur before the first field (which will actually thereby become the second field). In this case, the first column has no name, which may not be acceptable to mlr.

Having two initial separators on the data lines is an extravagance.

I don't see where the reported 79 or 28 come from, but I suspect you have changed the data between your initial post and the data being shown.

I'm not familiar with mlr, but the --csv option seems to imply comma-seperated data. Or perhaps the -f string (with double-quoted column names separated by commas) implies that comma is the separator. However, nothing mentions | as a column separator, and yet there they are. I don't see how your command can make any attempt to separate fields in this case, so I would expect every line to be treated as a single fieldd.

1 Like

This topic was automatically closed 300 days after the last reply. New replies are no longer allowed.