I have a list with the following content:

VIP NAME DATE  ARRIVE_TIME FLIGHT_TIME

1  USER1 11-02    20.00    21.00
3  USER2 11-02    20.45    21.45
4  USER2 11-03    20.00    21.30
2  USER1 11-04    17.20    19.10

I want to sort this and similar lists with a shell script. The result should be a new list containing only lines that do not collide. VIP 1 is the most important: if a VIP with a higher number has an ARRIVE_TIME before VIP 1's FLIGHT_TIME on the same date, that line should be removed. In other words, when DATE, ARRIVE_TIME and FLIGHT_TIME collide, the VIP number decides which line to keep. Similarly, VIP 2 is more important than VIP 3, and so on.

This is pretty advanced, and I am completely out of ideas on how to solve it.

  • Is this best done in a shell script? Why not use Perl? Commented Nov 2, 2009 at 16:54
  • This would also be okay, but I have more experience with bash scripting than with Perl. Commented Nov 2, 2009 at 16:56
  • @j smith Perl is perfect for this sort of thing. Depending on the size of the file, you can just pull the whole file in and sort it. This kind of thing is one of the reasons Perl exists. I can post an answer in Perl if you want. Just say the word. Commented Nov 2, 2009 at 17:01
  • @Elizabeth Buckwalter, the list will contain at most 10 lines, so it's not very big. I would be very happy to see the Perl version that solves this, if you are able to post it. Commented Nov 2, 2009 at 17:07
  • Collidate? Do you mean 'collide'? Commented Nov 2, 2009 at 17:31

2 Answers

2

You can use the Unix sort command to do this. It lets you set primary and secondary sort keys with the -k option.

The uniq command is what you need to remove dupes.
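
As a rough illustration of multi-key sorting on this data, here is a minimal sketch. The function name sort_flights is made up for this example, and it assumes the file has exactly one header line and whitespace-separated columns in the order shown in the question. Note that uniq only drops adjacent identical lines, so on its own it will not detect overlapping time ranges.

```bash
#!/bin/bash
# sort_flights FILE (hypothetical helper name): print the data lines of FILE
# sorted by date, arrival time, departure time and VIP number.
# Assumption: FILE has exactly one header line.
sort_flights() {
    # tail -n +2 drops the header line before sorting.
    # LC_ALL=C makes sort treat "." as the decimal point regardless of locale.
    # -b      ignore leading blanks when locating the DATE key
    # -k3,3   DATE as a string key (zero-padded MM-DD sorts correctly as text)
    # -k4,4n  ARRIVE_TIME, numerically
    # -k5,5n  FLIGHT_TIME, numerically
    # -k1,1n  VIP number, numerically, as the final tie-breaker
    tail -n +2 "$1" | LC_ALL=C sort -b -k3,3 -k4,4n -k5,5n -k1,1n
}
```

Calling sort_flights flights prints the sorted data lines to standard output; deciding which of the colliding lines to drop still needs extra logic on top of this.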


2 Comments

This misses the point in that the situation for identifying a duplicate is extremely non-trivial. The question is badly framed (possibly because of unrealistic data and/or criteria), but the 'duplicates' are not simple duplicate lines; there is a different VIP number, for example, but the second is less important than the first and therefore gets bumped.
I don't know how to frame the question any better, but I think it is reasonably clear what I want to do.
1

This might get you started:

  • I'm ignoring the header line. You can get rid of it using tail -n +2 or skip it in the for loop.
  • Sort the flights by date, arrival, departure and VIP number. Having the VIP number as a sort key simplifies the logic later.
  • I'm saving the result in an array, but you could redirect it to a temporary file and read it in a line at a time with a while read line; do ...; done <tempfile loop.
  • I'm using indirection to make things more readable: the fields are named instead of using array indices directly. The exclamation point means indirection here instead of "not".
  • For each line in the result that occurs on the same date as the most recently printed line, compare its arrival time to the previous flight's departure time.
  • Echo the lines that are appropriate.
  • Save the date and departure time for later comparison.
  • You should adjust the < comparison to be <= if that works better for your data.

Here is the script:

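#!/bin/bash
# Read the sorted flights into an array, one line per element.
# Sort keys: date, arrival, departure, then VIP number.
saveIFS="$IFS"
IFS=$'\n'
flights=($(sort -k3,3 -k4,4n -k5,5n -k1,1n flights ))
IFS="$saveIFS"

# Names for the fields, resolved later through indirection (${!name}).
date=fields[2]
arrive=fields[3]
depart=fields[4]

prevdate=""
prevdep=""

for line in "${flights[@]}"
do
    fields=($line)    # split the line into whitespace-separated fields
    # String comparison: relies on zero-padded HH.MM times.
    if [[ ${!date} == "$prevdate" && ${!arrive} < "$prevdep" ]]
    then
        echo "deleted: $line"    # or you could do something else here
    else
        echo "$line"
        prevdep=${!depart}
        prevdate=${!date}
    fi
done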
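
The script above walks the flights in arrival order, so a less important VIP who arrives earlier can bump a more important one. If strict VIP priority is what the question needs, a priority-first variant might look like the sketch below. It is only a sketch under stated assumptions: the function names to_minutes and filter_by_priority are made up for this example, the file is assumed to have exactly one header line, and two bookings are assumed to "collide" when they share a date and their ARRIVE_TIME to FLIGHT_TIME ranges overlap. Times are converted to minutes so they are compared as numbers rather than strings.

```bash
#!/bin/bash
# to_minutes HH.MM: convert a time such as 20.45 to minutes since midnight.
to_minutes() {
    local h=${1%.*} m=${1#*.}
    echo $(( 10#$h * 60 + 10#$m ))   # 10# avoids octal parsing of 08 and 09
}

# filter_by_priority FILE (hypothetical name): print the surviving lines of
# FILE, most important VIP first. Dropped lines are reported on stderr.
# Assumptions: one header line; a "collision" means same date and
# overlapping time ranges.
filter_by_priority() {
    local -a kept_date=() kept_start=() kept_end=()
    local vip name date arrive depart start end i clash
    while read -r vip name date arrive depart; do
        [[ -z $vip ]] && continue            # skip blank lines
        start=$(to_minutes "$arrive")
        end=$(to_minutes "$depart")
        clash=0
        # Compare against every line already kept (all have higher priority).
        for i in "${!kept_date[@]}"; do
            if [[ $date == "${kept_date[i]}" ]] &&
               (( start < kept_end[i] && kept_start[i] < end )); then
                clash=1
                break
            fi
        done
        if (( clash )); then
            echo "deleted: $vip $name $date $arrive $depart" >&2
        else
            echo "$vip $name $date $arrive $depart"
            kept_date+=("$date"); kept_start+=("$start"); kept_end+=("$end")
        fi
    done < <(tail -n +2 "$1" | sort -k1,1n)  # drop header, lowest VIP first
}
```

Run it as filter_by_priority flights. The strict < comparisons mean a booking that starts exactly when another ends is kept; change them to <= if such back-to-back bookings should count as a collision.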

1 Comment

This seems interesting; I need to take a closer look at this.
