4

Im implementing a Sort. The program reads from a text file with ASCII characters. There are two elements per line separated by spaces. Suppose the input is "a b". This defines a precedence relationship between a and b, saying that "a must occur before b".

So if the file is

a b
d c
b d

the output is

a
b
d
c

I have created two linked lists

  • bigList: to store the unique elements (count to keep track of preceding elements)
  • smallList: to store preceding elements
  • List item

Summary of what my code does

  • reads the file line by line
  • grabs the two elements per line
  • checks whether if they are already present, if not inserts them
  • prints out the result based on the count number

it actually prints out all the elements in the file, like for the above input my output is

a
b
b
d
d
c

I am new to C programming and please let me know what I'm doing wrong.

3
  • You should read about minimum working examples and take what you read to heart. Posting your entire code here and expecting someone to read it, understand it, and then tell you what's wrong is really not what the site is about. Commented Mar 7, 2013 at 14:39
  • Thanks, @user1476263. Is there a range of possible inputs for your problem? It is can be much easier to work with numbers than say letters or strings. So if you had a-z as an input, I'd convert that to 0-25. Commented Mar 7, 2013 at 16:25
  • And is it important that you have linked lists? Or do you just want a sorted output? Commented Mar 7, 2013 at 16:26

1 Answer 1

2

I don't know a whole lot about topological sort, but I think I know enough to leave an intelligent comment. Anyone should feel free to edit this response.

I see several issues with this implementation. Some related to C, others related to the algorithm, so let me go through them one at a time.


Problem definition. Topological sort is indeed defined as a precedence of elements in a directed graph. However, this sentence alone does not define the problem completely. Specifically, topological sort is a precedence of elements of a graph starting with a specified source vertex. As an examples suppose you have the following directed graph:

a -> b
b -> c
c -> a

If you start with vertex a, your topological ordering should be {a, b, c}. If you start with vertex c, your topological order should be {c, a, b}. So the problem definition makes no sense without a source vertex. One choice for such vertex could be some vertex of the graph which has no edged pointing to it, i.e. every incident edge is an outgoing edge.

Another thing to keep in mind is graph connectedness. It's not always possible to get to any vertex from any other vertex. So it's worth keeping in mind when implementing such algorithms.


Good data structures are key to good algorithms. If you want to sort things in a directed graph, your best bet is to create a directed graph data structure, which itself would involve creating a Node data structure and an Edge data structure. I suggest looking up adjacency lists. Once you have such data structure in place, it's a matter of running breadth first search on the graph, and you get your topological precedence as a neat consequence.

When implementing an adjacency list, you still need to store all of your elements in one place. A linked list is generally not the best way to do so because it takes constant time to insert into one (assuming to sorting on data), it takes linear time to search through one. That's way suboptimal. As @David RF suggested, Red-Black trees and AVL trees would be the way to go. However, I wouldn't start with this optimization. As long as you have a sound working algorithm, you can always improve your storage data structure. After all, the interface to linked lists and search trees is the same.


The algorithm can be fast, given that you use the right algorithm. I haven't dealt with topological sorts in practice, so I don't know every intricacy and every edge case. But! If you do it with breadth first search using conventional node-edge data structures (note that edges can be implicitly defined within nodes) your search itself should take linear time using breadth-first search.

I've read through your algorithm, and I have to admit that I'm not quite grasping your concept of a big list and a small list. The ambiguous names don't really help. Perhaps it does the job with a single tiny bug hiding somewhere, but it's not too readable. Maybe someone else can comment on your current implementation.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.