1

I was recently interviewed with a local tech company for Java developer role and was asked to write a Java program to check the correctness of indentation in python code (at least find the first indentation error).

It may be easier to deal with def, while and for. However it is tricky to handle things like if...elif...else. There are different situations, for example, just if no else or elif; nested ifs. If it is paired, maybe I can use a stack, but you don't know whether they are paired or not.

I could really use some advice here.

4
  • In python there is pylint that does a lot of checking for indention, naming standards, and other PEP 8 related formatting. You could take a look at its code to find out how it works. Here is the source code for pylint Commented Jun 28, 2017 at 21:47
  • 1
    You should be able to handle everything using a stack of stacks approach. The outermost stack shall contain a stack for each line of indentation, and will check to make sure that each line is indented in at the right threshold, and will hold the ordering of nested conditionals, loops, etc, to ensure that you are not per say ending a 'for' loop before the 'if' statement inside of it has finished. The inner stack can handle the complexity of the individual statements, such as your if..elif..else situation. Append new if statements onto their own stack and add it to the outer stack. Commented Jun 28, 2017 at 21:50
  • When you reach an elif, you can simply just check to make sure the top of the outer stack is an if statement, and then move on without adding it. If you reach an else, you can pop the if statement off the outer stack as it has been concluded--and therefore everything inside of it. Commented Jun 28, 2017 at 21:51
  • Obviously there are going to be some situations where you are going to have to play with this idea, but I think generally this is a good place to start. Commented Jun 28, 2017 at 21:51

1 Answer 1

1

The way I see it, there's two distinct ways that you can solve this question.

  1. Convert the python code to some intermediary format which resembles a Context-Free Grammar (CFG) and then perform the error checking on the CFG.
  2. Case checking.

The former is a little more elegant and revolves around compiler theory, or at the very minimum, Automata theory. While this process is guaranteed to work and establishes some sort of language that people can refer to, it may be extremely tedious for something that is time sensitive. It's an elaborate fix to what may seemingly be, a simple task. The advantage of this technique would be that, if we're looking for a "hacky" solution, such as find the ":" and then checking if the next line is indented or not, this may not work for certain inline commands in python that use ":". An example for this could be print("Enter your name:"), or a subprocess.Popen command. This case will ensure that errors like this are avoided.

The latter on the other hand is extremely difficult to keep track of as a programmer and debugging it would be fairly difficult. I say this in some relativistic measure, there would be several cases to check, top level statements. So let's use some "good programming practices", since the keywords def if elif else class etc can be stored in a common locationIn order to solve this, we'd declare some variable (let's call it i, and read in the file line by line and checking the first (or zeroth character). If the character is not a space, we will read till the next whitespace and check if that word exists in some Trie of words that define indentation blocks (you could use a hash as well, doesn't make a difference). If it is in one of the blocks, you increment i by 4 and move on to the next line. In the subsequent line, you'd read up to the ith character and ensure they're spaces. Essentially, you'd repeat this process until you find something that contradicts what is being looked for. Now, if something does contradict what we're looking for, we may have to read the previous line. Consider

long_sequence = [1,2,3,4,1,2,3,4,1,2,3,4, 5,6,7]

Now, this would technically throw an error, so we have to consider this case and several cases which are similar to this where there is a line which is longer than a single line but the command is not complete in the one line.

So, as previously mentioned solution 2 would be very tedious and a debugging nightmare, 1 is elegant but really hard to construct. It really does depend on the timeline that you have to construct it.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.