In general, yes, you should indeed have a decent understanding of the hardware, the OS, how DLLs work, and other low-level details. As a C++ programmer, there will even be (rare) occasions where you need to inspect a stack, follow base pointers, and reconstruct a proper stack trace when the stack is so screwed that the debugger can't do it for you.
That said, specific to what you've asked, you are definitely massively overdoing it in one sense. You don't need to worry about the specific assembly output of your code. In fact, I'll guarantee that your understanding of ideal assembly output is wrong, as modern CPUs are ridiculously complex, and obtaining ideal efficiency has less to do with the specific assembly output of a line of code and more to do with whole-program transformations. Turn on the right level of optimization and in many cases you won't even be able to find or identify individual lines of code in the generated assembly. Humans are bad at everything except micro-optimizations (short sequences of assembly) and macro-optimizations (algorithmic choices): everything in between is usually better done by the compiler than by you, and even many micro-optimizations are better handled by the compiler.
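To make that concrete, here's a minimal sketch (the flags and exact behavior are assumptions about a typical GCC/Clang build, so check your own toolchain): compile it at `-O0` and again at `-O2` and compare the generated assembly, for example in a tool like Compiler Explorer.

```cpp
// At -O2 the loop below is typically inlined, unrolled or vectorized, or --
// because the argument is known at compile time -- folded away entirely into
// a single constant. There is no block of instructions you can point at and
// call "that line of code."
#include <cstdio>

static int sum_of_squares(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)   // "one line of code"...
        total += i * i;           // ...with no one-to-one assembly equivalent at -O2
    return total;
}

int main()
{
    std::printf("%d\n", sum_of_squares(100));
    return 0;
}
```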
It is useful to know what the compiler does. There are ways of writing code that let the compiler do its job very well, and ways of writing code that totally defeat its optimizer.
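A classic example (a hedged sketch, not a benchmark) is pointer aliasing. In the first version below the compiler has to assume `out` might alias the array it is reading, so it generally has to reload and re-store the count on every iteration; accumulating into a local and storing once at the end tells the optimizer the value can live in a register and the loop is free to be vectorized.

```cpp
#include <cstddef>

void count_positive_slow(const int* values, std::size_t n, int* out)
{
    *out = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (values[i] > 0)
            *out += 1;   // possible alias with `values` blocks many optimizations
}

void count_positive_fast(const int* values, std::size_t n, int* out)
{
    int count = 0;       // local accumulator: provably not aliased
    for (std::size_t i = 0; i < n; ++i)
        if (values[i] > 0)
            ++count;
    *out = count;        // single store at the end
}
```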
It is useful to know how the CPU (or GPU) works internally. There are many ways to write assembly that looks fast but performs like crap. If you don't understand cache hierarchies, the way DRAM works, CPU instruction pipelines and data dependencies, GPU core block behavior, and GPU texture memory access patterns, then you are utterly incapable of making intelligent decisions about optimizing modern real-world programs. That said, know these in terms of whole algorithms, NOT individual lines of code.
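As an example of what "whole algorithms, not lines" means in practice (the sizes here are illustrative assumptions, not measurements): the two functions below do identical per-element work, but one walks memory contiguously while the other strides across rows, and on a large enough matrix the second spends most of its time waiting on cache misses. The difference is the access pattern, not the code on any single line.

```cpp
#include <vector>
#include <cstddef>

constexpr std::size_t kRows = 4096;   // illustrative sizes
constexpr std::size_t kCols = 4096;

// grid is kRows * kCols ints stored row-major
long long sum_row_major(const std::vector<int>& grid)
{
    long long total = 0;
    for (std::size_t r = 0; r < kRows; ++r)
        for (std::size_t c = 0; c < kCols; ++c)
            total += grid[r * kCols + c];   // sequential addresses: cache- and prefetcher-friendly
    return total;
}

long long sum_column_major(const std::vector<int>& grid)
{
    long long total = 0;
    for (std::size_t c = 0; c < kCols; ++c)
        for (std::size_t r = 0; r < kRows; ++r)
            total += grid[r * kCols + c];   // jumps kCols * sizeof(int) bytes every step
    return total;
}
```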
Knowing the exact assembly output of C++ code is just not valuable. Knowing how to optimize code comes from knowing what lets the compiler do its job, what general patterns to avoid or prefer on your target hardware, and how to identify and fix measured performance problems.
And that's the last bit of optimization wisdom I have for you. PROFILE YOUR CODE. If you are making optimization decisions based on assembly output rather than actual metrics and performance data, you are not only wasting tons of your time, you are actively defeating yourself. To be blunt, optimization without profiled data verifying the optimization is pure incompetence. The majority of the time you do that, in my experience, you will be introducing performance pessimizations rather than optimizations. You might think you know what an individual line of code does all the way down to the hardware, but if you can't think about how the entire algorithm works, you lose. There are many cases where you can micro-optimize one small piece to the detriment of the whole. And you as a human being are not capable of understanding the whole down to the hardware level you are talking about, especially when you consider all the smarts in the hardware that "undermine" the assembly (out-of-order CPUs, DRAM row banks, cache layers, and a bazillion similar things). You MUST rely on profile data and larger metrics, not per-line performance tuning.
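Use a real profiler for this (perf, VTune, the Visual Studio profiler, whatever your platform offers). But even a crude measurement beats none; here's a minimal timing sketch, where the `update_particles_*` names are hypothetical stand-ins for the before/after versions of whatever you're tuning:

```cpp
// Minimal measurement harness: time the actual workload on representative
// data before and after a change, and only keep the change if the numbers
// improve. Not a substitute for a sampling profiler.
#include <chrono>
#include <cstdio>

template <typename Fn>
double time_ms(Fn&& fn, int iterations = 10)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        fn();                               // the candidate code path
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / iterations;
}

int main()
{
    // Hypothetical before/after versions of the code under test:
    // std::printf("old: %.3f ms\n", time_ms(update_particles_old));
    // std::printf("new: %.3f ms\n", time_ms(update_particles_new));
    return 0;
}
```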
Again, I have very directly seen cases where optimizing one line really did make the whole algorithm slower. It was non-intuitive, had nothing to do with how that one line executed, and everything to do with how that line defeated the ability of the compiler and the CPU's smarts to handle the larger body of code.
In reference to your problems with getting things done, that's a sign that you are failing as a game developer due to your preoccupation with performance. The performance of your game is utterly irrelevant compared to your performance as a game developer. Optimize for speed of design and content iteration first, and speed of engine execution second. The fastest game engine in the world that can't be used to make a real game in reasonable time periods is utterly worthless. Look for instance at how unpopular Carmack's engines have become (they haven't been licensed by more than a handful of companies since Quake 3, since they're barely useful for anything besides the one game they're made for) or how every major AAA developer used to have to spend so much time rewriting large parts of Unreal in order to make it work for non-FPS games (it's a lot better these days, but it was pretty bad back when Unreal 3 was new).