Going 3D when the game is basically a 2D game seems like overkill.
In your place, I'd rather use a 2D engine that already handles tiles and slopes, and use a few tricks for the parts where a '3D' effect is required.
Just a small example, a bridge:
So for the player, that would look like:

Now you can add some invisible trigger zones that will make this part of your map behave like a bridge you can go under.
Imagine the yellow trigger means: change the character's state so that it is drawn before the bridge and collides with the red zones.
And the green trigger means: switch back to the normal state, where the character is drawn after the bridge and collides with the yellow zones.
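Here is a rough sketch of how the two triggers described above could be wired up. It is only an illustration, not tied to any particular engine: the names `Layer`, `Rect`, `Trigger`, `Character`, `apply_triggers`, `draw_order` and `active_colliders` are all made up for this example, and the rectangles are arbitrary.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    OVER = "over"    # normal state: character is drawn after the bridge
    UNDER = "under"  # character is drawn before the bridge (passes beneath it)


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


@dataclass(frozen=True)
class Trigger:
    zone: Rect         # invisible area on the map
    sets_layer: Layer  # state the character switches to when inside the zone


@dataclass
class Character:
    x: float
    y: float
    layer: Layer = Layer.OVER


def apply_triggers(character: Character, triggers: list[Trigger]) -> None:
    """Switch the character's state if it stands inside a trigger zone."""
    for trigger in triggers:
        if trigger.zone.contains(character.x, character.y):
            character.layer = trigger.sets_layer


def draw_order(character: Character) -> list[str]:
    """Return what to draw first and what to draw on top of it."""
    if character.layer is Layer.UNDER:
        return ["character", "bridge"]  # bridge hides the character
    return ["bridge", "character"]      # character walks on the bridge


def active_colliders(character: Character, colliders: dict[Layer, list[Rect]]) -> list[Rect]:
    """Only the collision zones of the current state are tested."""
    return colliders[character.layer]
```

Each frame you would call `apply_triggers` after moving the character, then use `draw_order` when rendering and `active_colliders` when resolving collisions. The state only changes inside a trigger zone, so the character keeps its state while it is under (or on) the bridge.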
With such a scheme you'll have a bridge:

I made up this example quickly: I don't claim that the above scheme works in all situations, and if the 'z' is a key aspect of the game you'll have to dig further. I just want to emphasize that, for localized effects, a few small tricks can give you 100% of your features with far less work than a complete redesign.
In other words: having a physically accurate model of your world is just one way to get your game to behave as you want. If some tricks get you to the point faster with simpler code, just remind yourself that gamers do not see your code, only how it behaves.