I followed the same process as before. Loaded the image into ComfyUI and tested three normal map preprocessors: BAE normal, MiDaS normal, and DSINE normal. Connected everything, set them to 1K, and ran them side by side.
At a glance, the results look reasonable. You get the familiar purple-blue gradients and some sense of form.
But the longer you look, the more it breaks.
All three struggled.
Some areas were soft and blurry, like the system couldn’t decide what the surface was doing. Other regions had strange directional shifts that didn’t match the actual form. In a few places, it felt like the model was inventing structure.
DSINE normal held up slightly better, but not enough to be usable in a comp.
And that distinction matters.
This isn’t about whether it looks like a normal map.
It’s about whether it behaves like one.
Depth maps hold up because they describe broad spatial relationships. What is closer, what is further. Large, readable structure.
Normals are different.
They describe orientation. Every pixel tells you which way a surface is facing.
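To make that concrete, here is a minimal sketch of how a normal map pixel is usually read back as a direction. It assumes the common 8-bit encoding where each channel maps from [0, 255] to [-1, 1]; the function names are my own, and individual preprocessors may flip or swap axes, so treat it as an illustration rather than a description of what these nodes do internally.

```python
import math

def _remap(r, g, b):
    # Assumed encoding: each 8-bit channel maps linearly from [0, 255] to [-1, 1].
    return tuple(c / 255.0 * 2.0 - 1.0 for c in (r, g, b))

def raw_length(r, g, b):
    """Length of the decoded vector before normalization; 1.0 is the ideal."""
    x, y, z = _remap(r, g, b)
    return math.sqrt(x * x + y * y + z * z)

def decode_normal(r, g, b):
    """Turn one RGB pixel into a unit-length direction vector."""
    x, y, z = _remap(r, g, b)
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0.0:
        raise ValueError("pixel decodes to a zero-length vector")
    return (x / length, y / length, z / length)

# The familiar purple-blue (128, 128, 255) is a surface facing straight at the camera.
print(decode_normal(128, 128, 255))
```

Under this encoding the purple-blue base color is just "facing the camera," and any shift in hue is a claim about tilt. `raw_length` is a cheap sanity check: values far from 1.0 mean the pixel is not a clean direction at all.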
On the whale, that means curvature, transitions around the fins, and subtle surface changes. Small, continuous variations that require precise information.
From a single 2D image, that information just isn’t there.
So what you get is not a true normal map.
It’s an approximation that looks correct in isolation, but falls apart when used.
If you plug this into lighting or relighting, the inconsistencies show up immediately. The response doesn’t feel grounded.
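A rough way to see why: the simplest diffuse lighting model (Lambert) is a clamped dot product between the normal and the light direction. The sketch below is not what any particular relighting tool does. The helper names, the light direction, and the error angles are assumptions chosen for illustration. It shows how a normal that is off by a few degrees changes the shaded value.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert(normal, light_dir):
    """Diffuse intensity in [0, 1]: clamped dot product of the two unit vectors."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

def tilt_about_y(normal, degrees):
    """Rotate a normal about the Y axis to simulate an orientation error."""
    a = math.radians(degrees)
    x, y, z = normal
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

# Hypothetical setup: a camera-facing surface lit at a grazing angle from the right.
light = (1.0, 0.0, 0.2)
true_normal = (0.0, 0.0, 1.0)

for error_deg in (0, 5, 10, 20):
    shaded = lambert(tilt_about_y(true_normal, error_deg), light)
    print(f"{error_deg:>2} deg error -> intensity {shaded:.3f}")
```

With a grazing light like this, an error of around ten degrees moves the intensity by a large fraction of its correct value. When that error varies from pixel to pixel, the result is the patchy, ungrounded response described above.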
With depth, you could still imagine some level of integration.
With normals, the gap between “looks right” and “is usable” is much wider.
A pattern is becoming clearer.
Generative tools are very good at producing appearance.
But utility passes are not about appearance. They are data.
They need consistency, predictability, and enough accuracy to support downstream work.
Right now, normal extraction from a single image doesn’t meet that bar.
If you need reliable normals, you need to go back to your 3D program.
This is one of those cases where the difference between plausible and usable becomes very obvious.
And that line matters.
