Latest in the ongoing dialogue with Timothy Lottes.
“The end game of what you are suggesting ends up being that all vendors directly expose their own low-level API to their unique hardware design. Then let the developers decide how to best use it. I do like that idea, I just don’t believe I could convince the vendors to do that.”
I wrote about this here, but I’m not having much success either 🙂 It’s not the best deal for ISVs, but I actually think it’s in the IHVs’ long-term interest. Rather than writing extensions, they could design their own API in parallel with the hardware and expose all of their distinguishing features as core API features. If the portable high-level pieces were layered on top of these, it would prevent the feature sets from diverging too much, and it would also reduce driver cost.

It would take a long time to get to this point, but it would be viable. I don’t think future hardware would be any more of a problem for a vendor-specific API than it is for DX/GL, as long as the vendors added enough abstraction to hide specific devices. They already seem to maintain a lot of consistency from chip to chip in order to avoid driver churn. The main hurdle seems to be the extra cost for the vendors in the short term: they still have to implement all of GL/DX for quite some time, so rolling their own comes with a price.
“GCN is special in that descriptors are loaded into the scalar register file.”
I was referring to all the CPU-side indirections in the current crop of APIs. Texture binding does not need to have as much CPU cost as it currently does.
“Think about the other possible GPU design options which don’t involve loading descriptors into a shader register file.”
It seems to me that the ideal HW design for GL bindless is one where all descriptors are stored in some gigantic on-chip memory. For all other hardware, descriptor locality is still worth thinking about: if the descriptors live in memory, there is going to be a latency problem for some workloads, whether or not the instruction set is aware of it, and wherever there are caches, they can thrash. I think it would be a bad idea for the API to standardize around something that poses this kind of risk, unless it also gives devs enough control to mitigate it.
That design also requires keeping track of all those descriptor copies and patching every one of them when resources are streamed in and out. This I’m not wild about.
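To make that bookkeeping concrete, here is a rough sketch (plain C++, with hypothetical names; this is not any real driver’s code) of what the patching burden amounts to: every shader-visible location a bindless handle was copied into has to be remembered, so that all of the copies can be rewritten when streaming relocates the underlying resource.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical 64-bit bindless handle, baked into shader-visible memory
// (e.g. constant buffer slots that the GPU will read directly).
using DescriptorHandle = std::uint64_t;

// Tracks every shader-visible copy of each handle, so that all copies
// can be patched when a resource is streamed in or out.
class DescriptorPatchList {
public:
    // Record that `handle` was written at `location`.
    void recordCopy(DescriptorHandle handle, DescriptorHandle* location) {
        copies_[handle].push_back(location);
    }

    // Streaming moved the resource: rewrite every recorded copy of the
    // old handle with the new one, and re-key the tracking entry.
    void patch(DescriptorHandle oldHandle, DescriptorHandle newHandle) {
        auto it = copies_.find(oldHandle);
        if (it == copies_.end()) return;
        std::vector<DescriptorHandle*> locations = std::move(it->second);
        copies_.erase(it);
        for (DescriptorHandle* loc : locations)
            *loc = newHandle;  // patch the shader-visible copy in place
        copies_[newHandle] = std::move(locations);
    }

private:
    std::unordered_map<DescriptorHandle,
                       std::vector<DescriptorHandle*>> copies_;
};
```

The cost that bothers me is visible right in the sketch: the tracking structure grows with the number of copies, and a single streaming event turns into a scattered write over all of them.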
True, but the table model gives you the option of implementing GL-style bindless manually if it’s a better fit. You can just put everything in a huge table and pass indices in constant buffers. Now your indices can be whatever size you want, and you have more control over descriptor locality if you need it.
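A rough sketch of that manual scheme, with hypothetical types (nothing here is an actual API): one huge table holding every live descriptor, and small application-chosen indices passed through constant buffers in place of full handles. Simulated on the CPU side for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical descriptor: whatever the hardware needs to address a texture.
struct Descriptor {
    std::uint64_t gpuAddress;
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t format;
};

// One big table containing every live descriptor. Shaders receive indices
// into it rather than full descriptors.
struct DescriptorTable {
    std::vector<Descriptor> entries;

    std::uint32_t add(const Descriptor& d) {
        entries.push_back(d);
        return static_cast<std::uint32_t>(entries.size() - 1);
    }
};

// Stand-in for a constant buffer: 16-bit indices suffice here, which is
// exactly the size/locality control the table model hands back to you.
struct MaterialConstants {
    std::uint16_t albedoIndex;
    std::uint16_t normalIndex;
};

// What the shader-side fetch amounts to: index into the table instead of
// consuming a 64-bit handle straight out of the constant buffer.
inline const Descriptor& fetch(const DescriptorTable& table,
                               std::uint16_t index) {
    return table.entries[index];
}
```

Because the indices are yours, you can pack them as tightly as you like, and you can sort or group table entries so that descriptors used together sit together.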