OpenGL Is Broken

UPDATE (3/20/2015) If you read this, know that Vulkan has addressed every issue I raise in this article. See my postscript here.

The opinions expressed in this post are my own personal views and are not endorsed, shared, or sanctioned by anybody in particular (especially my employer).

Rich Geldreich has a lot to say about this subject, and I agree with pretty much everything on his list. The present state of OpenGL is incredibly frustrating, and it has caused me to be much more blunt and rhetorical than I might normally be. There are those who think that OpenGL, and not D3D, ought to be the primary API target for PC game development. They believe that OpenGL and D3D are basically the same and that OpenGL gaming would just take off if we developers would be more open-minded. These people are mistaken. OpenGL is a bad investment for anyone with ambitious graphical goals.

Despite being available nearly everywhere, OpenGL is rarely chosen on the one platform that gives us a choice. There are three principal reasons:

Reason #1:
OpenGL is highly fragmented across platforms. “Write-once run anywhere” is a myth. Mobile GL, Linux GL, Windows GL, and Mac GL, are all different from one another, and offer varying levels of feature support. While the current GL spec is at feature parity with DX11 (even slightly ahead), the lowest common denominator implementation is not, and this is the thing that I as a developer care about. An advanced spec is of no value if there are large fractions of the market that do not implement it. At this writing, the lowest common denominator feature set for desktop platforms is a restricted subset of GL4. Mobile (ES3) is even further behind, and is sitting where DirectX was 6 years ago.

Reason #2:
OpenGL driver quality is highly variable, and lags abysmally behind DirectX. This is not hard to understand. DX games are the primary driver for GPU sales, so it is natural that the vendors direct their attention there. It is also, certainly, a solvable problem, but given the dominance of Windows for gaming, the IHVs have little incentive to solve it at present.

These first two reasons are both non-technical, and thus, ultimately irrelevant. They can be solved by throwing more resources at the problem. These issues are merely the result of a lack of interest in OpenGL on the part of ISVs, IHVs, and, consequently, gaming customers. That lack of interest is why problems 1 and 2 have not been solved yet, but it is not the real problem, merely a symptom.

Reason #3:
The real problem is that OpenGL, as designed, is inferior to its competitors in several very important ways, which I will spend the rest of this post laying out.

My intention here is not to offend or insult (though I will be terse and sarcastic), nor is it to somehow harm OpenGL. I do not care which API wins, but if OpenGL is to win, it MUST correct its numerous flaws, and in my opinion, this should involve a dramatic redesign at nearly every level. OpenGL has gotten a lot of things right, but its most serious problems are not economic or political, they are technical.

GLSL Is Broken

The GL model, placing the compiler in the driver, is WRONG. It was a worthwhile experiment, one that seemed viable at the time, but history has proven it wrong.

The right model is a more open version of the DirectX model, a standard reference compiler which compiles to a high-level representation, which is then recompiled for a given device.

Driver compilation incurs unnecessary runtime costs. During development I compile my shaders dozens of times a day, so by the time I ship a product, I know that my shaders are well-formed and correct. All I need the driver to do is translate them into efficient executable code as quickly as possible. There is no value added by having the driver do semantic analysis. This removes value. The driver should not parse my shader, it should not validate my shader, it should not search for undeclared identifiers or missing semicolons. I have already done that ad nauseam, and if the driver is going to do it again when my game loads, it is going to increase my load times for no particular reason. Yes, I know that drivers can and do cache shaders, but not all of them do, and even if they do it is still better that they not have to miss the cache thousands of times on first run. The first run of my game is when the user experience is most important, and when the load times are most obvious. Caching, therefore, does not truly help me.
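To make the cost concrete, here is a minimal sketch of the load-time path every GL application walks today (assuming a current GL context with loaded entry points): the driver receives raw source text and must lex, parse, and validate it before it can generate any code.

// Minimal sketch: the driver gets raw GLSL text at load time and must
// re-do all of the front-end work I already did during development.
GLuint CompileStage(GLenum stage, const char* source)
{
    GLuint shader = glCreateShader(stage);
    glShaderSource(shader, 1, &source, nullptr); // hand the driver raw text
    glCompileShader(shader);                     // lex, parse, validate, codegen

    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (!ok)
    {
        char log[4096]; // errors I already caught on my workstation
        glGetShaderInfoLog(shader, sizeof(log), nullptr, log);
        // report and bail...
    }
    return shader;
}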

Driver compilation damages the platform by introducing divergence in shader syntax across implementations. It hurts driver quality by sucking up valuable engineering resources on irrelevant tasks. It would be better to have the IHV engineers spending their time improving code generation and compile times than worrying about syntax compliance and error detection.

Someone will mention optimization, and say something like “the driver is in a much better position to optimize, and will do it better.”

There exists a third-party tool which takes GLSL, performs well understood compiler transforms, and spits out other GLSL. This tool exists because there also exist GLSL compilers which are not doing their jobs. Yes, some implementors do a good job, and there’s no reason they couldn’t optimize a standardized high-level IR, SSA graph, or AST just as effectively, and at a considerably lower development cost. If we are forced to rely on individual implementors to fully optimize our shaders, then applications have no protection against poor implementations.

Someone will ask: “then why don’t you just optimize your code, graphics engineer?” Ironically, all of the arguments against an implementation-agnostic compiler apply equally well to implementation-agnostic programmers. My reply is simply: “The compiler is in a better position to optimize, and it will do it better.” Now, if only that were true…

Someone will assert that an IR interferes with the compiler’s optimization ability by removing information from the program. This may have been true of D3D bytecode, but it need not be. The compiler and IR can be designed in such a way as to eliminate information loss. A simple serialization of an AST would accomplish this goal, though there are probably better choices (e.g. SPIR, LunarGlass).

There are also certain optimizations which need to happen, and are fairly time consuming, but which drivers DO NOT need to implement. Dead code elimination and constant folding are the same no matter who does them, and are always profitable. If implementors can improve their compilers by having others do tedious work for them, then they should do so. My workstation is much better at this sort of thing than an end user’s phone.

Someone will mention GLSL extensions. Irrelevant. Extensions are orthogonal to the compilation model. The IR can be defined in such a way as to make it open and extensible (for instance, by adding new opcodes or data types). Extensions can easily be exposed by adding the relevant syntax to a standard reference frontend. If that doesn’t work, implementors can fork the reference compiler and define an extension to specify shaders using a proprietary IR in addition to the standard one. The paranoid ones can closed-source their fork if they really want. Note that nothing in the above would prevent a particular application from embedding a compiler and doing runtime compilation to IR if it so wished. The compiler(s) can, and should, be designed with this use case in mind (another example in which DX has shown us the right way of doing things).
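For what it’s worth, the closest thing core GL offers today is the program binary mechanism (ARB_get_program_binary), which merely caches a vendor-specific blob after the first full compile. A rough sketch of how an application uses it, under the assumption that cache keys and invalidation are handled elsewhere:

#include <vector>

// Sketch: cache the driver's compiled blob after the first run. The blob is
// per-vendor and per-driver-version; this is not the portable IR argued for above.
void SaveProgramBinary(GLuint program, std::vector<char>& blob, GLenum& format)
{
    GLint length = 0;
    glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);
    blob.resize(length);
    glGetProgramBinary(program, length, nullptr, &format, blob.data());
}

bool LoadProgramBinary(GLuint program, const std::vector<char>& blob, GLenum format)
{
    glProgramBinary(program, format, blob.data(), (GLsizei)blob.size());
    GLint linked = GL_FALSE;
    glGetProgramiv(program, GL_LINK_STATUS, &linked);
    return linked == GL_TRUE; // may fail after any driver update; fall back to source
}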

Threading is Broken

The single-threaded nature of current APIs is one of the principal reasons why PC games cannot scale well across multiple cores. We need the ability to freely parallelize our draw submission. We make thousands and thousands of draw calls. We have UI, we have trees and shrubs, we have buildings, we have terrain, we have various particle effects, we have lots of objects with lots of variety. We have multiple passes (cascaded shadows, reflections, Z prepass). We need, yes NEED, an API that allows submission to be scheduled across cores. D3D11 attempted to solve this problem with mixed results. OpenGL has not even bothered to try.
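For contrast, here is roughly what the D3D11 attempt looks like: worker threads record command lists on deferred contexts, and the main thread submits them in order. This is a rough sketch with state setup and the per-batch loop omitted.

#include <d3d11.h>

// Each worker thread records its slice of the frame into a command list.
void RecordBatchesOnWorker(ID3D11Device* device, ID3D11CommandList** outList)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... bind state and issue the draw calls for this thread's batches ...

    deferred->FinishCommandList(FALSE, outList); // recorded off the main thread
    deferred->Release();
}

// Main thread, once per frame, after all workers are done:
//   for each list: immediateContext->ExecuteCommandList(list, FALSE);

The “mixed results” come largely from the fact that many drivers still serialize the expensive work at submission time, but at least the shape of the API permits parallel recording.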

Somebody will mention multiple GL contexts. This person does not understand what I am saying. By design, we cannot use multiple contexts to simultaneously submit multiple rendering commands destined for the same render target, and that is what I really want to do. Yes, I need to order them somehow, but that is my problem, not GL’s. In many cases, the draw order is largely irrelevant, and there is much more efficiency to be gained by threading over batches (of which there are thousands) rather than passes (of which there are perhaps dozens).

OpenGL is not designed for this kind of rendering architecture, and it needs to be.

OpenGL also makes it extremely difficult to do asynchronous resource creation. Threaded resource creation is straightforward in D3D. The relevant calls on the device interface are free-threaded. In OpenGL, this is only possible through an elaborate multi-context dance.  OpenGL needs a standardized, consistent way to perform asynchronous resource management.
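To make the contrast concrete, here is a minimal sketch of the D3D11 side; the GL equivalent of this one function requires a second context that shares the first, made current on the loader thread, plus a fence so the render thread knows when the data is actually usable.

// D3D11: resource creation on the device interface is free-threaded,
// so a streaming thread can simply call it. No context juggling required.
void StreamTexture(ID3D11Device* device,
                   const D3D11_TEXTURE2D_DESC& desc,
                   const D3D11_SUBRESOURCE_DATA* initialData,
                   ID3D11Texture2D** outTexture)
{
    device->CreateTexture2D(&desc, initialData, outTexture);
}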

Texture And Sampler State Are Orthogonal

Nearly every DX shader I ever write looks something like this:

SamplerState sDefault;
Texture2D tColorMap;
Texture2D tNormalMap;
Texture2D tSpecMap;
Texture2D tEnvironmentMap;
// ...
tColorMap.Sample( sDefault, uv );
tNormalMap.Sample( sDefault, uv );
tSpecMap.Sample( sDefault, uv );
tEnvironmentMap.Sample( sDefault, R );

In an entire application I often see less than 16 unique sampler states. It is possible to bind the same small set of sampler states to the pipeline and leave them alone for all eternity. This allows for a significant reduction in state change cost. It also makes it much cheaper to sample the same texture using multiple sampler states in the same pass (the texture can be bound once).
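In D3D11 terms, that bind-once-and-forget approach looks roughly like this (the sampler settings and the slot count here are placeholders):

// Sketch: create the handful of sampler states once at startup and bind them
// to fixed slots. Shaders refer to s0..s3 forever; textures come and go freely.
ID3D11SamplerState* g_samplers[4] = {};

void BindGlobalSamplers(ID3D11Device* device, ID3D11DeviceContext* ctx)
{
    D3D11_SAMPLER_DESC desc = {};
    desc.Filter         = D3D11_FILTER_MIN_MAG_MIP_LINEAR;
    desc.AddressU       = desc.AddressV = desc.AddressW = D3D11_TEXTURE_ADDRESS_WRAP;
    desc.ComparisonFunc = D3D11_COMPARISON_NEVER;
    desc.MaxAnisotropy  = 1;
    desc.MaxLOD         = D3D11_FLOAT32_MAX;
    device->CreateSamplerState(&desc, &g_samplers[0]);
    // ... fill g_samplers[1..3] with point/clamp/anisotropic variants ...

    ctx->PSSetSamplers(0, 4, g_samplers); // set once, then leave alone
}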

I have been told that some people’s hardware is slightly more efficient this way, but these people do not seem to have any trouble implementing DirectX. If they are that concerned about this, then they can and should change their hardware. Let’s take a look at the GPU ISAs for which we actually have documentation:

The AMD GCN ISA is publicly available here. If we examine the ISA and think about how the API would map onto it, it is easy to see that the OpenGL model requires more loads whenever the number of sampler states is less than the number of textures.  In my experience this is basically all the time.

The relevant Intel docs for Haswell are publicly available here and here. It is much more difficult to navigate these docs (sorry guys), but eventually you will see that it’s basically a wash. The GL model would seemingly require more URB entries to be prefetched, which probably incurs some sort of cost, but it’s hard to tell how severe.

Nvidia does not publish their actual ISA (sadly), but given that they support both modes in PTX, it seems that it’s not that big a deal to them either.

UPDATE: Correction. Turns out the ISA is there, it’s just hard to find.

It has been 7 years since DX10 introduced this good idea, and GLSL still stubbornly requires the sampler state to be coupled to the texture state for no discernible reason. This adds API overhead, by forcing us to re-apply sampler state whenever we change the texture unit assignments. It is likely less efficient for a variety of contemporary GPUs, and it makes it very difficult to port contemporary HLSL to/from GLSL.

Too Many Ways to Do The Same Thing

In GL 4.4, there are two sanctioned ways to set up shaders. One is to use a program object. The other is to use a program pipeline object and attach shader stages piecemeal.

There are at least two sanctioned ways to configure the vertex stream. We can use glVertexAttribPointer and the ARRAY_BUFFER binding, or glVertexAttribFormat and glBindVertexBuffer.
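For illustration, here are the two paths side by side for a single position attribute (a bound VAO, a buffer object vbo, and a Vertex struct are assumed):

// Path 1: classic glVertexAttribPointer; format and buffer binding are fused,
// and the buffer is captured from whatever is bound to GL_ARRAY_BUFFER.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)0);
glEnableVertexAttribArray(0);

// Path 2: GL 4.3 separated attribute format; the buffer is bound independently.
glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, 0);
glVertexAttribBinding(0, 0);
glBindVertexBuffer(0, vbo, 0, sizeof(Vertex));
glEnableVertexAttribArray(0);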

There are two sanctioned ways to set up samplers. One is to use a sampler object. The other is to use the implicit sampler state that comes with every single texture object (and is set using glTexParameterXXX).

There are two sanctioned ways to create a texture. The right way (glTexStorageXXX) and the clunky old-school way (glTexImageXXX for each mip).

This redundancy is bad, because the more ways there are to specify state:

  1. The more confusing it is.
  2. The more room there is for drivers to get them wrong.
  3. The less efficient we are at deciding what the heck the state should be.

Let’s work through #3 in more detail. Consider the case of texture creation.

Say we do:

GLuint n;
glGenTextures(1, &n);
glBindTexture(GL_TEXTURE_2D, n);
for( mips )
    glTexImage2D(GL_TEXTURE_2D, mips[i], ...); // format, size, data
// bind the texture and draw

When we draw with the texture, OpenGL specifies that we do a ‘completeness’ check, to make sure that we get a black texture if we screwed up.

UPDATE: Correction.  Incompleteness is undefined behavior in 4.4 unless robust buffer access is enabled at context creation.  Not sure what the implications are of an implementation supporting both.  

Now suppose we did this:

GLuint n;
glGenTextures(1, &n);
glBindTexture(GL_TEXTURE_2D, n);
glTexStorage2D(GL_TEXTURE_2D, mip_count, ...); // internal format, size
for( mips )
    glTexSubImage2D(GL_TEXTURE_2D, mips[i], ...); // offsets, size, data
// bind the texture and draw.

We did the right thing, our texture cannot possibly be incomplete, but guess what: OpenGL does not know at draw time whether the name we gave it was allocated with TexStorage2D. As a result, we will always execute the moral equivalent of if(is_complete) for every bind (it might be if(immutable), but it’s still a redundant branch).

The path we shouldn’t use impedes the performance of the one we should.
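In hypothetical driver pseudocode (none of these names are real driver internals), the point looks something like this:

// Hypothetical driver-side sketch: even an immutable-format texture pays for
// the existence of the mutable path at every validation point.
struct TextureObject { bool immutable_format; /* levels, formats, ... */ };

bool IsMipChainComplete(const TextureObject& tex); // walks every mip level
void UseIncompleteTextureBehavior();               // black texture / undefined

void ValidateBoundTexture(const TextureObject& tex)
{
    if (!tex.immutable_format)        // redundant branch for glTexStorage textures
    {
        if (!IsMipChainComplete(tex))
        {
            UseIncompleteTextureBehavior();
            return;
        }
    }
    // ... proceed with the real draw setup ...
}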

This brings us to our next point….

OpenGL’s Error Handling is Wrong

The OpenGL spec requires that nearly every API call must validate itself and set some state so that ‘glGetError’ will return appropriately. Implementations must do a good deal of tedious validation work in order to ensure conformance. Apart from bribing driver engineers, there is no way to get rid of this overhead. Every single OpenGL call is going to perform one or more conditionals in order to validate its input.

Yes, I know we have branch prediction, and yes, they predict well, but I’m executing hundreds of thousands of them. The branches still burn ICache space and consume execution resources. The BTB is only so large, and I’ve got enough branches in the renderer and driver already without every single API call adding a few of its own just in case I happen to screw up. By the time my game ships, I will not be screwing up, but OpenGL will still be limiting my performance by design.
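To spell out what the spec’s error model implies, here is a hypothetical sketch of what every conforming entry point has to do; the helper functions and the per-thread error flag are invented for illustration.

// Hypothetical driver pseudocode: the glGetError contract means each entry
// point validates its arguments and records the first error seen, whether or
// not the application ever calls glGetError.
thread_local GLenum g_firstError = GL_NO_ERROR;

bool IsValidTextureTarget(GLenum target);           // invented helpers
bool NameMatchesTarget(GLuint name, GLenum target);

static void RecordError(GLenum err)
{
    if (g_firstError == GL_NO_ERROR)
        g_firstError = err;
}

void driver_glBindTexture(GLenum target, GLuint name)
{
    if (!IsValidTextureTarget(target))    { RecordError(GL_INVALID_ENUM);      return; }
    if (!NameMatchesTarget(name, target)) { RecordError(GL_INVALID_OPERATION); return; }
    // ... only now do the useful work ...
}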

And then there’s texture completeness. Need I say more about texture completeness? We can design that monstrosity away just by stripping glTexImage2D from the API. A thorough pruning will make the API smaller, more robust, and more efficient. It should be completely refactored to remove as many potential error conditions as possible. Those which remain should result in undefined behavior and should be detectable by an optional validation layer.

There Are Too Many Small Inefficiencies

There are quite a few small inefficiencies in OpenGL which are going to render its single-thread performance inferior to that of up-and-coming APIs. I’ve touched on some of them here, but I’m running long, so I intend to devote a followup post to this subject.

The short version is this: The API is littered with small inefficiencies and flaws. These flaws are due to a design philosophy which incorrectly emphasizes compatibility, tradition, and ease of use over implementation efficiency. These things might be tolerable if we had the ability to scale across cores, but we do not, and even if we did, we would still struggle to achieve a batch throughput anywhere near what DX12/Mantle will give us.

Somebody will point to this and suggest that we use instancing, or multi-draw + uber-shader, or texture arrays, or some combination thereof. All of these things assume a very specific software architecture, one in which the principal design point is to avoid using the API. Too many shader switches? Use an uber-shader. Too many texture swaps? Use bindless or arrays. Still too slow? Sort everything and use instancing/multi-draw. This is all that OpenGL can offer at present, and the presenters do a good job laying it out. My contention, however, is that this is insufficient. It is folly to call the API efficient if the only way to be efficient is to avoid using it. Much of the software state change cost can and should be eliminated by re-designing the API. Mantle has proved this principle. DX12 will soon set it in stone.
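For readers unfamiliar with the pattern being recommended, it boils down to something like the following (a minimal sketch of the GL 4.3 multi-draw-indirect path; building the command buffer and the uber-shader that consumes gl_DrawID or per-instance data is the application’s problem):

// Sketch of the "avoid the API" pattern: pack thousands of draws into one
// indirect buffer and issue a single call.
struct DrawElementsIndirectCommand   // layout of each entry in the indirect buffer
{
    GLuint count;
    GLuint instanceCount;
    GLuint firstIndex;
    GLint  baseVertex;
    GLuint baseInstance;
};

void SubmitBatches(GLuint indirectBuffer, GLsizei drawCount)
{
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                nullptr,    // commands come from the bound buffer
                                drawCount,
                                0);         // tightly packed commands
}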

They Can Fix It

Compatibility and ease of use are both worthwhile goals if one is writing a high level graphics toolkit, but as I have written elsewhere, that is not what OpenGL really is.  A graphics API is not for doing graphics, it is for abstracting GPUs.  Graphics is done at a higher level.  OpenGL, on many platforms, is the only means of accessing the GPU, and as such, it must get better at providing this essential service.  OpenGL must stop trying to fill the high-level and low-level niches simultaneously, because in its present form it is not very good at either one.

OpenGL has a lot of good qualities. Its program object abstraction is a better model than the separate shader objects from D3D. Its extension mechanism makes it the platform of choice for prototyping GPU features. It has occasionally exposed useful features which D3D lacks. Despite its advantages, it has played second fiddle to D3D for over a decade. The reason is that Microsoft has consistently and proactively improved on D3D, and is even now in the process of redesigning it from scratch, yet again. If Khronos and the OpenGL platform holders wish to become serious competitors in the high-end gaming space, they must be willing to do likewise.

OpenGL must be augmented by a new industry standard which is lean, clean, modern, and performance-oriented. Luckily, we don’t have to look very far.

51 Comments

  1. Ted Kotz

    If you want a way to talk more directly to the hardware, could something like Gallium3D be a better choice? There are currently drivers for it for most of the big graphics chips. In fact, most open-source OpenGL calls are just mapped to Gallium3D APIs.

  2. Alex

    OpenGL has always been the primary choice of API in computer graphics courses. Today, using OpenGL in a computer graphics course has become a nightmare. It is too complex and has a steep learning curve.

  3. aaron

    Maybe change “They can fix it” to “We can fix it”.

    Part of the point of OpenGL is to integrate this very kind of user feedback into the specification process. Get on the mailing list and let them know what you need; maybe write an extension mockup.

    Anyway, cheers for taking time out of your day to let /us/ know, but please let them know too.

  4. Paul

    So how realistic is it for Mantle to be ported to Linux and Android, and adopted by nVidia and Intel?

    • aaron

      It’s my understanding that Mantle is tied pretty closely to the GCN architecture. You would need to generalize it, and it’s questionable whether AMD would let you do that or not. And of course, your only hope for this happening is to implement it yourself or with some other people on one of the open drivers (maybe on AMD’s or Intel’s open drivers, or as a state tracker for Gallium3D).

        • There are technical issues with Mantle, so we can’t expect to use it directly on other GPUs.

          The biggest issue is that AMD hasn’t officially provided a spec to any other IHV, so the Khronos Group can’t even speak about it.

          Hence, I don’t think Mantle was created to resolve an issue but to provide a marketing advantage for AMD.

          • Joshua Barczak

            But this is not a technical issue, it’s a business one. The only limiting factor is AMD releasing the spec, which they’ve said they plan to do.

            • There are a lot of hardware limitations and OpenGL ecosystem constraints that make Mantle unimplementable. Only Southern Islands and Kepler are fully bindless GPU architectures. Intel is semi-bindless (texture descriptors only, not sampler descriptors) and the mobile ecosystem is not bindless at all.

              Even on NVIDIA, Mantle is not implementable: Mantle exposes only one kind of memory, while on NVIDIA buffer and texture memory must be kept separate.

              • Joshua Barczak

                It might not map directly, but I think it could carry over with some adjustments. Things that are just too AMD-specific might turn into extensions in “OpenMantle”

    • Maynard Handley

      Mantle isn’t the only game in town, now that Apple has made Metal (sorta) public.
      The docs are here but I think they’re only visible if you have a developerID (which is free, so you might as well sign up if you’re interested).

      https://developer.apple.com/library/prerelease/ios/documentation/Miscellaneous/Conceptual/MTLProgGuide/Introduction/Introduction.html

      It’s interesting that Metal was released first on iOS rather than OSX. This may mean that, as well as just being a better API, this is Apple’s warning shot across the bow at Khronos: “Get your act together and fscking FIX the problems you’ve been told about for years, or by next year your largest member is going to be pushing a new API”.

      Basically — if Metal stays iOS only, we can assume Apple is still trying to work with Khronos; once it moves to OSX that’s a pretty strong sign that Apple has cut the cord. Apple is generally very aggressive about doing things one, and only one, way. If they go all-in on Metal, that probably means they’ve abandoned GL, except for legacy.

  5. The internet

    So how much was your cheque from Redmond for? Enough to retire on? Did it bounce yet?

    • Joshua Barczak

      Seriously? Didn’t you read the part where I advocated Mantle, not DX12, as the replacement?

      • DaVince

        So how much did AMD pay you?

        Just kidding. It is good that you’re being so critical about something that deserves so.

  6. Guest

    The way I see it, it is not about OpenGL being broken or not; it is about inexperience with OpenGL compared to, for example, D3D.

    In a relatively short amount of time, experienced D3D developers have had to learn and start using OpenGL, because OpenGL was chosen for next-gen gaming consoles and it is already dominant in the mobile segment.

    Sure, OpenGL faces a challenging task, but because there is so much interest in it now, it will improve faster.

    Now is not the time to reinvent the wheel; now is the time to improve what is broken. For example, the AMD binary driver is quite broken for OpenGL/D3D. And now is the time to stop making excuses and start learning OpenGL.

    In the future OpenGL will surely evolve, and maybe it will be rewritten from scratch, but at the moment inexperienced users just need to learn, and some driver providers need to do their job better.

    OpenGL is mature enough and has faced the same challenge before, but it lost out to D3D. This time I am quite sure it will take the lead, and there will be no rewrite for now. Driver vendors will do their job better, and developers will learn how to use and work with OpenGL.

    • brandon

      Neither of the next-gen consoles (or previous gen) uses OpenGL; Xbox uses a modified DirectX, and PS4 uses its own API.

  7. Dr. Garry Richman Puncher

    Thanks for being brave and talking about it in your blog Josh.

    For a future post, many people would appreciate it if someone within AMD officially acknowledged that the switchable graphics drivers are still broken in the latest versions of Windows, especially the Intel/AMD combination, due to negligence by most laptop manufacturers.

  8. asd

    You are full of shit.
    So DX bumps the version to DX12 with some changes in the API and that’s considered acceptable, but OpenGL does the same (multidraws, bindless textures, etc) and that’s wrong because “It is folly to call the API efficient if the only way to be efficient is to avoid using it”.
    It’s not avoiding the API, it’s using it differently, because it has changed. Just the same as it will be with DX12.

    You can’t just say that something needs a change, then when that change happens continue using it the old way and say nothing has changed.

    The validation problem is basically solved by using multidraws (validation happens only once). Same with multithreading. You prepare all the parameters by writing them into system memory from multiple threads (easy), then issue one API call (the multidraw).

    • Joshua Barczak

      OpenGL and D3D are not doing the same thing. DX is getting a major redesign at all levels to improve state change throughput and concurrency. OpenGL has not done this. Multi-draw does not allow for changing Shader/VAO states without additional draws, and the state changes and command stream generation (the expensive parts) still occur serially.

  9. Scias

    “OpenGL must be augmented by a new industry standard which is, lean, clean, modern, and performance-oriented. Luckily, we don’t have to look very far.”

    But will Mantle be an open API? Can it be used on any non-(very modern)-AMD GPUs? Will it be available on Linux/OSX/…?
    I’m all for a completely renewed OpenGL or a new API to supersede it; however, Mantle doesn’t look to be a serious contender in this regard unless the above points are addressed, especially given AMD’s very poor track record of maintaining quality drivers, particularly under Linux.

  10. Ky

    “OpenGL must be augmented by a new industry standard which is, lean, clean, modern, and performance-oriented. Luckily, we don’t have to look very far.”

    The last thing I heard is that AMD has no plans to port it to any platform other than Windows.

  11. leslie

    I agree with many of your points but your comment about OpenGL on mobile being 6 years behind D3D is quite unfair. You should compare OpenGL ES to D3D on mobile and not the desktop D3D. FWIW, on WP8 we still only have D3D Feature level 9_3 which is not exactly state-of-the-art itself.

    • Joshua Barczak

      That’s fair, and I should really blame mobile IHVs for this, but things are changing rapidly. The new Tegras can do everything Kepler can, and the rest of the HW will follow. I’m puzzled why ES 3.1 didn’t define a standard extension for 4.4 level stuff.

        • Joshua Barczak

          I was thinking of things like bindless, buffer storage, multi-bind, multi-draw. I guess these aren’t all technically 4.4 features but they’re de-facto standards.

          • Because on one side most mobile GPUs don’t have HW support for at least bindless and multi-draw, and on the other side Multi-Bind and BufferStorage were released at Siggraph 2013, which is not enough time for OpenGL ES 3.1 to pick them up.

          • Bindless is not even an OpenGL 4.4 feature, just an ARB extension, as Intel doesn’t have HW support for it. MultiDraw is an OpenGL 4.3 feature; Intel doesn’t have HW support for it either, but they emulate it, so it’s really slow.

            There are multiple features in OpenGL 4.4 that are not easily or efficiently implementable on mobile GPUs, especially tile-based ones: layered rendering, geometry shaders, tessellation.

            It’s a lot of HW work to do and it takes a lot of die area, so it’s really a question whether it’s worth it or not.

            • Joshua Barczak

              Leaving out GS/Tessellation/etc makes sense, but multi-bind is just a software feature. Multi-draw indirect might be a little dicey in the general case, but you can get a lot of the benefit by adding a zero-copy version of glDrawElementsIndirect that takes an array of structs, and requiring the array to stick around until the frame ends.

              • What you are asking here is that OpenGL and OpenGL ES become a single specification. I think I am also in favor of this direction, even if I am concerned that it would make the group larger and slower than the current ARB-next and OpenGL ES groups.

                Let’s say: it would be nice to reduce fragmentation but the Khronos Group needs to improve its process for that.

                That’s a legitimate request, but the current ratification process just didn’t allow adding all this software stuff to ES 3.1.

  12. J

    You present quite an argument and I find myself agreeing with a lot of points brought up. However, I also find myself disagreeing with your reasoning behind “too many ways to do the same thing”. OpenGL provides multiple ways to arrive at the same state to, in turn, provide more efficient alternatives based on the use in mind, does it not? I find that this goes hand-in-hand with your notion of a graphics API being more of a GPU abstraction.

    • Joshua Barczak

      That depends on which sense of “efficient” you mean. I don’t think there’s much runtime efficiency to be gained by having numerous paths. All of the state application tends to happen at draw time, and the fewer permutations that have to be considered, the faster it will go. If you mean programmer-efficiency, then I think that sort of thing can be added on top. By “abstract the GPU” I mean “make all of them look the same”, not “make them easy to program”.

      • J

        It’s less “make them easy to program” (as having more paths actually makes it more confusing), but more “account for different hardware optimizations,” as no two GPUs are the same.

        I think that this brings up the issue of who should be following who: should the GPU engineers tailor their hardware to the graphics API layout? Or, should the graphics API layout adapt to the new hardware laid out by GPUs? If the desired route is the former, then yes: having a single path to achieve a desired state is much better, as hardware optimization efforts can be clearly focused. If the latter is desired, then an adaptive API would be desired as each path could take advantage of different hardware optimizations up to the discretion of the programmer.

        The point is, having more options to take advantage of certain optimizations is favorable as opposed to one way which may or may not be fast on particular hardware, especially in a market where the implementation designs and goals of both GPUs and graphics APIs are varied by nature.

        Perhaps in this aspect of OpenGL’s design, “making them all look the same” is one level of abstraction higher than what was desired.

        • Joshua Barczak

          I think it should probably skew more in the hardware’s favor, but if the API is going to try and abstract over all of the different hardware, it should try to find one consistent interface that strikes the best balance. Multiple paths might allow you to express everybody’s fast path, but I don’t think it makes economic sense. Every vendor still has to implement all paths, which increases vendor costs and harms the users (performance, driver robustness, confusion, etc). IMO this is actually a point in favor of having vendor-specific low-level APIs (abstracting only their product line), with cross-vendor ones thrown on top (see my previous post). If you really want to maximize performance, you use the bottom one with the ideal path, and if you want more portability you use the cross-vendor one, which is hopefully still pretty efficient. The cross-vendor one would act as a unifying force and keep the vendors from diverging too much from one another. You can also imagine three tiers (vendor-specific (a given driver), low-level vendor-agnostic (Gallium?), high-level vendor-agnostic (OpenGL)).

  13. I was wondering if you know of a real-world use of a Validation Layer that you could point me to? Sounds like an interesting idea, and I’d like to look into using it in my own APIs if possible.

    • Joshua Barczak

      D3D’s debug layer is what I would model it on. They can implement a different device interface which does the validation, and then forwards calls to the real one. If GL had explicit contexts it could do a similar thing

  14. How does OpenGL compare to D3D when you compare their data flow, such as a database stored and parsed by them?

    I think that will also be an interesting preview if you can manage to benchmark it.

  15. Wayne Cochran

    The non-orthogonality and the increasingly large boilerplate of OpenGL are troublesome — especially for someone who teaches GL programming. But D3D only works on (and is licensed for) MS platforms, correct?

    • aaron

      One exception is the Sony PlayStation 4 (interestingly running FreeBSD), which has optional D3D licensing (but it costs extra). Other than that, I’m pretty sure it’s just MS.

  16. Patrick Baggett

    I’m really in favor of having a standardized AST as the shader input format, with the source-level compiler as a separate, open-source tool that can be included in the program as a library if it absolutely must dynamically generate shader source and then compile it. It might make shader debugging challenging, though.

    Simple optimizations like strength reduction, CSE, DCE, and constant propagation shouldn’t be done in the driver. It’s really sad to think how many thousands of man-hours have been wasted to develop the same GLSL parser and implement the same optimizations, over and over again.

    You can always take the WebGL approach to writing a spec — just take the GL spec and start hacking it on a website and see how many hornets’ nests you can kick.

  17. Freddy

    Your entire section on error handling is wrong. glGetError() is great in development, but should be turned off in production code. A simple #ifdef almost nullifies your entire broken error handling section.

    glGetError() is great to figure out why things are not working. However, once they are working you don’t need this anymore. If your production code is catching errors then you didn’t fully debug your code.

    • Joshua Barczak

      You need to read it again. The OpenGL implementation does not know whether or not I’ll call glGetError, and must execute code to make it work whether I’m calling it or not

      • Dennis

        By “execute code” you mean “has to save a state value”. Utterly inconsequential and irrelevant.

        • Joshua Barczak

          I mean execute one or more comparisons and a branch per GL call, whether it errors or not. These things add up. A thought experiment: 500K OpenGL calls per frame (not just draws, all calls). At 60Hz we have 30M calls per second. Adding on average 5 cycles per call means a 7.5% penalty at 2GHz. Driver people work very hard for gains this size.

      • Ionut Cava

        The sad part is that Khronos could easily work around this using the existing ARB_debug_output / CONTEXT_DEBUG_BIT combo. Some debugging functions are completely disabled if the bit is not set, but I have no idea why glGetError is not part of that group.

    • Ben Grabham

      That’s not entirely true. They should be turned off, but unfortunately this can’t be done most of the time because of the sheer number of inconsistencies between different drivers. I’ve encountered OpenGL errors on certain hardware but not on any of my development rigs.
