Author Topic: Post-Processing (Read 86709 times)

Michael Lobko-Lobanovsky · « **on:** January 24, 2018, 06:35:17 pm »

My attempt was to render the scene to texture in either FFP or PPL for subsequent post-processing with various effects shaders, and then finally render the post-processed screen-sized textured quad onto the screen for viewing.

The technique sort of works... However,

- ~~the resultant texture seems somewhat imprecise as if it were transparently resized by OpenGL, which obviously causes extra jagginess and loss of fine detail;~~ [UPD] Resolved. Currently the details are actually even sharper than they should be.
- the cost of glCopyTexImage2D() in each frame appears too high for practical purposes with high poly models (poly count > 1M).

Below is an example of vignetting an FFP render with a post-processing shader.


Michael Lobko-Lobanovsky · « **Reply #1 on:** January 25, 2018, 01:52:59 am »

Hehehehe...

In fact, the vignette shader allows for very simple and easy scene over-brightening. It's effectively the missing upper half of ObjReader's "stray light" control.
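For readers who haven't met the effect: a vignette pass boils down to scaling each pixel by a factor that falls off with distance from the screen centre, and a gain above 1 is what gives the over-brightening mentioned above. Below is a minimal CPU-side sketch in C++ of the arithmetic a fragment shader would do per pixel. The names `vignetteFactor`, `inner`, `outer` and `gain` are assumptions made for illustration; this is not the actual ObjReader shader.

```cpp
#include <algorithm>
#include <cmath>

// Hermite interpolation between two edges, same formula as GLSL's smoothstep().
float smoothstep01(float edge0, float edge1, float x)
{
    float t = (x - edge0) / (edge1 - edge0);
    t = std::max(0.0f, std::min(1.0f, t));
    return t * t * (3.0f - 2.0f * t);
}

// Hypothetical helper: brightness multiplier for the pixel at normalized
// screen coordinates (u, v), both in [0, 1].
//   inner - distance from the centre where darkening starts
//   outer - distance from the centre where the image is fully dark
//   gain  - overall multiplier; values above 1 over-brighten the centre
float vignetteFactor(float u, float v, float inner, float outer, float gain)
{
    const float dx = u - 0.5f;
    const float dy = v - 0.5f;
    const float d  = std::sqrt(dx * dx + dy * dy);  // about 0.707 at the corners
    return gain * (1.0f - smoothstep01(inner, outer, d));
}

// Apply the factor to one colour channel and clamp to the displayable range.
float applyVignette(float channel, float factor)
{
    return std::min(channel * factor, 1.0f);
}
```

With `gain` set to 1 the centre is left untouched and only the corners darken; raising `gain` brightens the whole frame before the falloff takes over.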

Michael Lobko-Lobanovsky · « **Reply #2 on:** January 25, 2018, 07:07:05 am »

Not bad for starters, n'est-ce pas mon ami?

The 2D effect is the same as in my earlier "fake light shafts" demo featuring a picture of Regent Street in London. The screenshot was taken while in the FFP mode.

Michael Lobko-Lobanovsky · « **Reply #3 on:** January 29, 2018, 03:11:41 am »

Patrice,

I think we have reached the limits of what we can do in the context of ObjReader's current camera, vertex buffer array storage and render target utilization:

- Keeping vertex/normal/texture data in multi-million-poly RAM-resident arrays that we bind to the OpenGL default render target (color buffer) in each frame stresses the data bus, especially in the animation (continuous-render) modes of operation.
- The only technique we can currently use for post-processing is the so-called "render-to-texture" method, whose name is effectively deceptive. In reality, we're still rendering to the color buffer; then we have to copy the color buffer contents explicitly to a real texture with a call to glCopyTexImage2D(), post-process that texture, and apply it to the screen-sized quad for rendering into our on-screen viewport. glCopyTexImage2D() proves to be such a heavy load on the GPU that whenever I try to rotate my model (regardless of its complexity) or drag it across the screen, let alone animate its rotation, both my GPUs go up to 95%+ load.
- It may not be so obvious at first glance, but no matter what we do with our current camera, we aren't able to access at least one third of the 3D space and its perspective that we're "seeing" on our monitors: we aren't able to tilt (in fact, "roll") the camera around the world's Z axis. For people who are aware of what's happening on their screens when they rotate a model, it's an annoying and massively stressful factor.
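To put a rough number on the data bus point above, here is a back-of-the-envelope sketch in C++. The vertex layout (position, normal and one set of texture coordinates, all 32-bit floats, non-indexed triangles) is an assumption made for illustration and not necessarily what ObjReader stores.

```cpp
#include <cstdint>

// Assumed layout: position (3 floats) + normal (3 floats) + UV (2 floats).
constexpr std::uint64_t kBytesPerVertex = (3 + 3 + 2) * sizeof(float);

// Bytes that cross the bus per frame when non-indexed triangles are drawn
// straight from arrays held in system RAM.
constexpr std::uint64_t bytesPerFrame(std::uint64_t triangles)
{
    return triangles * 3 * kBytesPerVertex;
}

// Sustained bus traffic in bytes per second at a given frame rate.
constexpr std::uint64_t bytesPerSecond(std::uint64_t triangles, std::uint64_t fps)
{
    return bytesPerFrame(triangles) * fps;
}
```

Under these assumptions a 1M-triangle model needs about 96 MB per frame, which is close to 2.9 GB/s at 30 frames per second, all of it re-sent even when nothing in the geometry has changed.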

My proposal is as follows:

- Instead of vertex buffer arrays (VBA's) we shall henceforth use vertex buffer objects (VBO's) that are effectively VBA replicas in the GPU video memory. Once their contents are uploaded to the GPU, the VBA's may be safely freed; they aren't going to be pumped along the data bus any more. VBO's are toggled directly on the GPU, and the resultant bandwidth (in other words, the model size and/or equivalent FPS rate) may increase several times.
- Instead of the color buffer and "render-to-texture" we shall henceforth use the frame buffer object (FBO) that's effectively a kind of texture in the GPU video memory bound transparently as a persistent render target. The FBO supports all features of a real texture and as such wouldn't require extra stressful glCopyTexImage2D() calls. The FBO can be rendered to, post-processed, and mapped to the on-screen quad directly.
- The entire camera code shall be rewritten from scratch, keeping clear track of separate model, view, and projection matrices for every minute transformation of camera-to-model spatial orientation. This is to allow for easier and faster calculation of distances, parallaxes, and mutual orientations of scene objects/meshes for more precise visibility clipping, dynamic alpha sorting, collision detection, etc. It will also make integration of shader code from different sources considerably easier.
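As an illustration of the camera point, here is a minimal C++ sketch of what keeping separate model, view, and projection matrices might look like, including the missing roll. All names (`Mat4`, `Camera`, `roll`, `modelViewProjection`) are hypothetical and the layout is column-major as OpenGL expects; this is a sketch under those assumptions, not the planned ObjReader code.

```cpp
#include <cmath>

struct Mat4 { float m[16]; };          // column-major: element (row, col) is m[col * 4 + row]
struct Vec4 { float x, y, z, w; };

Mat4 identity()
{
    Mat4 r = {};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    return r;
}

Mat4 translation(float x, float y, float z)
{
    Mat4 r = identity();
    r.m[12] = x; r.m[13] = y; r.m[14] = z;
    return r;
}

// Rotation about the Z axis, counter-clockwise for positive angles.
Mat4 rotationZ(float radians)
{
    Mat4 r = identity();
    const float c = std::cos(radians), s = std::sin(radians);
    r.m[0] = c;  r.m[1] = s;
    r.m[4] = -s; r.m[5] = c;
    return r;
}

Mat4 operator*(const Mat4& a, const Mat4& b)
{
    Mat4 r = {};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r.m[col * 4 + row] += a.m[k * 4 + row] * b.m[col * 4 + k];
    return r;
}

Vec4 operator*(const Mat4& a, const Vec4& v)
{
    return { a.m[0] * v.x + a.m[4] * v.y + a.m[8]  * v.z + a.m[12] * v.w,
             a.m[1] * v.x + a.m[5] * v.y + a.m[9]  * v.z + a.m[13] * v.w,
             a.m[2] * v.x + a.m[6] * v.y + a.m[10] * v.z + a.m[14] * v.w,
             a.m[3] * v.x + a.m[7] * v.y + a.m[11] * v.z + a.m[15] * v.w };
}

struct Camera
{
    Mat4 view = identity();

    // Rolling the camera by an angle turns the eye-space scene the other way
    // about the viewing (eye-space Z) axis.
    void roll(float radians) { view = rotationZ(-radians) * view; }
};

// The model matrix is applied first, then the view, then the projection.
Mat4 modelViewProjection(const Mat4& projection, const Camera& cam, const Mat4& model)
{
    return projection * cam.view * model;
}
```

Because the view matrix is kept on its own, a roll is a single extra multiplication and the model and projection matrices stay untouched.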

To this end, I am going to abstain, for the time being, from active development of the current version of ObjReader and engage myself in writing the skeleton of a new ObjReader implementation from scratch. Once the newer technology features described above prove to be working, you will take over, skin the app and merge your other GDImage goodies with the new implementation, and we'll enjoy a brand new ObjReader v3.0 that's up to date with modern OpenGL.

If we don't do that now, we'll keep on looking like two old-fart archeologists digging in the remains of the ancient immediate mode OpenGL civilization for the rest of our, alas, limited lives.

If you agree to my proposal, then also let me know if you can stand OOP implementations of certain parts and features of new ObjReader code, e.g. vector and matrix classes and maths and the like. I'm going to work directly in C++ omitting my usual FBSL ANSI C stage. I will try to follow the existing ObjReader's general layout but that certainly isn't going to be my primary goal or pattern. Speed and usability is what's in fact going to matter in the first place.

This doesn't mean you should likewise abandon your WIP on v2.0 or its beautiful 3D content and retire in your audio/video apartments. Everything that you have at hand by the time the v3.0 skeleton is ready will find its due place in the new code too.

Patrice Terrier · « **Reply #4 on:** January 29, 2018, 09:25:27 am »

My friend,

Who am I to tell you what to do?

The only thing I can say is that I have a very limited knowledge of modern OpenGL programming. So far I have done the best I could with my expertise, meaning that I couldn't help, only learn from you, if you switch to this new paradigm.

The very first version of this Wavefront reader I started from was written using OOP, and the first thing I did was convert it to the procedural mode I have always used; that says it all...

Starting to write version 3 from scratch, is probably the best way to experiment with new features, rather than having to deal with the existing code.

I shall keep working on the interface of version 2 to make it more user friendly.
And rework or work on new 3D model(s) to provide a good food for our renderer.

Michael Lobko-Lobanovsky · « **Reply #5 on:** January 30, 2018, 06:15:19 am »

Great!

Probably I didn't make myself clear enough in my proposal. Our new VBO/FBO rendering strategy is not going to eliminate the immediate mode capability:
- We will be loading vertex/texture/normal/tangent/bitangent data into the VBA's as we did before. But we will not bind them to the OpenGL render target directly from the system RAM in our render procs. Instead, we will additionally upload them on the GPU for permanent storage in its video RAM in the form of VBO's and then we will free the VBA memory because we won't need the original VBA's any more. Thus we will eliminate pumping the model geo data along the system data bus in each frame render over and over again.
- OpenGL's default render target (color buffer) will be substituted with our FBO that supports all features of a custom texture, allows direct writes at render and post-processing times, and can be bound to the screen sized quad directly as a texture to finally render the scene onto the screen. The sequence (and purpose!) of OpenGL commands in our render procs is going to be pretty much the same as it is now. It is only the names of OpenGL functions that are going to be a little different.
- It is irrelevant to the VBO's and FBO whether we draw using immediate mode commands (FFP) or advanced GLSL shading (PPL). They will work for both -- but much, much faster and more precisely than my current "render-to-texture".
I will try and keep the number of new OOP classes to the absolute minimum -- just 3D vectors, 4x4 matrices, and probably the camera itself. The ability to use overloaded operators in object-oriented vector and matrix maths is a blessing as compared to procedural C operations on raw floating-point arrays. Aren't you happy with using std::strings in your own code after all, my friend?
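To show what the overloaded operators buy, here is a small C++ sketch of a 3D vector type. The names (`Vec3`, `dot`, `cross`, `faceNormal`) are illustrative assumptions, not the classes that will end up in the new code.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline Vec3 operator*(Vec3 v, float s) { return { v.x * s, v.y * s, v.z * s }; }

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

inline float length(Vec3 v) { return std::sqrt(dot(v, v)); }

// Returns the vector unchanged if it has zero length.
inline Vec3 normalize(Vec3 v)
{
    const float len = length(v);
    return len > 0.0f ? v * (1.0f / len) : v;
}

// Unit normal of the triangle (a, b, c) with counter-clockwise winding.
inline Vec3 faceNormal(Vec3 a, Vec3 b, Vec3 c)
{
    return normalize(cross(b - a, c - a));
}
```

The last function reads like the formula it implements; the same computation on raw `float[3]` arrays takes a dozen indexed assignments and a couple of temporaries.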

You aren't going to be thrown overboard, Patrice, no way. Your expertise in immediate-mode OpenGL is going to serve us well into the future.

Patrice Terrier · « **Reply #6 on:** January 30, 2018, 09:50:35 am »

Quote

Aren't you happy with using std::strings in your own code after all, my friend?

All std::strings have been removed from my latest creations; I keep using them only inside my first PB to C++ translations like ObjReader.

Since then, I have been using my own WCHAR subset for the purpose of reducing the size of my binary dependencies.

But i am not obtuse, and i can live with a little OOP.

However i will always bypass the GDIPLUS class, and keep using the flat API that serves me well since 2002

Michael Lobko-Lobanovsky · « **Reply #7 on:** February 08, 2018, 06:43:58 am »

No FBO for post-processing nor new camera yet but
!!! VBO's are already there !!!

( [UPD] In continuous render mode with a default-size window, my GPU load is ca. 40% but CPU and data bus load is 0% overall. It's effectively nil, null, naught; an immeasurable infinitesimal! Rotation is perfectly smooth as if there were just a couple of polies to render... )

Patrice Terrier · « **Reply #8 on:** February 08, 2018, 09:38:39 am »

Well done my friend !

I thought you were too busy with other tasks, and that you had moved VBO/FBO onto your back burner.

When playing with large models, like "Event Horizon" or "AB", I sometimes see some of the meshes not being displayed.
I wonder if the changes done into calcMeshBounds() and isSphereInFustum() could explain this ?
AB sometimes loses her shoes or her bandana.

While running in full screen mode, I would like to add some more handy information to the SL display, like the CPU and GPU load in %.
Do you know if there is some easy way to retrieve these values without hogging the CPU?

Added
Would this nVIDIA GPU % code work for you?

Code: [Select]

//
// Getting Nvidia GPU Usage
//
// Reference: Open Hardware Monitor (http://code.google.com/p/open-hardware-monitor)
//

#include <windows.h>
#include <iostream>

// magic numbers, do not change them
#define NVAPI_MAX_PHYSICAL_GPUS   64
#define NVAPI_MAX_USAGES_PER_GPU  34

// function pointer types
typedef int *(*NvAPI_QueryInterface_t)(unsigned int offset);
typedef int (*NvAPI_Initialize_t)();
typedef int (*NvAPI_EnumPhysicalGPUs_t)(int **handles, int *count);
typedef int (*NvAPI_GPU_GetUsages_t)(int *handle, unsigned int *usages);

int main()
{
    // 64-bit build; a 32-bit build would load nvapi.dll instead
    HMODULE hmod = LoadLibraryW(L"nvapi64.dll");
    if (hmod == NULL)
    {
        std::cerr << "Couldn't find nvapi64.dll" << std::endl;
        return 1;
    }

    // nvapi.dll internal function pointers
    NvAPI_QueryInterface_t      NvAPI_QueryInterface     = NULL;
    NvAPI_Initialize_t          NvAPI_Initialize         = NULL;
    NvAPI_EnumPhysicalGPUs_t    NvAPI_EnumPhysicalGPUs   = NULL;
    NvAPI_GPU_GetUsages_t       NvAPI_GPU_GetUsages      = NULL;

    // nvapi_QueryInterface is a function used to retrieve other internal functions in nvapi.dll
    NvAPI_QueryInterface = (NvAPI_QueryInterface_t) GetProcAddress(hmod, "nvapi_QueryInterface");
    if (NvAPI_QueryInterface == NULL)
    {
        std::cerr << "Couldn't find nvapi_QueryInterface in nvapi.dll" << std::endl;
        return 2;
    }

    // some useful internal functions that aren't exported by nvapi.dll
    NvAPI_Initialize = (NvAPI_Initialize_t) (*NvAPI_QueryInterface)(0x0150E828);
    NvAPI_EnumPhysicalGPUs = (NvAPI_EnumPhysicalGPUs_t) (*NvAPI_QueryInterface)(0xE5AC921F);
    NvAPI_GPU_GetUsages = (NvAPI_GPU_GetUsages_t) (*NvAPI_QueryInterface)(0x189A1FDF);

    // make sure every internal function was found before calling any of them
    if (NvAPI_Initialize == NULL || NvAPI_EnumPhysicalGPUs == NULL ||
        NvAPI_GPU_GetUsages == NULL)
    {
        std::cerr << "Couldn't get functions in nvapi.dll" << std::endl;
        return 2;
    }

    // initialize NvAPI library, call it once before calling any other NvAPI functions
    (*NvAPI_Initialize)();

    int          gpuCount = 0;
    int         *gpuHandles[NVAPI_MAX_PHYSICAL_GPUS] = { NULL };
    unsigned int gpuUsages[NVAPI_MAX_USAGES_PER_GPU] = { 0 };

    // gpuUsages[0] must be this value, otherwise NvAPI_GPU_GetUsages won't work
    gpuUsages[0] = (NVAPI_MAX_USAGES_PER_GPU * 4) | 0x10000;

    (*NvAPI_EnumPhysicalGPUs)(gpuHandles, &gpuCount);
    if (gpuCount < 1)
    {
        std::cerr << "No nVIDIA GPU found" << std::endl;
        return 3;
    }

    // print the usage of the first GPU four times a second, 400 times in all
    for (int i = 0; i < 400; i++)
    {
        (*NvAPI_GPU_GetUsages)(gpuHandles[0], gpuUsages);
        int usage = gpuUsages[3];
        std::cout << "GPU Usage: " << usage << std::endl;
        Sleep(250);
    }

    return 0;
}

Michael Lobko-Lobanovsky · « **Reply #9 on:** February 08, 2018, 02:42:05 pm »

Quote

Well done my friend !

Thank you Patrice!

Quote

I thought you were too busy on other tasks ...

I am from time to time, but it doesn't mean I'm out or I ain't interested any more.

Quote

... i sometimes see some of the meshes not being displayed ...

Not any more my friend, not any more! Probably you wouldn't believe me until seen with your own eyes but now flipping/dragging "Event Horizon" or "AB" vigorously across the screen is as easy as a piece of cake, as if they were my Q3Torch or your first OpenGL colored triangle application.

Quote

... calcMeshBounds() and isSphereInFustum() could explain this ?

Mesh visibility is indeed controlled by isSphereInFustum() but the artifacts are exclusively due to data bus stalls. The CPU is still waiting to calc it at a new mouse pointer position while OpenGL has already drawn the scene. OpenGL swaps its buffers in a parallel thread out of sync with the CPU regardless of glFlush/glFinish. My AB would usually lose her sunglasses and a coupla Cokes but never her Mars pack.

But not any more!

Quote

i would like to add some more handy informations into the SL display

Sounds like a great plan!

Quote

... easy way to retrieve these values without hogging the CPU ?

Just use your "nVIDIA GPU %" and a standard CPU % process counter lookup routine. They aren't going to hog anything if taken once a second. You would even have a chance to draw your own history graphs without any visible impact on the IPS/FPS rate. I have also moved all function calls completely out of our render procs except the visibility check and pure OpenGL proper, so that the procs are just as fast as they can possibly be in C code without inline asm. 0% CPU on Onyx, Patrice, and I really mean it!

Quote

Does that nVIDIA GPU % would work by you?

Of course it does! Tell you more: I also own a full AMD/ATi Radeon based box running under x64 Windows 7 that's able to cope with ObjReader as well. So if you could find a similar routine for use with a basic ATi GPU driver, we would go real cool with ObjReader's GPU auto detection. (Don't forget to reserve some space for small and neat nVidia/ATi icons.)

Now, can you wait a couple of days more until I get FBO up and running too before I send you my mods? The new camera will come last.

Michael Lobko-Lobanovsky · « **Reply #10 on:** February 08, 2018, 05:43:09 pm »

Brilliant!

Re: GetSystemTimes()

That's cool but more suitable for a profiler. But there are also performance counters accessible through the Windows registry IIRC that would yield per-core and total CPU load readings in real time as they are depicted in the Windows Task Manager. Would be cool to have those too to match the GPU readings.

Any luck with Radeons yet?

Patrice Terrier · « **Reply #11 on:** February 08, 2018, 07:31:02 pm »

I had fun, thank you very much for this preview.

Here is what i get on my computer using "Event Horizon"
IPS 33 (perfect)
CPU 0% (perfect)
GPU ranging between 27 and 52%, and most of the time at 32%

Mesh visibility problem is still there, see the attached screen shot.
I have disabled isSphereInFrustum in my current WIP, i shall see if that makes any difference once i am done with my current work on CPU/GPU display (visible only in full screen mode from the SL display).

I think that this new version will be a killer's one

Quote

Any luck with Radeons yet?

So far i haven't found anything, but i did just a quick search...

Michael Lobko-Lobanovsky · « **Reply #12 on:** February 08, 2018, 08:18:15 pm »

Quote

GPU ranging between 27 and 52%, and most of the time at 32%

In fact, that's very good. All activity now takes place on the GPU as planned. There must be something working at full thrust after all; there's so much to be done in each frame render especially on such a huge model as that.

Quote

Mesh visibility problem is still there ...

It's related to the inexactness of bounding sphere radius projection calc, very similar to my PITA with the initial versions of AA billboards. The longer one of the axes is relative to the two others, the larger the error. I'm hoping to see it cured in the upcoming camera though. That's one of the main reasons I've been advocating it in the first place.
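For comparison, the textbook form of the test is sketched below in C++. It is not ObjReader's calcMeshBounds() or isSphereInFustum(); the types and names are assumptions. The plane normals are taken to be unit length and to point into the frustum, and the sphere radius is taken from the farthest vertex so that an elongated mesh is never under-covered.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

struct Vec3   { float x, y, z; };
struct Plane  { Vec3 n; float d; };      // points p with n.p + d >= 0 are inside
struct Sphere { Vec3 center; float radius; };

// Centre of the axis-aligned bounding box, radius reaching the farthest vertex.
Sphere boundingSphere(const Vec3* verts, std::size_t count)
{
    Sphere s = { { 0.0f, 0.0f, 0.0f }, 0.0f };
    if (count == 0) return s;

    Vec3 lo = verts[0], hi = verts[0];
    for (std::size_t i = 1; i < count; ++i) {
        lo.x = std::min(lo.x, verts[i].x); hi.x = std::max(hi.x, verts[i].x);
        lo.y = std::min(lo.y, verts[i].y); hi.y = std::max(hi.y, verts[i].y);
        lo.z = std::min(lo.z, verts[i].z); hi.z = std::max(hi.z, verts[i].z);
    }
    s.center = { (lo.x + hi.x) * 0.5f, (lo.y + hi.y) * 0.5f, (lo.z + hi.z) * 0.5f };

    float maxSq = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        const float dx = verts[i].x - s.center.x;
        const float dy = verts[i].y - s.center.y;
        const float dz = verts[i].z - s.center.z;
        maxSq = std::max(maxSq, dx * dx + dy * dy + dz * dz);
    }
    s.radius = std::sqrt(maxSq);
    return s;
}

// Conservative test: returns false only when the sphere lies wholly behind
// at least one of the six frustum planes.
bool sphereInFrustum(const Sphere& s, const Plane planes[6])
{
    for (int i = 0; i < 6; ++i) {
        const float dist = planes[i].n.x * s.center.x
                         + planes[i].n.y * s.center.y
                         + planes[i].n.z * s.center.z
                         + planes[i].d;
        if (dist < -s.radius) return false;
    }
    return true;
}
```

The test errs on the side of drawing: a long, thin mesh gets a large sphere and is rarely culled, but it can never vanish while any part of it is on screen.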

Quote

... i shall see if that makes any difference ...

Oh yes, it does, and especially in a multi-mesh model with complicated geometry. You don't need counters to prove it. Just listen to your GPU fans and watch the jerkiness of your mouse movements.

Quote

... once i am done with ... CPU/GPU display ...

That won't be too soon. I have my own idea of usefulness and content of such a display.

And I'll look into the ATi Radeon capability myself.

Quote

I think that this new version will be a killer's one

I hope so too. I'd love to see it doing all that it presumably can.

Michael Lobko-Lobanovsky · « **Reply #13 on:** February 12, 2018, 12:25:20 pm »

Below please find my precompiled WIP ObjReaderVBO.exe. Its distinguishing features are:

- isSphereInFrustum() is currently disabled so as not to irritate you until it's perfect. But be forewarned that eventually it will be there or I'll quit this project!
- gl_DrawScene() is now substantially faster still because its entire glBlaBla() intro has been precompiled into a display list.
- Wherever applicable, all turtle-slow sqrtf(), 1/sqrtf(), fabs(), fmod() occurrences have been replaced with their much faster integer math approximations. Tell you what, my friend: in fact, OpenGL needs only as much as half-float precision. All extras are but unpardonable squander.
- Your IPS has been persistently off by 2 frames per second at low FPS counts. I've fixed that glitch, though I'd still rather see it rewritten to comply with industry tradition.

Enjoy!

Patrice Terrier · « **Reply #14 on:** February 12, 2018, 03:12:12 pm »

The new VBO wip seems to work great !!!

Using Event Horizon
IPS 32, GPU average 30% , CPU average 3%

all meshes preserved

See the attached video.

You did tremendous work, my friend...
