Summer is here, and because of some unforeseen events I’m stuck inside while the sun is shining, working on an assignment that we had earlier this year. I’m not going to bring up the why here; it’s not relevant. What I am going to do is try to put up a blog post every time I have a revelation, as an attempt to reflect on what I’m doing, and the best way for me to do that is to explain something as if I’m explaining it to someone else. This is something I do for myself, so if someone is interested and wants to know more, let me know, but for the most part it is about explaining the basics and writing it down so that I too can get a better understanding of what’s going on. Most of the time while writing these posts I have a revelation and understand more of the parts.

I’m going to start with the absolute basics, and hopefully by the end of the summer I will have a complete explanation of how everything in DirectX works from A to Z. We’ll see though; this is not something I have planned out at all, but I needed to take a break so that I could start reflecting on it all.

The basics of 3D

Meshes, faces and vertices

Meshes are composed of faces, and faces in turn are composed of vertices. A vertex is a single point. So let’s take a look at the vertex. The minimum amount of data a vertex can have is an x and y position; in 3D space you need the z position as well. In DirectX 11 a face consists of 3 vertices, because that’s what you build a triangle from, and with triangles you can build any shape you want, even circles: the more triangles you use for a circle, the rounder it will look. Meshes in turn are just triangles that form an object. It could be part of an object or the whole object, but it is at least composed of triangles in one shape or another.
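As a rough sketch of what that looks like in code (assuming the DirectXMath types that ship with the Windows SDK for DirectX 11; the actual numbers are just made up for the example):

```cpp
#include <DirectXMath.h>

// A minimal vertex: just a position in 3D space.
struct Vertex
{
    DirectX::XMFLOAT3 position; // x, y, z
};

// One face = three vertices = one triangle.
Vertex triangle[3] =
{
    { DirectX::XMFLOAT3( 0.0f,  0.5f, 0.0f) }, // top
    { DirectX::XMFLOAT3( 0.5f, -0.5f, 0.0f) }, // bottom right
    { DirectX::XMFLOAT3(-0.5f, -0.5f, 0.0f) }  // bottom left
};
```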

Now of course you can define more data in your vertex; for instance you might want texture coordinates. Explaining texture coordinates isn’t that difficult. Let’s say you have a basic box which is 1 cm on each side, and you want a texture on all the sides of that box so it looks like a brick. You might then have several textures packed into one file (the reasoning behind this I will explain later), and you need to locate where on that texture the different parts of the brick are. So you define the x and y coordinates, as values from 0 to 1 (0% to 100%), that you want to fetch the texture from. In DirectX these coordinates go from left to right, where left is 0.0 and right is 1.0, and from top to bottom following the same principle: 0.0 is the top and 1.0 is the bottom.
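To carry texture coordinates along, you simply add them to the vertex. A small sketch, again with the DirectXMath types:

```cpp
#include <DirectXMath.h>

struct TexturedVertex
{
    DirectX::XMFLOAT3 position; // where the vertex sits in space
    DirectX::XMFLOAT2 texCoord; // u (0 = left, 1 = right), v (0 = top, 1 = bottom)
};
```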

So here’s the “problem”: let’s say the front texture goes from (x) 0.0, (y) 0.0 to (x) 0.5, (y) 0.5. That means it is a square, but from what I’ve told you above all the faces are made up of triangles. So what do we do? Let’s assume we are working on the front, like the image below:
(Image: box and brick)

Each corner on the right-hand side (the different colours) represents a texture. But we want to put it onto a triangle, so what we need to do is create a triangle out of the texture we want. In this case the texture coordinates (x, y) would be (0.0, 0.0), (0.5, 0.0) and (0.5, 0.5): that’s the top-left square of the texture, and the top-right triangle within that square. That’s for the one triangle; for the other it would be (0.5, 0.5), (0.0, 0.5) and (0.0, 0.0). That said, it all depends on which way you have defined your triangles as well; if you didn’t choose the top-left triangle first, the texture coordinates would look different.
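One way that could look in code, reusing the TexturedVertex struct from the sketch above for the front face of the box (the positions assume a unit box centred on the origin, and the winding order is just one possible choice, as noted above):

```cpp
// Front face of the box as two triangles; together they map the
// top-left quarter of the brick texture onto the face.
TexturedVertex frontFace[6] =
{
    // Triangle 1: top-left, top-right, bottom-right of the face
    { DirectX::XMFLOAT3(-0.5f,  0.5f, -0.5f), DirectX::XMFLOAT2(0.0f, 0.0f) },
    { DirectX::XMFLOAT3( 0.5f,  0.5f, -0.5f), DirectX::XMFLOAT2(0.5f, 0.0f) },
    { DirectX::XMFLOAT3( 0.5f, -0.5f, -0.5f), DirectX::XMFLOAT2(0.5f, 0.5f) },

    // Triangle 2: bottom-right, bottom-left, top-left of the face
    { DirectX::XMFLOAT3( 0.5f, -0.5f, -0.5f), DirectX::XMFLOAT2(0.5f, 0.5f) },
    { DirectX::XMFLOAT3(-0.5f, -0.5f, -0.5f), DirectX::XMFLOAT2(0.0f, 0.5f) },
    { DirectX::XMFLOAT3(-0.5f,  0.5f, -0.5f), DirectX::XMFLOAT2(0.0f, 0.0f) }
};
```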

Anyway, that is how you sample the textures you want to use and put them onto the 3D object. The same idea applies to normal mapping as well, as a general principle, but the usage is, as far as I know, a bit different in that you need to involve shaders. That, however, I will talk about later when I have looked at it and understood it better.

Coordinate systems – Left or Right handed

When I first heard the term left handed or right handed coordinate system, I saw the teacher put up her hand and do some kind of gesture to indicate what was what. At the time I didn’t see how she was holding her hand up, so when another teacher did the same I didn’t quite get the reference. Now I might be slow, but afterwards when I figured it out I just started laughing because it’s so simple and brilliant at the same time.

Basically, a normal 2D coordinate system has X and Y axes with the origin at (0, 0). In 3D you also get a Z axis, and you can choose to have it go inwards into the screen, so to speak, which is basically what a left-handed coordinate system is: the Z axis is negative for values that go towards the user and positive when it goes into the screen. (If you haven’t done so already, try making an L with your thumb and index finger, with the middle finger pointing outwards; do it with the left hand and then with the right hand, and you’ll see that the middle finger points towards the positive Z direction in each case, so to speak. Google it, you’ll get it 😉 )

Anyway, that’s the whole thing: positive Z in one direction or the other; other than that the coordinate system is the same.
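If it helps, the DirectXMath helpers come in pairs for exactly this reason, so in practice the choice mostly shows up as which function you call. A small sketch (the camera position here is just an example):

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Same camera description, two conventions: the only difference is
// which way positive Z points relative to the viewer.
XMVECTOR eye    = XMVectorSet(0.0f, 0.0f, -5.0f, 1.0f);
XMVECTOR target = XMVectorSet(0.0f, 0.0f,  0.0f, 1.0f);
XMVECTOR up     = XMVectorSet(0.0f, 1.0f,  0.0f, 0.0f);

XMMATRIX viewLH = XMMatrixLookAtLH(eye, target, up); // left-handed: +Z goes into the screen
XMMATRIX viewRH = XMMatrixLookAtRH(eye, target, up); // right-handed: +Z comes out towards you
```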

Cameras, view space, object space, a whole lot of transformations

Alright, once you know what a mesh is, there’s a lot of other stuff that is vital to know, because unlike 2D you can’t just give things absolute positions on the screen. The first thing that caused me a headache was that I didn’t see the difference between 2D and 3D: surely all objects need a position in the world, and surely you can specify that position as easily as you do in 2D. While that is true to a degree, it’s very much different.

The first thing that was problematic for me to understand was object space. Object space is where you create the different meshes that you want to use; at this point they are only created, they don’t really have a position in the 3D world. The object, let’s take the box as an example, is just drawn up; its location is not defined in object space, because the GPU handles the calculations for its scale and its appearance. It could be something fancier than a box, but object space is what defines the look of the object you want to put out into the 3D world.
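A way to picture it: the box’s corners are all given relative to the box’s own origin, and nothing in that data says where in the world the box will end up. A tiny sketch:

```cpp
#include <DirectXMath.h>

// Object space: every corner of the box is defined relative to the
// box's own origin (its centre, here). There is no world position yet.
DirectX::XMFLOAT3 boxCorners[8] =
{
    { -0.5f, -0.5f, -0.5f }, { 0.5f, -0.5f, -0.5f },
    { -0.5f,  0.5f, -0.5f }, { 0.5f,  0.5f, -0.5f },
    { -0.5f, -0.5f,  0.5f }, { 0.5f, -0.5f,  0.5f },
    { -0.5f,  0.5f,  0.5f }, { 0.5f,  0.5f,  0.5f }
};
```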

World space is another one of those terms I was unsure about, but basically world space is what describes the relationship between the different objects (or meshes if you like). For instance, if the bricks we create are supposed to form a wall, the different bricks would have a relationship to each other: one sits on top of another.
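In code that relationship is really just each brick getting its own position in the world. A sketch with DirectXMath (the brick height of 1.0 is made up for the example):

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Same brick mesh, three instances stacked in world space.
// Each matrix only differs by how far up the brick is moved.
XMMATRIX brick0 = XMMatrixTranslation(0.0f, 0.0f, 0.0f); // bottom brick
XMMATRIX brick1 = XMMatrixTranslation(0.0f, 1.0f, 0.0f); // one brick-height up
XMMATRIX brick2 = XMMatrixTranslation(0.0f, 2.0f, 0.0f); // two brick-heights up
```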

View space is the relationship between the world and the user, or rather the camera, so if the view space moves, the camera appears to move as well. Of course that’s not entirely true; what happens is that the camera’s view transforms all the other objects. The camera is basically just standing still while the rest of the world moves around it: you take the camera and apply movement based on the camera’s view onto all the other objects in the world. So in a way you can say that the world is moving while the camera is standing still.

Then there is the world transformation matrix. Once you have an object in object space, you transform it into the game world by using the world transformation matrix; basically, this transformation is what sets the object’s location in the world. In other words, this is the transformation matrix you use to set the object’s initial location.
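A typical world matrix is built by combining a scale, a rotation and a translation. With DirectXMath’s row-vector convention the usual order is scale first, then rotate, then translate; the values below are just an example:

```cpp
#include <DirectXMath.h>
using namespace DirectX;

XMMATRIX scale       = XMMatrixScaling(1.0f, 1.0f, 1.0f);
XMMATRIX rotation    = XMMatrixRotationY(XM_PIDIV4);          // 45 degrees around Y
XMMATRIX translation = XMMatrixTranslation(2.0f, 0.0f, 5.0f); // where the object sits

// Scale first, then rotate, then move the object into place in the world.
XMMATRIX world = scale * rotation * translation;
```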

Then you have the view transformation matrix. This matrix is what makes the objects move around; it is the camera’s view, so to speak. Normally when thinking of a camera you would expect the camera to move around while the objects stay in their positions, but that is not really the case. What you normally do is set the camera to a fixed position and then apply the inverse of the camera’s transformation to everything else. In the case of moving forward, if you move the camera along the positive Z axis, you apply the inverse values to the objects in the world, making them move instead of the camera.
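A sketch of that “inverse of the camera” idea, assuming the camera position used here is just an example (in practice you would often let a helper like XMMatrixLookAtLH build the view matrix for you, as in the earlier snippet):

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// The camera is placed in the world like any other object...
XMMATRIX cameraWorld = XMMatrixTranslation(0.0f, 2.0f, -10.0f);

// ...but what actually gets applied to everything else is the inverse:
// the camera stays put while the rest of the world is moved around it.
XMMATRIX view = XMMatrixInverse(nullptr, cameraWorld);
```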

Lastly, as far as I know, you have the projection transformation matrix. This is where you set the aspect ratio, that much I know; you also set how far you can see and the projection from a certain point of view. If you imagine a pyramid and chop the top off, that is the projection. I’m not entirely sure, but logically it should be that objects that are closer have a certain height and width, and the further back they are the more they get stretched according to the aspect ratio. This, however, I’m not 100% sure of, and this is currently where I’m at.
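For what it’s worth, that chopped-off pyramid is usually called the view frustum, and the field of view, aspect ratio and near/far distances are exactly what go into the projection matrix. A sketch with DirectXMath, assuming a 1280x720 window (all the values are just examples):

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Field of view, aspect ratio, and the near/far planes that "chop"
// the pyramid: nothing closer than 0.1 or further than 100 units
// ends up on screen.
XMMATRIX projection = XMMatrixPerspectiveFovLH(
    XM_PIDIV4,        // vertical field of view (45 degrees)
    1280.0f / 720.0f, // aspect ratio of the window
    0.1f,             // near plane
    100.0f);          // far plane
```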

I’m going to end this blog post here. If someone reads it and I’ve made a wrong assumption or stated some facts wrong, please point them out so I can correct them. Next week I hope to begin with the essence of what DirectX is. By then I should hopefully have fully understood the rendering pipeline: what the different stages are, and how the relationship between shaders, matrices and the final object output works.

If there are any questions, ask, although I’m not 100% sure I can answer all of them.