Unsolved Optimization of mod, Beam T3D vs Unity and others, differences? (has lots of info)

Discussion in 'Mod Support' started by fufsgfen, Sep 5, 2018.

  1. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
    Terrible name for the topic, but I could not come up with a better one. While writing this, I'm also starting to realize this is a huge topic, but discussion helps us gain and share knowledge.

    I guess this applies mostly to maps. I'm not sure whether vehicles differ much, but at least batching might not be of much use for them.


    So there are some universal methods and techniques for creating content that performs better, but there are also some that are unique to our engine.

    Unity has this kind of page: https://docs.unity3d.com/Manual/OptimizingGraphicsPerformance.html
    Another very interesting page there is this one:
    https://docs.unity3d.com/Manual/DrawCallBatching.html

    We have this, and not much else on the subject, unless I have missed something:
    https://wiki.beamng.com/Creating_static_objects

    Based on my experiments and discussions, I think Beam T3D does batching for forest items.

    For static objects I'm not so sure: using the same object with the same texture setup, I get a huge difference in CPU utilization with 3000 or so objects. TSStatics are much more of a CPU hog than the same objects placed as forest items.

    So that is one obvious difference: static objects are not batched, at least not as much as forest items, while Unity does not have forest items in the first place.

    There are obviously more differences in batching, and there is not much information around, but I guess the general principles from the Unity documentation for making objects 'batchable' apply?

    However, are there some caveats? Something that works for them but would harm performance for us?

    Batching is really important to understand well, but so far I only know that forest items of identical type and texturing get batched, which can save a huge amount of CPU time. How much of this applies to cars or static objects? Are any of those batched?
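    To make the forest item vs. TSStatic difference concrete, here is a minimal Python sketch of how draw calls could be counted under two assumptions: without batching, every placed object costs one draw call; with batching, identical items sharing mesh and material collapse into one instanced draw call. The `Instance` type and the mesh/material names are hypothetical, and this models the idea only, not Beam T3D's actual renderer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One placed object; a single-material mesh is assumed for simplicity."""
    mesh: str      # hypothetical mesh name
    material: str  # hypothetical material name


def draw_calls_unbatched(instances: list[Instance]) -> int:
    """No batching: every placed object is submitted on its own."""
    return len(instances)


def draw_calls_batched(instances: list[Instance]) -> int:
    """Batching (assumed): one draw call per distinct (mesh, material) pair."""
    return len({(inst.mesh, inst.material) for inst in instances})


if __name__ == "__main__":
    scene = [Instance("pine", "pine_mat")] * 2000 + [Instance("rock", "rock_mat")] * 1000
    print(draw_calls_unbatched(scene))  # 3000
    print(draw_calls_batched(scene))    # 2
```

    A mesh with several materials would cost one draw call per material in either case, which is why material count matters as much as object count.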

    Things that likely don't apply:
    -Lightmaps
    -Anything about mobile
    -Lighting?
    -Shaders???

    Another place where I read something is here:
    http://wiki.polycount.com/wiki/Framerate_Optimization_Tips

    Now I'm not sure whether the fillrate part applies to us at all. I'm also unsure about alpha; for one, our engine uses alpha for reflections.
    Decal roads also use transparency; it can show up on the GPU graph when using a really large number of decal roads, but I haven't seen many problems from that.

    More polys can indeed be faster, but in some cases slower. What I'm not certain about is how batching works across LODs: if an object has 3 LODs and 33% of the objects are on each LOD, is that 3 batches or 1?
    So, just as too many different low-poly objects are bad for the CPU budget, is the same true of the lowest LODs?
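    One plausible answer, assuming (not confirmed for Beam T3D) that a batch can only contain draws with identical mesh data and material, is that each LOD level is a separate mesh, so 33% of the objects on each of 3 LODs would give 3 batches, not 1. A sketch of that grouping; `pick_lod`, `batches_by_lod` and the switch distances are hypothetical:

```python
from collections import Counter


def pick_lod(distance: float, switch_distances: list[float]) -> int:
    """Return the LOD index for a camera distance (0 = most detailed).

    switch_distances are hypothetical, ascending distances at which the
    object drops to the next LOD; past the last one, the lowest LOD is used.
    """
    for lod, limit in enumerate(switch_distances):
        if distance < limit:
            return lod
    return len(switch_distances)


def batches_by_lod(objects, switch_distances):
    """objects: iterable of (mesh, material, distance).

    Assumes only draws with identical mesh data *and* material can share a
    batch, so each LOD level of the same object forms its own group.
    Returns a Counter mapping (mesh, lod, material) -> instance count.
    """
    return Counter(
        (mesh, pick_lod(dist, switch_distances), material)
        for mesh, material, dist in objects
    )


if __name__ == "__main__":
    trees = [("pine", "pine_mat", d) for d in (10, 20, 30, 60, 70, 80, 200, 300, 400)]
    groups = batches_by_lod(trees, [50, 150])
    print(len(groups))  # 3: one batch per LOD level in use
```

    Under this model, lowest LODs are cheap on the CPU only when many objects share them; many different low-poly meshes still cost one draw call each.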


    Then there are pages like these, which seem to make sense, but I'm not sure how well it all applies to our Beam T3D:
    http://wiki.polycount.com/wiki/Texturing
    http://wiki.polycount.com/wiki/Texture_atlas

    After reading all that, texture combining and texture atlases sound reasonable. But shaders were also mentioned in several places; I believe those correspond to the specular, normal etc. maps in materials.cs. I'm not sure what their effect on performance is or what can really be done about them. I would think they are pretty much set by the engine, or are there some tricks to keep their impact low?
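    Combining textures into an atlas means moving each mesh's UVs into the sub-rectangle its texture now occupies. A minimal Python sketch, assuming a simple grid atlas; `AtlasRect`, `grid_rect` and `remap_uv` are hypothetical helpers, not part of any BeamNG tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtlasRect:
    """Where one original texture sits inside the atlas, in normalized 0..1 units."""
    u0: float
    v0: float
    width: float
    height: float


def grid_rect(index: int, columns: int, rows: int) -> AtlasRect:
    """Rectangle of cell `index` in a simple columns x rows grid atlas."""
    col, row = index % columns, index // columns
    if row >= rows:
        raise IndexError("atlas grid is full")
    return AtlasRect(col / columns, row / rows, 1.0 / columns, 1.0 / rows)


def remap_uv(uv: tuple[float, float], rect: AtlasRect) -> tuple[float, float]:
    """Move a UV from the original texture's 0..1 space into its atlas rectangle.

    UVs outside 0..1 (tiling textures) would bleed into neighbouring atlas
    entries, so they are rejected here instead of silently wrapped.
    """
    u, v = uv
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError(f"UV {uv} is outside 0..1; tiling textures cannot be atlased this way")
    return (rect.u0 + u * rect.width, rect.v0 + v * rect.height)


if __name__ == "__main__":
    rect = grid_rect(1, columns=2, rows=2)  # top-right quarter of the atlas
    print(remap_uv((0.5, 0.5), rect))       # (0.75, 0.25)
```

    Real atlases usually also need some padding between entries, since lower mip levels blend neighbouring texels across rectangle borders.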


    But then there is more. One example of DX11 material I stumbled upon was this article on tiled resources:
    https://blogs.windows.com/windowsex...rces-enables-optimized-pc-gaming-experiences/

    I have seen the terrain itself being split into sort of tiles, but that might be different. Is there a way to use textures the way they do in the link above?


    Then there are occlusion volumes, which work only on TSStatic objects and terrain and can be used to block rendering. I have not seen much about these on the web, and I'm not sure whether they help only on the GPU side or also on the CPU side.


    Balancing is then how one chooses to trade things off: I find it is possible to use more GPU or more CPU, depending a bit on the choices one makes, but all these techniques only help to run the design.

    Design defines a lot: the design of what the player usually can see, using clever terrain shapes and choice of objects to limit the amount of work the hardware has to do while still bringing immersion.

    Hmm, lots of stuff and hard to organize it better, but maybe more knowledge can be gained to understand how to get the best out of the engine, so share your thoughts :)
     
  2. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
    Ancient stuff that is still news for some (like me); part 2 especially might be useful:
    http://www.ericchadwick.com/examples/provost/byf1.html
    http://www.ericchadwick.com/examples/provost/byf2.html

    I stumbled upon these while reading this topic, which has good information even though it is more than 10 years old now:
    https://polycount.com/discussion/50...-optimisation-do-polygon-counts-really-matter

    Consider that 10 years ago, 200 polys was the minimum worth sending to the GPU; how much will it be today, 2000?

    This is all very useful, because planning what shares textures is more important than ever: the game engine can only put vertices that share a texture into the same batch.

    So one can think about the views the player is most likely to see and put much of what appears in the same view into the same texture atlases, which can give a much bigger reduction than, for example, putting textures of objects that are far apart and never on screen together into one atlas.
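    The atlas-planning idea can be illustrated with a toy model: for each likely view, count how many distinct textures (atlases or standalone) are visible, assuming roughly one draw call per distinct texture set. The view names, materials and atlas assignments below are made up, and real draw-call counts also depend on meshes and shaders.

```python
def draw_calls_per_view(views: dict[str, set[str]], atlas_of: dict[str, str]) -> dict[str, int]:
    """Count draw calls per view under a simplified model.

    views: view name -> materials visible from that spot.
    atlas_of: material -> atlas it was packed into (unpacked materials keep their own texture).
    Simplification: one draw call per distinct texture set visible in a view.
    """
    return {
        name: len({atlas_of.get(mat, mat) for mat in mats})
        for name, mats in views.items()
    }


views = {
    "town": {"brick", "roof", "window"},
    "forest": {"bark", "leaves", "rock"},
}
# Packed by what is seen together.
by_view = {"brick": "atlas_town", "roof": "atlas_town", "window": "atlas_town",
           "bark": "atlas_forest", "leaves": "atlas_forest", "rock": "atlas_forest"}
# Packed without regard to what is on screen together.
mixed = {"brick": "atlas_1", "bark": "atlas_1", "roof": "atlas_2", "leaves": "atlas_2"}

print(draw_calls_per_view(views, by_view))  # {'town': 1, 'forest': 1}
print(draw_calls_per_view(views, mixed))    # {'town': 3, 'forest': 3}
```

    The mixed packing also keeps both atlases resident in both views even though half of each is unused there, which costs VRAM as well as draw calls.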

    It takes a lot of planning to get the best out of the engine.
     
  3. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
    So I drew this, which I hope is more or less correct. The general concept of thinking in objects is wrong in it, though, as the GPU actually only sees surfaces that share the same texture. Please point out any errors; I'm just learning this stuff, so there may be imperfections.

    In version 13.0.5 of BeamNG, staying below 3000 buckets (draw calls) is something one should aim for, as that is about how many a modern fast Intel CPU can carry over to the GPU; staying at around 2000 allows many more cars to run at the same time and makes much better use of a wider range of multicore CPUs.
    upload_2018-9-8_10-45-42.png

    I read that normal maps etc. add up, so I came up with this:
    upload_2018-9-8_10-49-42.png

    Then this, I think, is especially where I should use surfaces instead of objects? But anyway, it kind of shows how big the impact can be.
    upload_2018-9-8_10-50-20.png

    Then polygons: that is how full those buckets are when the CPU carries them over to the GPU. In 2007, something like 300 polygons per bucket was the limit below which there were no gains at all. Today it is much higher, but I haven't been able to find any numbers; however, my earlier post had links where people discussed, long ago, how optimizing LODs can give smaller gains than one expects. The reason for no gains is that each bucket will be quite empty while the number of buckets that can be carried stays the same, so the more you can fill the buckets, the better the GPU can be kept at full use.
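    The bucket picture can be turned into a quick sanity check: given a frame's draw-call and triangle counts (for example read off a frame capture), compare them against the rough numbers in this thread. The thresholds below are the thread's ballpark figures, not measured engine limits, and `frame_report` is a hypothetical helper.

```python
from typing import NamedTuple

# Rough figures quoted in this thread, not measured engine limits:
CALL_CEILING = 3000      # said to be about what a fast Intel CPU handles in the 0.13-era build
CALL_COMFORTABLE = 2000  # leaves headroom for more cars and slower CPUs
MIN_FILL = 300           # ~2007-era triangles-per-draw-call figure; likely higher today


class FrameReport(NamedTuple):
    status: str
    triangles_per_call: float
    underfilled: bool


def frame_report(draw_calls: int, triangles: int) -> FrameReport:
    """Classify a frame's draw-call count and how 'full' each draw call is."""
    per_call = triangles / draw_calls if draw_calls else 0.0
    if draw_calls > CALL_CEILING:
        status = "over budget"
    elif draw_calls > CALL_COMFORTABLE:
        status = "tight"
    else:
        status = "comfortable"
    return FrameReport(status, per_call, draw_calls > 0 and per_call < MIN_FILL)


if __name__ == "__main__":
    print(frame_report(2500, 500_000))  # many nearly empty buckets
```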

    However, batching is what Beam does, at least with forest items, and it can be used to get fuller buckets with lower LODs. There are still many questions about those, which seem difficult enough that not many people know the answers.

    Anyway, 5,000,000 or so polygons in what is visible in game should be easy for a 1050 Ti, and with some 10,000,000 polys, a 1070 running SSAO and the highest graphics settings should be able to hold 60 fps at 1080p fairly comfortably, afaik.

    For VRAM usage, try to stay below 4 GB if aiming for 5,000,000 max polys or fewer; if going higher on polys, the target GPU will be a 1060 or faster and the VRAM limit goes up.
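    To check a map against a VRAM target like the 4 GB above, texture memory can be estimated from texture size and compression format. A rough Python sketch: the format table uses the standard block-compression sizes (BC1 = DXT1, BC3 = DXT5), while the example map is hypothetical. It ignores render targets, meshes and engine overhead, so real usage will be higher.

```python
# Bytes per pixel for common texture formats (BC1 = DXT1, BC3 = DXT5).
BYTES_PER_PIXEL = {"RGBA8": 4.0, "BC1": 0.5, "BC3": 1.0, "BC7": 1.0}
GIB = 1024 ** 3


def texture_bytes(width: int, height: int, fmt: str, mipmaps: bool = True) -> float:
    """Approximate GPU memory for one texture.

    A full mip chain adds about one third on top of the base level
    (1 + 1/4 + 1/16 + ... -> 4/3); padding of tiny mips to 4x4 blocks is ignored.
    """
    base = width * height * BYTES_PER_PIXEL[fmt]
    return base * 4 / 3 if mipmaps else base


def textures_gib(textures) -> float:
    """textures: iterable of (width, height, fmt) tuples; returns total in GiB."""
    return sum(texture_bytes(w, h, f) for w, h, f in textures) / GIB


if __name__ == "__main__":
    # Hypothetical map: 150 textures of 2048x2048 in BC3 (e.g. colour + normal maps).
    print(f"{textures_gib([(2048, 2048, 'BC3')] * 150):.2f} GiB")  # 0.78 GiB
```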

    Then graphics settings, and how to get the best scaling with them, is a subject where I'm drawing a blank. Does the lower texture setting need anything from the map maker to work best?
     
  4. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
    New stuff again, this time a tool to analyze your map etc. Here is a nice article on how it was used in one game:
    https://software.intel.com/en-us/ar...t-cause-3-on-systems-with-intel-iris-graphics

    You can download the tool from here:
    https://software.intel.com/en-us/gpa

    It can actually be used to examine a lot of games: see how they draw graphics, analyze bottlenecks, see draw calls in any DirectX game, etc. It is a very useful tool, which takes a bit of time to learn (I still don't know much, just learning slowly).

    Also beware of the tracing function: it needs a lot of disk space, and 20 GB disappears in a moment.

    The best gains from the tool come if you study the earlier links on how batching works etc. It takes time to take all that in, but it definitely improves understanding of how to build things and why they are built the way they are.
     
  5. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
    Here is a new bit, which could of course be found in the earlier links, but this is quite an important one: a discussion about how UV maps and texture atlases benefit performance:
    https://polycount.com/discussion/174622/uv-mapping-and-performance

    Today 4 GB is starting to be the minimum GPUs have, with 6 GB being the most common if I remember the Steam survey correctly, so there is more room for large textures now than before.

    Surely there are still 3 GB 1060s and older 2 GB cards around, but when designing a map, I think a 1060 6 GB, or at minimum a 1050 Ti 4 GB, would be the target for High graphics, maybe even higher, as lowering details helps the map run on weaker cards, and lower detail can be made to look better too.


    I have been trying to figure out some polycount targets based on BeamNG 0.13, and the two cards I have tested have this kind of range:
    For GTX 1080, the highest polygon count can be 20-30 million (with SSAO)
    For GTX 1050 Ti, 6-9.5 million (without SSAO)

    It depends a lot on how much post-processing etc. is going on; for example, with simple untextured polygons and hardly any post-processing, cards can process a lot more than the figures listed above.

    To be safe, use the lower end of the range.

    Of course these might change as the game progresses, but this gives some idea of how much you can expect different GPUs to handle at 1080p resolution.

    WCUSA is around 10 million polygons, a bit higher if I remember correctly; the Automation test track was around 8 million.
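    The ranges above can be written down as a small lookup to check a map's visible triangle count against them. A sketch only: the numbers are the ones reported in this post for BeamNG 0.13 at 1080p, not official requirements, and `fits` is a hypothetical helper.

```python
# Triangle ranges (in millions) reported in this thread for BeamNG 0.13 at 1080p.
TESTED_RANGES_M = {
    "GTX 1080 (SSAO on)": (20.0, 30.0),
    "GTX 1050 Ti (SSAO off)": (6.0, 9.5),
}


def fits(card: str, visible_triangles: int, safe: bool = True) -> bool:
    """True if the visible triangle count is within the reported range for `card`.

    safe=True compares against the low end of the range, as suggested above.
    """
    low, high = TESTED_RANGES_M[card]
    limit_m = low if safe else high
    return visible_triangles <= limit_m * 1_000_000


if __name__ == "__main__":
    wcusa_like = 10_000_000  # "around 10 million" per the post above
    for card in TESTED_RANGES_M:
        print(card, fits(card, wcusa_like))
```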

    I often see people being a bit afraid of too high a polygon count, but that is not really much of an issue. Also remember that what the GPU works with comes from the UV map, not just the mesh itself: vertices on UV seams are split into duplicates, so your UV maps can affect a lot how well the map runs, and large islands/shells are better than lots of tiny ones, even for draw calls.
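    The UV point can be made concrete. A GPU vertex is one unique combination of attributes, so a position that has two different UVs (it lies on a UV seam) is stored twice. The sketch below models faces as lists of `(position_index, uv_index)` corners; the data layout is an assumption for illustration, and only UVs are modelled (hard-edge normals split vertices the same way).

```python
def gpu_vertex_count(faces) -> int:
    """faces: list of faces, each a list of (position_index, uv_index) corners.

    The GPU stores one vertex per unique attribute combination, so a position
    used with two different UVs (i.e. lying on a UV seam) becomes two vertices.
    """
    return len({corner for face in faces for corner in face})


def triangle_count(faces) -> int:
    """Triangles after fan-triangulating each n-gon (a quad gives 2)."""
    return sum(len(face) - 2 for face in faces)


# Two quads sharing the edge between positions 1 and 2, one UV island.
one_island = [[(0, 0), (1, 1), (2, 2), (3, 3)], [(1, 1), (4, 4), (5, 5), (2, 2)]]
# Same geometry, but the shared edge is a UV seam (positions 1 and 2 get new UVs).
seam = [[(0, 0), (1, 1), (2, 2), (3, 3)], [(1, 6), (4, 4), (5, 5), (2, 7)]]

print(triangle_count(one_island), gpu_vertex_count(one_island))  # 4 6
print(triangle_count(seam), gpu_vertex_count(seam))              # 4 8
```

    The triangle count is unchanged by the seam; it is the vertex count that grows, and many tiny UV islands mean many seams.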

    Hope this helps someone some day.
    --- Post updated ---
    Oh, and this is largely still unsolved; there is a ton of stuff I don't know yet. Maybe it should be a wiki page or something; feel free to create something out of this mess if you want :p
     
  6. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
    Something again: triangle counts in various games, which I stumbled upon:
    https://polycount.com/discussion/126662/triangle-counts-for-assets-from-various-videogames

    Take note of the year each game was made; the increase in polycounts has been quite big. Cars in Forza are quite high poly and using such models in BeamNG can be tricky, but polycount is nothing to be afraid of.


    Update:
    The R9 290 series can do over 4 billion triangles per second, though that is probably pure triangles without texturing etc.:
    "Raja Koduri, AMD’s corporate vice president of visual computing, said that there are three technology pillars for the R9 290 series; the GCN (Graphics Core Next) series, UltraHD (4K resolutions), and audio. The R9 290 series is the first to power DirectX 11.2 games, and offers a whopping 5 GFLOPS of compute power, Koduri said. AMD also designed for lower-power efficiency: this architecture is capable of scaling from 1-watt devices to 1-kilowatt workstations, he said.

    The compute capabilities is complemented by over 300Gbytes/s of memory bandwidth, about 20 percent more than 2002’s Radeon 9700. The memory is necessary to render high-resolution content: 100 layers of complex rendering at 4K resolutions, he said. The chip renders over 4 billion triangles per second. “We have done this because we see the trend of incredibly high-resolution games coming our way this holiday season and next year.”
    "
    https://www.pcworld.com/article/2049397/amd-unveils-hawaii-generation-of-gpus.html
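    Turning the quoted 4 billion triangles per second into a per-frame figure is a simple division. This is a theoretical peak setup rate; real scenes with texturing, shaders and post-processing reach only a fraction of it.

```python
def triangles_per_frame(triangles_per_second: float, fps: float) -> float:
    """Theoretical peak triangles per frame from a vendor's triangles/s figure."""
    return triangles_per_second / fps


for fps in (30, 60, 144):
    peak = triangles_per_frame(4_000_000_000, fps)
    print(f"{fps} fps: ~{peak / 1e6:.1f} million triangles per frame (theoretical peak)")
```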

    CPUs tend to limit games more than GPUs.

    But remember that polygons alone are not enough for good looks: skimping on them gains nothing, and wasting them does nothing either. Normal maps etc. have to be good too, and those can easily eat more performance than polygons.
     
    #6 fufsgfen, Nov 5, 2018
    Last edited: Nov 5, 2018
  7. LuisAntonRebollo
    Developer
    BeamNG Team

    Joined:
    Feb 25, 2014
    Messages:
    117
    Thx for your interest and contributions, wow, this is a lot of research work :)

    I'm currently working on Italy "performance issues" and some of your comments are very familiar to me :D
    At the moment I'm focused on reducing the total number of draw calls and making them cheaper to execute on the CPU. Italy is heavily CPU-bound in our initial tests.

    Render performance is a complex topic, and it is a good idea to have proper documentation on how to create content that looks amazing and is fast to render.
    We will write new documentation with all the information we learned from the Italy project; I hope it will be a lot of help for our community.

    I will post more information later at some point.
     
  8. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
    Thank you for your reply :)

    It is an interesting area of computer science, as there seems to be little formal education on it, and every game engine seems to be quite different, though with some similarities.

    As I'm obsessed with performance, I have been curious to find out why some mod maps perform worse than expected. It is a quest where some answers have been found, but it seems there are always more stones to turn.

    Going from an i3 to an 8086K via an i7, and from an iGPU to a GTX 1080 via a GTX 1050 Ti, has also revealed a lot.

    Keeping a modern GPU fed is something every game maker has to wrestle with a bit, it seems; GPU performance has increased insanely over the years, and graphics artists have had to learn new tricks.

    I'm looking forward to your guide; it will surely be very good for BeamNG, as is anything that helps more people enjoy the game. It will certainly be better than my chaotic output, which can be rather hard to follow; 'export data >> World' is not exactly the best, yet it is all I can do :p
     
  9. bob.blunderton

    Joined:
    Apr 3, 2015
    Messages:
    3,289
    Yes, this, totally this; I would very much appreciate info on the topic.
    Right now, trying to design a city map without documentation on what batches, what doesn't, what kills performance and what does not, is about as much fun as sticking your [CENSORED] into a wasp's nest that you've already smacked around a few times. If my comment is offensive, please remove it (or I will if someone complains), but you get my point; it's the closest analogy I could think of.

    Warm Regards
    --The Bob.

    Come to think of it, that might cure my back pain...
     
  10. LuisAntonRebollo
    Developer
    BeamNG Team

    Joined:
    Feb 25, 2014
    Messages:
    117
    For now, I can say the new optimizations should make any map render faster if it is CPU-bound.
    TSStatics are a lot faster to render now; there is no longer any difference compared with ForestItems.
    Avoid multilayer materials as much as possible; they are very expensive to render.
    Try to reduce the number of materials used for one model when possible, and LODs should reduce both the triangle and material count.
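    The last point can be turned into a simple check to run over a model's LOD chain before export. A hypothetical sketch: the `Lod` record and the rules (strictly fewer triangles at each step, never more materials) are one reading of the advice above, not an official BeamNG validator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lod:
    triangles: int
    materials: int


def check_lod_chain(lods: list[Lod]) -> list[str]:
    """Warn where a lower-detail LOD fails to cut cost versus the level before it.

    lods[0] is the most detailed level. Triangles must strictly drop at each
    step; materials may stay equal (a 1-material model can't go lower) but
    must never increase.
    """
    warnings = []
    for i in range(1, len(lods)):
        prev, cur = lods[i - 1], lods[i]
        if cur.triangles >= prev.triangles:
            warnings.append(f"LOD{i}: triangles not reduced ({cur.triangles} >= {prev.triangles})")
        if cur.materials > prev.materials:
            warnings.append(f"LOD{i}: materials increased ({cur.materials} > {prev.materials})")
    return warnings


if __name__ == "__main__":
    house = [Lod(12000, 4), Lod(4000, 2), Lod(800, 2), Lod(800, 3)]
    for warning in check_lod_chain(house):
        print(warning)
```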

    Right now I'm working on making the new code stable; sorry, no time for documentation :(
     
  11. bob.blunderton

    Joined:
    Apr 3, 2015
    Messages:
    3,289
    Gracias Luis
    Thanks for the effort/feedback & 411. I am currently (and have been for months) working on reducing disconnected UVs, draw calls, and the number of materials per object, especially on LODs.
    I hope one day for multi-threaded draw calls and possibly cross-object batching. This could be defined as groups within the forest brush, or within 'zones' in the F1 object editor, where any items in the same forest brush 'group' would have their meshes combined before runtime. One would have to be careful, though, as rendering a large map full of these objects could be painful for the game engine without using said zones; just food for thought. Bethesda's Gamebryo (Fallout series, Oblivion/Elder Scrolls series) uses a method in the map editor to combine, before runtime, objects that use the same textures, so that all such objects within the same cell or quad (a quarter of a cell, not too different from the 256x256 squares the terrain subdivides into from 4k x 4k, or however big they are) are merged. This way the game renders all similar rubble piles, house walls, etc. in one go instead of one by one. (When scrap mods are used in Fallout 4, they disable the "use pre-combined objects" setting in the ini files, and then the game lags and may even run out of available draw calls.) If you would like to speak with me about this, let me know. Use cases vary by map, but all in all, it would make the game engine much more viable in the eyes of other game development studios. Think of all that sweet cash from a place like Rockstar or EA buying a license to your game engine. I have some experience with other games, and there aren't many games I have modded where I haven't pushed the engine to its limit, even if just to see what happens.
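    The pre-combined objects idea can be sketched as an offline step: bucket objects by grid cell and material, then concatenate their vertex and index buffers so each bucket becomes one mesh (one draw call). This is a hypothetical illustration of the general technique, not how Gamebryo or BeamNG implement it; the dict layout is an assumption, and only translation (no rotation or scale) is applied to keep it short.

```python
from collections import defaultdict


def combine_by_cell(objects, cell_size: float):
    """Merge objects that share a material and a grid cell into one mesh each.

    objects: iterable of dicts with keys
        'material' -> material name,
        'position' -> (x, y) world offset of the object,
        'vertices' -> list of (x, y, z) in object space,
        'indices'  -> triangle index list into 'vertices'.
    Returns {(cell_x, cell_y, material): (vertices, indices)}; one entry = one draw call.
    """
    merged = defaultdict(lambda: ([], []))
    for obj in objects:
        ox, oy = obj["position"]
        key = (int(ox // cell_size), int(oy // cell_size), obj["material"])
        verts, idx = merged[key]
        base = len(verts)  # new indices must point past vertices already merged
        verts.extend((x + ox, y + oy, z) for x, y, z in obj["vertices"])
        idx.extend(i + base for i in obj["indices"])
    return dict(merged)


if __name__ == "__main__":
    tri = {"vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)], "indices": [0, 1, 2]}
    walls = [
        {**tri, "material": "brick", "position": (10, 10)},
        {**tri, "material": "brick", "position": (20, 20)},
        {**tri, "material": "brick", "position": (300, 10)},
    ]
    print(len(combine_by_cell(walls, cell_size=256)))  # 2 draw calls instead of 3
```

    The cell size is the trade-off mentioned above: bigger cells mean fewer draw calls, but each merged mesh is culled as a whole, so more off-screen geometry gets drawn.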

    When you have a free moment, Luis, I will invite you to a private conversation I have going with a few of the staff of my upcoming city map, 'Los Injurus'. You can use it and Roane County to test anything to do with draw calls in the game engine if you're working on rendering code at all (or give it to whoever is). No reply is needed, but thanks for your efforts; it does make a difference!

    --Cheers!
     
  12. fufsgfen

    Joined:
    Jan 10, 2017
    Messages:
    6,782
  13. RobertGracie

    Joined:
    Oct 15, 2013
    Messages:
    3,779
    It's good to see someone providing more information. If I didn't know better, I would say Fufsgfen is an honorary "staff" member, since he's sort of all over the place providing help.
     
  14. bob.blunderton

    Joined:
    Apr 3, 2015
    Messages:
    3,289
    Without Fuffy my city map would run 2x worse than dirt. Multi-layered, non-batching types of dirt. The only thing worse than a dirt-covered car is a car covered in laggy dirt.
     