22
May

How A CPU Works (Hardware + Software Parallelism)



Hi, thanks for tuning into Singularity Prosperity. This video is the third in a multi-part series discussing computing. In this video we'll be discussing classical computing, more specifically how the CPU operates and CPU parallelism.

In the previous video in this series we discussed the shrinking of the transistor, allowing for more powerful and efficient computers, as well as the end of Moore's law based on the miniaturization of the transistor within the next seven to ten years. Be sure to check it out for some background context for this video. Now, in that video, when referring to computing performance, we were focused on classical computing based on the CPU.

Classical computing is essentially the digital computer; almost every computing device on the market today is a classical computer. Classical computers operate in serial. In other words, as mentioned in the first video in the series, Computing Origins, they execute various instructions extremely fast, in order, but to the average user appear to be running them in parallel, meaning multiple instructions at a time. This is due to many hardware and software optimizations that allow for asynchronous operation. By the end of this video the distinction between parallel and asynchronous operation will become clear, but first let's see how a classical computer works.

As a disclaimer, the concepts we will be discussing are an overgeneralization of computer architecture, but for the sake of getting an abstracted understanding of classical computing functionality they will serve well.

All right, so first let's bring in a central processing unit. The CPU is the brains of the computer. Let's also bring in memory, the RAM. This is where the CPU accesses stored information it needs. Now, the CPU also has built-in memory; this is called a cache. The cache is considerably smaller than the RAM, with sizes ranging in the order of 32 kilobytes to 8 megabytes. The purpose of the cache is to give the CPU the information it needs immediately. The CPU and RAM are separate
objects, so when the CPU needs information it takes time, albeit a very small amount of time, to read the data from the RAM. This time adds considerable delay to computer operation. With the cache being right on the CPU, this time is reduced to almost nothing. The reason why you don't need much cache storage is because it just needs to store little bits of important information that the CPU will need to use soon or has been using a lot recently. There are various methods implemented to determine what goes onto the cache, what should be kept on the cache, and when it should be written back to the RAM. In a typical CPU there are various levels of cache, each with different read and write times and sizes. For the sake of simplicity, we'll assume a single cache for our CPU.

So now, with the basic components out of the way, let's get into how the computer operates. When a CPU executes an instruction, there are five basic steps that need to be completed. Fetch: get the instruction from memory, in some cases storing it in the cache. Decode: get the appropriate variables needed for the execution of the instruction. Execute: compute the result of the instruction. Memory: for instructions that require a memory read or write operation to be done. Write back: write the results of the instruction back into memory.

Nearly every instruction goes through the first three steps and the final step. Only certain instructions go through the memory step, such as loads and stores, but for the sake of simplicity we'll assume every instruction requires all five steps.

Now, each step takes one clock cycle. This translates to a CPI, clock cycles per instruction, of five. As a note, most modern processors can execute billions of clock cycles per second; for example, a 3.4 gigahertz processor can execute 3.4 billion clock cycles per second. Now, a CPI of 5 is very inefficient, meaning the resources of the CPU are wasted. This is why pipelining was introduced, bringing asynchronous operation into computing. Pipelining essentially makes it so each
step can be executed in a different clock cycle, translating to 5 instructions per 5 clock cycles, or in other words one instruction per clock cycle, a CPI of 1. Essentially, what pipelining does is take the segmented steps of an instruction and execute them in each clock cycle. Since the segmented steps are smaller in size and less complex than a normal instruction, you can do the steps of other instructions in the same clock cycle. For example, if a step for one instruction is fetching the data, you could begin decoding another, executing another, etc., since the hardware involved in those steps isn't being blocked.

Superscalar pipelines add to this performance further. Think of pipelines as a highway. Now, a typical lane in the highway can execute one instruction per clock cycle. With superscalar processors you add more lanes to the highway. For example, a 2-wide superscalar, also referred to as a dual-issue machine, has a theoretical CPI of one half: two instructions per clock cycle.

There are various other methods implemented to make the processor's CPI more efficient, such as unrolling loops, very long instruction words (VLIWs), which are essentially multiple instructions wrapped into one larger instruction, compiler scheduling and optimization, allowing for out-of-order execution, and more. There are also many issues that come along with pipelining that increase CPI, such as data hazards, memory hazards, structural hazards and more. All these topics are beyond the scope of this video, but are mentioned to satisfy curiosity if you wish to know more about them.

So at this point we now know about the basic design of a CPU, how it communicates with memory, the stages it executes instructions in, as well as pipelining and superscalar design. Now, instead of imagining all of this as a single CPU, let's take it further. All this technology can be embedded on a single-core processor. With multiple cores, you take the performance of a single core and multiply it by the core count, for example, in a quad core, by four.
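
To put rough numbers on the CPI figures discussed so far, here is a minimal Python sketch. It is an idealized model, not a description of any real processor: it assumes five one-cycle stages, no hazards or stalls, and the function names (`cycles_unpipelined`, `cycles_pipelined`, `cpi`) are made up for illustration. Unlike the steady-state figure quoted above, it also counts the few cycles needed to fill the pipeline, which is why the pipelined CPI only approaches 1 (or one half for a 2-wide machine) as the instruction count grows.

```python
import math

# Assumption: five stages (fetch, decode, execute, memory, write back),
# each taking exactly one clock cycle, with no hazards or stalls.
STAGES = 5


def cycles_unpipelined(n_instructions, stages=STAGES):
    """Each instruction runs all its stages before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions, stages=STAGES, width=1):
    """Ideal pipeline: a new group of `width` instructions enters every cycle.

    The first group needs `stages` cycles to pass through; every later
    group finishes one cycle after the group before it.
    """
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / width)
    return stages + groups - 1


def cpi(cycles, n_instructions):
    """Clock cycles per instruction."""
    return cycles / n_instructions


if __name__ == "__main__":
    n = 1_000_000
    clock_hz = 3.4e9  # the 3.4 GHz example clock rate
    cases = [
        ("unpipelined", cycles_unpipelined(n)),
        ("pipelined", cycles_pipelined(n)),
        ("2-wide superscalar", cycles_pipelined(n, width=2)),
    ]
    for label, cycles in cases:
        seconds = cycles / clock_hz
        print(f"{label}: CPI = {cpi(cycles, n):.6f}, time = {seconds:.6f} s")
```

The pipeline fill cost (`stages - 1` extra cycles) is the only difference between this model and the "5 instructions per 5 clock cycles" simplification, and it becomes negligible for long instruction streams.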
Multiple cores also have a shared cache. As a side note, the use of superscalar pipelines as well as multiple cores is considered hardware-level parallelism.

The computer industry, after years of stagnation, is now beginning to divert more focus to hardware-level parallelism by adding more cores to processors. This can be demonstrated by consumer processors like AMD's Threadripper line and Intel's i9 processor line, with core counts ranging from 8 to 16 and 10 to 18 respectively. While these may be their higher-end consumer processors, even the low- and mid-end processors, from i3, i5 and i7, are getting buffs, with core counts ranging from quad to hexa and octa-core. As a side note, supercomputers are the best examples of utilizing hardware parallelism. For example, Intel's Xeon Phi and AMD's Epyc processors have core counts ranging from 24 to 72, with supercomputers having tens of thousands of processors within them.

Now, there's one key component that is required in tandem with hardware parallelism to truly use all the resources efficiently: software parallelism. This leads us to the final topic in classical computing we'll cover: hyper-threading, also referred to as multi-threading. Instead of being implemented as hardware parallelism, this is used as higher-level software parallelism. Think of a thread as a sequence of instructions. Now, with single threading, that sequence of instructions just flows through the pipeline as normal. However, with multi-threading you can segment your application into many threads and specifically choose how you want to execute them. Multi-threading can significantly increase computing performance by explicitly stating what CPU resources you want to utilize and when. For example, for an application, the user interface (GUI) can be executed on one thread while the logic is executed on another. This is just one example of many instances when multi-threading can be used.

Now, multi-threading can't just be used for every application. Since classical computing isn't intrinsically
parallel, there can be a lot of issues with concurrency, or when multiple threads are executing at the same time but depend on the results of each other. Thus some applications end up being only single threaded. However, many individuals and groups are working on ways to best utilize hardware parallelism through new software practices and rewriting old software. For example, the latest Firefox updates are now bringing in multi-threading. Also, some of the most computationally intensive tasks by default excel in multi-threading, such as video editing, rendering and data processing, to list a few. Also, as exemplified by the gaming industry, a lot of games are now moving into multi-threaded performance.

So, in summary, classical computing is asynchronous, not truly parallel. Instructions are still executed in serial, but through the use of hardware- and software-level parallelism we maximize the utilization of the resources of the computer, making them execute extremely fast and giving the illusion of parallel operation.

Also, if you want a deeper look into how the CPU works, I highly recommend you check out the two videos I've listed in the description below. At this point the video has come to a conclusion. I'd like to thank you for taking the time to watch it. If you enjoyed it, please leave a thumbs up, and if you want me to elaborate on any of the topics discussed or have any topic suggestions, please leave them in the comments below. Consider subscribing to my channel for more content, follow my Medium publication for accompanying blogs, and like my Facebook page for more bite-sized chunks of content. This has been Ankur, you've been watching Singularity Prosperity, and I'll see you again soon.
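
Going back to the concurrency issues mentioned in the multi-threading discussion, here is a minimal Python sketch of several threads updating one shared value. The function name `run_workers` and its parameters are hypothetical, chosen only for this illustration. It uses the standard `threading` module with a `Lock` so that the threads' updates cannot interleave. Note that in the standard CPython interpreter threads generally do not execute Python code on several cores at once, so this shows how threads are coordinated rather than a speedup.

```python
import threading


def run_workers(n_threads=4, increments=10_000):
    """Start `n_threads` threads that each add to a shared counter."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # The lock makes the read-modify-write of `total` atomic with
            # respect to the other threads; without it, updates from
            # different threads could interleave and be lost.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish before reading the result
    return total


if __name__ == "__main__":
    print(run_workers())
```

The lock is the design choice that matters here: it trades a little speed for a result that does not depend on how the threads happen to be scheduled.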

