(Times and speeds quoted are typical but do not refer to any specific hardware; they merely illustrate the principles involved.)
Now we introduce a 'high speed' memory, with a cycle time of, say, 250 nanoseconds, between the CPU and the core memory. When we request the first instruction, at location 100, the cache memory requests addresses 100, 101, 102 and 103 from the core memory all at the same time, and retains them 'in cache'. Instruction 100 is passed to the CPU for processing, and the next request, for 101, is filled from the cache. Similarly, 102 and 103 are handled at the much faster cycle time of 250 ns. In the meantime the cache memory has requested the next 4 addresses, 104 to 107. This continues until the predicted 'next location' is incorrect. The process is then repeated to reload the cache with data for the new address range. A correctly predicted address, where the requested location is already in cache, is known as a cache 'hit'.
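The behaviour described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the 250 ns cache cycle time and the block of 4 addresses come from the text, while the 1000 ns core cycle time (CORE_NS), the function name run, and the idea that a prefetch completes in the background at no cost to the CPU are assumptions made for the example.

```python
CACHE_NS = 250   # cache cycle time quoted in the text
CORE_NS = 1000   # ASSUMED core memory cycle time, for illustration only
BLOCK = 4        # addresses fetched together, as in the text


def run(addresses, prefetch=True):
    """Return (hits, misses, total_ns) for a sequence of address requests."""
    cached = set()  # addresses currently held in cache (no eviction modelled)
    hits = misses = total_ns = 0
    for addr in addresses:
        if addr in cached:
            # Cache hit: served at cache speed.
            hits += 1
            total_ns += CACHE_NS
        else:
            # Wrong prediction: pay one core cycle and reload the cache
            # with the block starting at the requested address.
            misses += 1
            total_ns += CORE_NS
            cached = set(range(addr, addr + BLOCK))
        if prefetch and addr == max(cached):
            # The last cached address has been consumed. ASSUMPTION: the
            # next block was fetched in the background and is ready now.
            cached |= set(range(addr + 1, addr + 1 + BLOCK))
    return hits, misses, total_ns


if __name__ == "__main__":
    # Sequential run 100..107, then a jump to 200 breaks the prediction.
    trace = list(range(100, 108)) + [200, 201]
    print(run(trace))
    print(run(trace, prefetch=False))
```

For this trace the sketch reports 8 hits and 2 misses, 4000 ns in total, against 10 x 1000 = 10000 ns if every request went to the assumed core memory. With prefetch=False the request for 104 also misses, giving 7 hits, 3 misses and 4750 ns.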
If the main memory is not core, but a slower chip memory, the gains are not as great, but there is still an improvement. Expensive high speed memory is only required for a fraction of the capacity of the cheaper main memory. Also, programmers can design programs to suit the cache operation, for instance by arranging a branch instruction in a loop so that execution continues with the next instruction in every case except the final test (perhaps count=0), when the branch is taken.
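How the gain depends on the gap between cache speed and main memory speed can be shown with a simple average. The sketch below assumes a deliberately simplified model in which a hit costs one cache cycle and a miss costs one main memory cycle; the function name effective_access_ns, the 75% hit rate, and the 1000 ns and 500 ns main memory times are all illustrative assumptions, not figures for any real hardware.

```python
def effective_access_ns(hit_rate, cache_ns, main_ns):
    """Average time per request under a simple hit-or-miss model (assumed)."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * main_ns


if __name__ == "__main__":
    cache_ns = 250   # cache cycle time quoted in the text
    hit_rate = 0.75  # ASSUMED: 3 requests in every 4 are cache hits
    for main_ns in (1000, 500):  # ASSUMED main memory cycle times
        avg = effective_access_ns(hit_rate, cache_ns, main_ns)
        # Speed-up relative to sending every request to main memory.
        print(main_ns, avg, main_ns / avg)
```

Under these assumptions the average falls from 1000 ns to 437.5 ns in the first case (about 2.3 times faster) and from 500 ns to 312.5 ns in the second (1.6 times faster): the closer main memory is to cache speed, the smaller the gain.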
Now consider the speed gains to be made with disks. Because a disk is a mechanical device, it works in milliseconds, so loading a program or data from disk is extremely slow in comparison: even core memory is 1000 times faster! Seek time and latency must also be considered. (This is covered in another article on disks.)
You may have heard the term DMA in relation to PCs. This stands for Direct Memory Access, which means that data can be transferred between the disk and memory directly, without passing through the CPU. In a mainframe computer, the I/O (Input/Output) processor typically has direct access to memory, using data placed there by the processor. This path is also boosted by using cache memory.
In the PC, the CPU chip now has built-in cache. Level 1, or L1, cache is the primary cache in the CPU and is made of SRAM, or Static RAM. This is higher speed (and more expensive) memory than DRAM, or Dynamic RAM, which is used for system memory. L2 cache, also SRAM, may be incorporated in the CPU or located externally on the motherboard. It has a larger capacity than L1 cache.
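The relationship between a small L1 cache, a larger L2 cache and system memory can be sketched as a lookup that tries each level in turn. This is a toy model only: the class Level, the function where_served, the capacities of 2 and 8 entries, and the least-recently-used replacement rule are all assumptions made for the example; the text does not say how a real CPU chooses what to discard.

```python
from collections import OrderedDict


class Level:
    """One cache level holding at most `capacity` addresses (LRU is ASSUMED)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        return False

    def fill(self, addr):
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # discard least recently used


def where_served(addresses, l1_size=2, l2_size=8):
    """Return, for each request, which level supplied it."""
    l1, l2 = Level(l1_size), Level(l2_size)
    served = []
    for addr in addresses:
        if l1.lookup(addr):
            served.append("L1")
        elif l2.lookup(addr):
            served.append("L2")
            l1.fill(addr)  # copy into the faster level for next time
        else:
            served.append("DRAM")
            l2.fill(addr)
            l1.fill(addr)
    return served


if __name__ == "__main__":
    print(where_served([1, 2, 3, 1, 3]))
```

With the assumed sizes, the requests 1, 2, 3, 1, 3 are served from DRAM, DRAM, DRAM, L2 and L1 respectively: address 1 has been pushed out of the tiny L1 by the time it is requested again, but the larger L2 still holds it.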