INTRODUCTION
David R. Kaeli, Northeastern University, and Pen C. Yew, University of Minnesota

INSTRUCTION CACHE PREFETCHING
Glenn Reinman, UCLA Computer Science Department
Direct Mapped Cache
Set Associative Cache
Pseudo Associative Cache
Way Prediction Cache
Next Line Prefetching
Target Prefetching
Stream Buffers
Nonblocking Instruction Caches and Out-Of-Order Fetch
Fetch Directed Instruction Prefetching
Integrated Prefetching
Wrong-Path Prefetching
Compiler Strategies

BRANCH PREDICTION
Philip G. Emma, IBM T.J. Watson Research Laboratory
The von Neumann Programming Model vs. ENIAC
Dataflow and Control Flow
The Branch Instruction
The IAS Machine: A Primitive Stored-Program Architecture
Virtuality
Branch Instruction Semantics
General Instruction-Set Architectures and Extensions
Memory Consistency and Observable Order
Branches and Performance
Pipelining
Pipeline Disruptions and Their Penalties
Superscalar Processing
Multithreading
Instruction Prefetching and Autonomy
The Delayed Branch Instruction
Branch Flow in a Pipeline: The "When" of Branch Prediction
Predicting Branches at Decode Time
Predicting Branches at Instruction-Prefetch Time
Static Branch Prediction
Dynamic Branch Prediction
Branch Prediction With Counters
Predicting by Profiling Branch Actions
Group Behaviors vs. Predicting Individual Branches
The Decode History Table (a.k.a. Branch History Table)
Discriminators
Using Multiple Discriminators: A Path-Based Approach
Implementation
A Timing Caveat
Hybrid Predictors
Instruction Prefetching
The Branch History Table (a.k.a. Branch Target Buffer)
Operation of the BTB
Fetch Width and Branch Mispredictions
The Subroutine Call and Return Structure
Predicting Return Addresses by Using a Stack
Recognizing Subroutine Calls and Returns
Taking Advantage of the BTB Structure
Eliminating the Stack
Working Sets and Contexts
The Size of a BTB Entry
The BTB and the Instruction Cache: Economies of Size
More Exotic Prediction for the More Difficult Branches
Branches and the Operand Space
Branches and the Operand-Address Space
Tandem Branch Prediction
Accuracy and the Updating of Tables
Predictor Bandwidth and Anomalous Behaviors
The Importance of Fast Prediction Mechanisms
Superscalar Processing and the Monolithic Prediction of Branch Sequences
Predicting Branches in a Multithreaded Environment
Limitations
Simplicity
Complexity
Two Saving Graces
Implementing Real Branch Prediction Mechanisms

TRACE CACHES
Eric Rotenberg, North Carolina State University
Traces
Core Fetch Unit Based on Instruction Cache
Trace Cache Operation
Path Associativity
Indexing Strategy
Partial Matching
Coupling Branch Prediction with the Trace Cache
Trace Selection Policy
Multi-Phase Trace Construction
Managing Overlap between Instruction Cache and Trace Cache
Speculative vs. Non-Speculative Trace Cache Updates
Powerful vs. Weak Core Fetch Unit
Parallel vs. Serial Instruction Cache Access
L1 vs. L2 Instruction Cache
Loop Caches

BRANCH PREDICATION
David August, Princeton University
Overcoming Branch Problems with Predication
If-Conversion
Predicate Optimization and Analysis
The Predicated Intermediate Representation
Hewlett-Packard Laboratories PD
Cydrome Cydra 5
ARM
Texas Instruments C6X
Systems with Limited Predicated Execution Support
Predication in the Itanium 2 Processor

MULTIPATH EXECUTION
Augustus K. Uht, University of Rhode Island
Branch Tree Geometry
Branch Path/Instruction ID
Phases of Operation
Granularity
With Predication
With Data Speculation
Compiler-Assisted
Hardware: Classically-Based
Hardware: Non Classically-Based
Multiprocessors
Functional or Logic Language Machines
Branch Prediction
Confidence Estimation
Pipeline Depth
Implications of Amdahl's Law - ILP Version
Memory Bandwidth Requirements

DATA CACHE PREFETCHING
Yan Solihin, North Carolina State University, and Donald Yeung, University of Maryland at College Park
Architectural Support
Array Prefetching
Pointer Prefetching
Relationship with Data Locality Optimizations
Stride and Sequential Prefetching
Correlation Prefetching
Content-Based Prefetching

ADDRESS PREDICTION
Avi Mendelson, Intel Mobile Micro-Processor Architect
Terminology and Definitions
Non-Speculative Address Calculation Techniques
Speculative Address Calculation Techniques
Chapter Focus
Characterization of Address Predictability
Address Predictability vs. Value Predictability
Combining Address Prediction with Prefetching Mechanism
Basic Characterization
Load Promotion
Memory Bypassing
Compiler Based Speculative Load Promotion

DATA SPECULATION
Yiannakis Sazeides, University of Cyprus; Pedro Marcuello, Intel-UPC Barcelona Research Center; James E. Smith, University of Wisconsin-Madison; and Antonio González, Universitat Politècnica de Catalunya
Basic Value Predictors
Value Predictor Alternatives
Confidence Estimation
Implementation Issues
Data Dependence Predictors
Verification
Recovery
Other Microarchitectural Implications of Data Value Speculation
Related Work: Data Value Speculation
Related Work: Data Dependence Speculation

INSTRUCTION PRECOMPUTATION: DYNAMICALLY REMOVING REDUNDANT COMPUTATIONS USING PROFILING
Joshua J. Yi, Freescale Semiconductor Inc.; Resit Sendag, University of Rhode Island; and David J. Lilja, University of Minnesota at Twin Cities
A Comparison of Instruction Precomputation and Value Reuse
Upper-Bound - Profile A, Run A
Different Input Sets - Profile B, Run A
Combination of Input Sets - Profile AB, Run A
Frequency versus Frequency and Latency Product
Performance of Instruction Precomputation versus Value Reuse

PROFILE-BASED SPECULATION
Youfeng Wu and Jesse Fang, Intel Microprocessor Technology Labs
Control Flow Profile
Memory Profile
Value Profile
Static Analysis
Instrumentation
Hardware Performance Monitoring
Special Hardware
Software-Hardware Collaborative Profiling
Compile-time Profiling
Runtime Profiling
Continuous Profiling
Trace Scheduling
Hot-Cold Optimizations
Code Layout
Data Layout
Stride Prefetching
Hot Data Stream Prefetching
Mississippi Delta Prefetching
Java Runtime Parallelizing Machine
Speculative Parallel Threading
Speculative Computation Reuse
Software-Based Speculative Precomputation
Stability across Multiple Workloads
Update When Program Changes
Maintenance during Optimizations
Perturbation by Profiling Code

COMPILATION AND SPECULATION
Jin Lin, Wei-Chung Hsu, and Pen-Chung Yew, University of Minnesota, Minneapolis
Alias Profiling
Data Dependence Profiling
Speculative Alias and Dataflow Analyses
A Framework for Speculative Alias Analysis and Dataflow Analysis
Overview
Φ-Insertion Step
Rename Step
Downsafety Step
CodeMotion Step
Recovery Code Generation for General Speculative Optimizations
Check Instructions and Recovery Code Representation for Multi-Level Speculation
Interaction of the Early Introduced Recovery Code with Later Optimizations

MULTITHREADING AND SPECULATION
Pedro Marcuello, Jesus Sanchez, and Antonio González, Intel-UPC Barcelona Research Center; Intel Labs; Universitat Politècnica de Catalunya; Barcelona (Spain)
Building Helper Threads
Microarchitectural Support for Helper Threads
Thread Spawning Schemes
Microarchitectural Support for Speculative Architectural Threads
References

EXPLOITING LOAD/STORE PARALLELISM VIA MEMORY DEPENDENCE PREDICTION
Andreas Moshovos, University of Toronto
Static Methods
Hybrid Static/Dynamic Methods
Dynamic Methods
Working Example
Multiple Dependences Per Static Load or Store
Methodology
Performance Potential of Load/Store Parallelism
Performance with Naive Memory Dependence Speculation
Using Address-Based Scheduling to Extract Load/Store Parallelism
Speculation/Synchronization

RESOURCE FLOW MICROARCHITECTURES
David A. Morano and David R. Kaeli, Northeastern University
The Operand as a First Class Entity
Dynamic Dependency Ordering
Handling Multipath Execution
Names and Renaming
The Active Station Idea
Register and Memory Operand Storage
Operand Forwarding and Snooping
Result Forwarding Buses and Operand Filtering
A Small Resource-Flow Microarchitecture
A Distributed Scalable Resource-Flow Microarchitecture