## Post-K Computer Project Overview

contact: postk-info@riken.jp

RIKEN was selected to carry out development of the post-K computer, designed to be the successor of the K computer, under the Ministry of Education, Culture, Sports, Science, and Technology's FLAGSHIP 2020 Project. Development is under way, with the aim of launching the new computer around 2020.

## Outline of the development of the post-K computer

### Top priority on problem-solving research

During development, the highest priority will be given to creating a system capable of contributing to the solution of various scientific and societal issues. To this end, the hardware and software will be developed in a coordinated way (Co-design), with the aim of making the system usable in a variety of fields.

### World-leading performance

Create the most advanced general-purpose system in the world.

### Improve performance through international cooperation

While leveraging Japan's strengths, cooperate internationally to achieve world-leading technologies of the highest quality and set the international standard.

### Continue the legacy of the K computer

Make the fullest use of the technologies, human resources, and applications of the K computer project in developing the post-K system.

## Research Subjects of the post-K computer

### Priority Issues

Japanese research institutes and universities in charge of the priority issues started their work in 2015 and are actively involved in creating a bright new future.

### Exploratory challenges

In addition to the priority issues, four exploratory challenges to be tackled with the post-K computer have been selected. Their realization will be examined through a feasibility study.
- Frontiers of Basic Science: Challenging the Limits
- Construction of Models for Interaction Among Multiple Socioeconomic Phenomena
- Elucidation of the Birth of Exoplanets [Second Earth] and the Environmental Variations of Planets in the Solar System
- Elucidation of How Neural Networks Realize Thinking and Its Application to Artificial Intelligence

## Overview of Post-K Architecture

contact: postk-info@riken.jp

## System Architecture and System Software

### CPU node

- Many-core processor with interconnect interface integrated on chip
- A new instruction set architecture: ARMv8 with SVE (Scalable Vector Extension)
- Power Knob features for saving power consumption

### Interconnect

- A new version of TOFU (6D mesh/torus network)
- 3-level hierarchical storage system
  - Silicon Disk
  - Magnetic Disk
  - Storage for archive

Figure: Architecture Overview.

Figure: Overview of System Software.

## System Software

- Multi-Kernel: Linux with Light-weight Kernel
- File I/O middleware for 3-level hierarchical storage system and application
- Application-oriented file I/O middleware
- MPI+OpenMP programming environment
- Highly productive programming language and libraries

## ARM Scalable Vector Extension (SVE)

SVE is an extension developed specifically for vectorization of HPC scientific workloads (FP64/FP32/FP16).

### Features of SVE

- Scalable vector length (VL): Increased parallelism while allowing implementation choice of VL
- VL agnostic (VLA) programming: Supports a programming paradigm of write-once, run-anywhere scalable vector code
- Gather-load & Scatter-store: Enables vectorization of complex data structures with non-linear access patterns
- Per-lane predication: Enables vectorization of complex, nested control code containing side effects and avoidance of loop heads and tails (particularly for VLA)
- Predicate-driven loop control and management: Reduces vectorization overhead relative to scalar code
- Scalarized intra-vector sub-loops: Supports vectorization of loops containing complex loop-carried dependencies

(Info from ARM: https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture)

The ARMv8-A SVE Supplement is now publicly available. (ARM® Architecture Reference Manual Supplement - The Scalable Vector Extension (SVE), for ARMv8-A: https://developer.arm.com/products/architecture/a-profile/docs)

### Research for ARM SVE

- Early assessment of the ARM SVE specification
- GEM5 processor simulator for ARM SVE
- Development of GEM5 O3 Model for Post-K processor
- Evaluation and testing of compilers for ARM SVE (with Kyoto Univ.)
- ARM compiler (based on LLVM) (C and C++) and Fujitsu compiler (Fortran and C, C++)
- Compiler research on SIMD-vector code generation for ARM SVE

### Post-K: Fujitsu HPC CPU to Support ARM v8

- Post-K fully utilizes Fujitsu's proven supercomputer microarchitecture
- Fujitsu, as a "lead partner" of ARM HPC extension development, is working to realize an ARM Powered® supercomputer with high application performance
- ARM v8 brings out the real strength of Fujitsu's microarchitecture

Table: HPC application acceleration features compared across K computer, FX10, FX100, and Post-K. Features listed: FMA (floating multiply and add), mathematical acceleration primitives\*, inter-core barrier, sector cache, hardware prefetch assist, and Tofu interconnect. Entries for the newer systems carry "Enhanced" or "Integrated" marks.

\* Mathematical acceleration primitives include trigonometric functions (sine and cosine) and the exponential function.

Announcement from Fujitsu about the adoption of ARM v8 as post-K's ISA (at ISC 2016): https://www.fujitsu.com/global/Images/moving-forward-the-next-step-in-fujitsu-supercomputing.pdf

## System Software Development Team

Team Leader: Yutaka ISHIKAWA

contact: http://www.sys.aics.riken.jp/

The system software development team designs and develops system software for the post-K supercomputer, focusing broadly on three topics.

A lightweight multi-kernel based operating system (called IHK/McKernel) that combines Linux with a lightweight kernel (LWK) to achieve the following:

- Provide LWK scalability and full Linux API compatibility at the same time.
  Linux compatibility is retained by selectively offloading OS services from the LWK to Linux.
- Provide kernel-level specialization for both hardware-specific and application-specific needs

#### Communication:

- RIKEN MPI (MPICH-based MPI library)
  - Support latest and even in-draft MPI standards
  - Provide scalable performance
- Broader communication library
  - Connect compute nodes to off-site machines

Figure: IHK/McKernel structure, showing the application, proxy process, system daemon, delegator module, Linux kernel, McKernel, IHK-Master, IHK-Slave, system calls, interrupts, and the CPU and memory partitions. Annotations: jitter is contained in Linux and the LWK is isolated; only performance-sensitive system calls are implemented in McKernel, and the rest are delegated to Linux.

Interface for Heterogeneous Kernels (IHK) and McKernel provide a multi-kernel operating system (OS) that is tailored to high-end HPC but retains full Linux compatibility.

#### File I/O and Hierarchical Storage:

- Provide abstractions to efficiently deal with a multi-level storage hierarchy
- Absorb bursty I/O traffic and provide scalable caching

## Architecture Development Team

Team Leader: Mitsuhisa SATO

contact: msato@riken.jp

The architecture development team designs the architecture of the post-K supercomputer and the programming environment in cooperation with our partner company, Fujitsu.
### Co-design Tools

- CPU simulator for Post-K
  - based on gem5, capable of out-of-order execution
- MPI application replay tool
  - investigates parallel applications on a single node
- SCAMP (SCAlable Mpi Profiler)
  - simulates a large-scale network from a small number of profiling results

### XcalableMP

- PGAS programming language for cluster computing
- OpenMP-like directives for data parallelism
- Coarray syntax for one-sided communication
- Omni compiler: http://www.xcalablemp.org/
- Extension for multitasking using lightweight threads

### Compiler Development based on LLVM

- OpenMP extensions
- task allocation for NUMA optimization
- user interface for target-specific SIMD programming
- SIMD vectorization for ARM SVE

## Application Development Team

Team Leader: Hirofumi TOMITA

contact: htomita@riken.jp

### Co-design Based on Target Applications

The Application Development Team co-designs the applications and the Post-K system through optimization and refinement of the Target Applications. The team works in cooperation with the developers of the Target Applications at institutes and universities throughout Japan. The "Target Applications" are selected from the nine social and scientific priority issues of the Post-K project to represent the features of a wide variety of applications.

### Development of Mini-Applications

The team develops "Mini-Applications", each of which is simplified by restricting the calculation conditions but inherits the main features of the full application. These can be employed as benchmarks for critical evaluation of future HPC systems, including Post-K.

### Development of Application Infrastructures

The team develops general numerical libraries and domain-specific frameworks to maximize the performance of applications on the Post-K system.
| No. | Application | Feature |
|------|-------------|---------|
| I | GENESIS | Classical MD of biomolecules (particle simulation) |
| II | Genomon | Genome processing (genome alignment) |
| III | GAMERA | Earthquake simulator (FEM in structured & unstructured grid) |
| IV | NICAM+LETKF | Weather prediction system with Big Data (structured grid stencil & ensemble Kalman filter) |
| V | NTChem | Molecular electronic structure calculation (Post HF) |
| VI | ADVENTURE | General-purpose computational mechanics system (3D FEM in unstructured grid) |
| VII | RSDFT | Electronic structure calculation with Density Functional Theory |
| VIII | FFB | Large eddy simulation (unstructured grid) |
| IX | LQCD | Lattice QCD simulation (structured grid Monte Carlo) |

| | Mini-Application | Full Application |
|---|---|---|
| Size | Simple and compact, 1K ~ 10K lines | Large and complex, 10K+ ~ 100K+ lines |
| Contents | Communication, file I/O | Various forms of parallel execution, communication, I/O |
| Source | Open source | Source not always available |

https://fiber-miniapp.github.io/

### Project for Future HPC Application

The team leads promotional activities for future HPC applications by investigating the social and scientific challenges to be solved in the next 5-10 years, for future HPC projects.

## Co-Design Team

Team Leader: Junichiro MAKINO

contact: jmakino@riken.jp

### What's Co-Design?

- Application fields: Astronomy, Fluid dynamics, Material science, Chemistry...
- Goal: highly optimized applications on the new supercomputer with high efficiency/performance
- The Co-Design team develops frameworks for HPC users to implement advanced algorithms

### Formura

- Domain-specific language for optimized stencil computations
- Simple math notation for higher-order integration schemes
- Auto-tuning of generated C code with MPI
- Underground biology simulation runs at 1.184 Pflops on the K computer (Muranushi et al., SC16, 2016, Salt Lake City, USA.)
- Available at https://github.com/nushio3/formura

### FDPS

- Framework for Developing Particle Simulator (Iwasawa et al., PASJ, 68, 54, 2016.)
- Users program only the particle integration and the force kernel, without writing parallelization code
- Optimized parallel MPI/OpenMP code generation
- GPU support / Fortran interface
- Available at https://github.com/FDPS/FDPS

Figure: Simulation of large-scale cosmic structure formation using FDPS.

Figure: Simple 27-column Formura code (left) is compiled to optimized code for a large-scale simulation (right).
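
The division of labor described for FDPS (the user writes the force kernel and the integration, the framework handles everything else) can be illustrated with a minimal Python sketch. This is not the FDPS API: FDPS itself is a C++ framework, and every name below (`Particle`, `gravity_kernel`, `compute_all_forces`, `leapfrog_step`, `run`) is hypothetical. The "framework" part here is a serial all-pairs loop standing in for the domain decomposition and MPI/OpenMP parallelization that a real framework would provide.

```python
# Hypothetical sketch of the user/framework split described for FDPS.
# NOT the FDPS API; all names are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Particle:
    mass: float
    pos: Vec
    vel: Vec
    acc: Vec = (0.0, 0.0, 0.0)


# --- User side: force kernel -------------------------------------------
def gravity_kernel(pi: Particle, pj: Particle, eps2: float = 1e-6) -> Vec:
    """Softened gravitational acceleration on pi due to pj (G = 1)."""
    dx = tuple(b - a for a, b in zip(pi.pos, pj.pos))
    r2 = sum(d * d for d in dx) + eps2
    inv_r3 = r2 ** -1.5
    return tuple(pj.mass * d * inv_r3 for d in dx)


# --- Framework side (serial stand-in) -----------------------------------
def compute_all_forces(particles: List[Particle],
                       kernel: Callable[[Particle, Particle], Vec]) -> None:
    """Apply the user kernel to every pair and store the summed acceleration.

    A real framework would distribute particles across processes and
    parallelize this loop; here it is a plain O(N^2) double loop.
    """
    for pi in particles:
        ax = ay = az = 0.0
        for pj in particles:
            if pi is pj:
                continue
            fx, fy, fz = kernel(pi, pj)
            ax, ay, az = ax + fx, ay + fy, az + fz
        pi.acc = (ax, ay, az)


# --- User side: time integration ----------------------------------------
def leapfrog_step(particles: List[Particle],
                  kernel: Callable[[Particle, Particle], Vec],
                  dt: float) -> None:
    """One kick-drift-kick leapfrog step; assumes acc is already up to date."""
    for p in particles:
        p.vel = tuple(v + 0.5 * dt * a for v, a in zip(p.vel, p.acc))
        p.pos = tuple(x + dt * v for x, v in zip(p.pos, p.vel))
    compute_all_forces(particles, kernel)
    for p in particles:
        p.vel = tuple(v + 0.5 * dt * a for v, a in zip(p.vel, p.acc))


def run(particles: List[Particle],
        kernel: Callable[[Particle, Particle], Vec],
        dt: float, n_steps: int) -> None:
    """Compute initial forces, then advance n_steps leapfrog steps."""
    compute_all_forces(particles, kernel)
    for _ in range(n_steps):
        leapfrog_step(particles, kernel, dt)


if __name__ == "__main__":
    bodies = [
        Particle(1.0, (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
        Particle(1.0, (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
    ]
    run(bodies, gravity_kernel, dt=0.01, n_steps=10)
    for b in bodies:
        print(b.pos, b.vel)
```

In this sketch, swapping `gravity_kernel` for another pairwise interaction changes the physics without touching `compute_all_forces`, which is the part a framework would replace with a parallel implementation.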