Xeon Phi is a special cluster in the experimental CERIT-SC environment, built on the new Intel Xeon Phi 7210 processors.
Xeon Phi is a massively parallel architecture consisting of a large number of x86 cores (Many Integrated Core architecture). Unlike the previous generation, the new Xeon Phi (based on the Knights Landing architecture) is a self-booting system (no conventional host CPU is needed) that is fully compatible with the x86 architecture. You can therefore submit jobs to Xeon Phi nodes in the same way as to CPU-based nodes, using the same applications. No recompilation or algorithm redesign is needed, although either may be beneficial.
- cluster phi[1-6].cerit-sc.cz, 6 nodes (384 CPUs), each:
- CPU: 64-core Intel Xeon Phi 7210, 1.30GHz (256 HT cores)
- RAM: 192 GB (phi1-phi4) or 384 GB (phi5-phi6), plus 16 GB of high-bandwidth memory (HBM)
- disk: 1x 800 GB SSD (scratch_ssd), 2x 3 TB (scratch_local)
- property: CERIT-SC
- SPECfp2006 performance of each node: 748 (11.7 per core)
Using Xeon Phi in the CERIT-SC experimental PBS Pro environment @wagap-pro
Frontend: zuphux.cerit-sc.cz. After logging in to the frontend, switch from Torque to PBS Pro (and back) with the following commands:
$ module add pbspro-client ... set PBSPro environment
$ module rm pbspro-client ... return Torque environment
- Queue: firstname.lastname@example.org
- Home (NFS): storage-brno3-cerit.metacentrum.cz. Please note that none of the other disk arrays are connected via NFS; data stored on them must be copied to scratch using scp
- qsub syntax (a request for a job with 3 chunks (nodes), each with 12 processors, 1 GB of RAM and 1 GB of local scratch, and a walltime of 1 hour):
qsub -q email@example.com -l select=3:ncpus=12:mem=1gb:scratch_local=1gb -l walltime=1:00:00 skript.sh
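The pieces of the qsub line above can be assembled step by step. The following is a minimal sketch that only prints the command instead of submitting it, so it can be tried anywhere. The helper names build_select and build_qsub and the placeholder QUEUE_NAME are hypothetical and not part of PBS Pro; substitute the queue listed for the Xeon Phi nodes.

```shell
#!/bin/sh
# Sketch: compose the qsub command from the example above without running it.
# QUEUE_NAME is a placeholder (assumption); use the real Xeon Phi queue name.

# build_select CHUNKS NCPUS MEM SCRATCH
# Prints the chunk specification; ncpus, mem and scratch_local apply to each chunk.
build_select() {
    printf 'select=%s:ncpus=%s:mem=%s:scratch_local=%s\n' "$1" "$2" "$3" "$4"
}

# build_qsub QUEUE SELECT WALLTIME SCRIPT
# Prints the complete qsub command line.
build_qsub() {
    printf 'qsub -q %s -l %s -l walltime=%s %s\n' "$1" "$2" "$3" "$4"
}

QUEUE="${QUEUE:-QUEUE_NAME}"
sel=$(build_select 3 12 1gb 1gb)
build_qsub "$QUEUE" "$sel" 1:00:00 skript.sh
```

Because the resources inside select are requested per chunk, this example asks for 36 processors and 3 GB of RAM in total.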
How to use Xeon Phi effectively
Xeon Phi advanced info
Despite its compatibility with x86 CPUs, not all jobs are suitable for Xeon Phi.
- Xeon Phi 7210 has 256 virtual cores (64 physical) running at 1.3 GHz, with an overall peak performance of 2.66 TFlops in double precision and 5.32 TFlops in single precision.
- Its performance is significantly higher than that of standard Xeon CPUs, but only if all cores are utilized!
- Poorly scaling or non-parallel workloads run very slowly on Xeon Phi!
- Xeon Phi is also a good candidate for accelerating memory-bandwidth-intensive workloads: it is equipped with 16 GB of high-bandwidth memory (about 400 GB/s) and up to 384 GB of conventional DDR4 memory (about 100 GB/s). DDR4 memory is used by default. To run a whole program _your-binary_ in high-bandwidth memory, use: numactl -m 1 _your-binary_
- Xeon Phi 7210 supports AVX-512 vector instructions. If your application relies on automatic vectorization, it can be recompiled with the Intel C/C++ compiler (icc/icpc in module intelcdk-17) using the flag -xMIC-AVX512. Beware that without AVX-512, your software can reach at most half of the theoretical Xeon Phi performance.
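The peak figures quoted above, and the "at most half" penalty without AVX-512, can be checked with simple arithmetic. This is a rough sketch under stated assumptions: each core is taken to complete two 512-bit fused multiply-add instructions per cycle (2 units x 8 doubles x 2 operations = 32 double-precision operations per cycle), and code built without AVX-512 is assumed to use 256-bit vectors instead. The helper name peak_gflops is hypothetical.

```shell
#!/bin/sh
# Sketch: reproduce the theoretical peak performance with integer arithmetic.
# Assumption: 32 double-precision (64 single-precision) operations per core
# per cycle with AVX-512, and half of that with 256-bit vectors.

# peak_gflops CORES MHZ OPS_PER_CYCLE
# Prints the theoretical peak in GFlops, rounded down.
peak_gflops() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

echo "AVX-512, double precision: $(peak_gflops 64 1300 32) GFlops"   # 2662
echo "AVX-512, single precision: $(peak_gflops 64 1300 64) GFlops"   # 5324
echo "256-bit, double precision: $(peak_gflops 64 1300 16) GFlops"   # 1331
```

These are upper bounds only; real applications reach a fraction of them, and only when all 64 physical cores are kept busy.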