




Port AMSS-NCKU code to GPU

Zhoujian Cao
Academy of Mathematics and System Science, CAS
In collaboration with Zhihui Du, Steven Brandt, Frank Loeffler and Quan Yang
2013-8-7
2013 International School on Numerical Relativity and Gravitational Waves, Pohang, Korea

Outline
- Motivations from gravitational wave detection
- New parallel mesh refinement numerical scheme
- GPU acceleration for NR
- Summary

The most stringent test of GR
- The anomalous precession of the perihelion of Mercury (1915, v )
- Deflection of starlight (1919, v )
- Gravitational redshift (1965, v )
- Gravitational time delay effect (1968, v )
- Evidence of gravitational waves (1978, v )
- Frame-dragging effect (2010, v )
- Direct gravitational wave detection (?, v^1)
GR = Newtonian Gravity + PN(v) + PN(v^2) + ...

Gravitational wave astronomy
- Search back to the extremely early universe
- Hear the dark universe

Gravitational wave and its detection

Category of Black Holes
- Supermassive black hole: M: 10^5 to 10^9 M_sun
- Stellar-mass black hole: M: 1 to 10s M_sun
- Intermediate-mass black hole: M: 10s to 10^5 M_sun (mainly in globular clusters)
Farrell, et al, Nature 460 (2009) 73; Feng, et al, New Astronomy Reviews 55 (2011) 166

Category of Black Hole Binaries
[Figure labels: IMBH; mass ratios 1:1000 and 1:1; ALIA: Xuefei Gong, et al, CQG 28, 094012 (2011); Advanced LIGO: Abadie, et al, PRD 85, 102004 (2012)]

IMBH and GW detection

Data analysis and template
Ref to Sang Hoon Oh's lecture

Template model for BBH?
Yi Pan's talk, 2013

Template model for BBH
- PN templates: for the early stage of the inspiral
- EOBNR (effective one body model together with numerical relativity): for the full inspiral + merger + ringdown stage; works well for mass ratio less than 1:8 and for extreme mass ratio BBH, high spin, precession!
- But no reliable template for mass ratio 1:10 to 1:100

Starting from a given separation of the two BHs, the number of orbits increases quickly as the mass ratio increases. Consequently, the cost of a numerical simulation with full GR increases considerably. Compared to 1:1, 1:100 needs 10 times more computational cost.

PN estimation
Computational cost
- 1:1, 9 days
- 1:100, 20 days
LSSC cluster II, 128 CPUs, for the last 2 orbits
Computational cost 1 to 20!

Challenge of large mass ratio BBH to NR
- Compared to 1:1, the computational cost of a 1:100 BBH increases roughly 200 times!
- A typical simulation of a 1:1 BBH needs 14 days, so applying the straightforward method to 1:100 needs roughly 1 year!

Possible ways out
- 1. Physical level: approximation methods, such as the self-force framework (but still first order so far)
- 2. Numerical algorithm level: implicit scheme, R. Lau et al, PRD 84, 084023 (2011); combine Cauchy evolution with null evolution
- 3. Computer level: improve scalability to use more CPUs, use GPU

Mesh refinement scheme
High resolution mesh grids for the region near the BH, low resolution mesh grids for the far region

Mesh refinement in CFD
Result based on PARAMESH
[Figure labels: PARAMESH, GrACE, JASMIN]

Comparison of NR and CFD
- NR (only for BH): computation on a single grid point is expensive, but the functions are quite smooth; few grid points (hundreds), high order finite difference
- CFD: computation on a single point is cheap, but the fluid dynamics is quite complex (compare the lectures on HD); the number of grid points is quite large (millions)

Mesh refinement scheme
Scheme adopted by PARAMESH
[Figure: Level 0 and Level 1 grids, axes t and x]

Mesh refinement scheme
Scheme for NR
[Figure: Level 0 and Level 1 grids]
Distribute data along one level to the available processes

Mesh refinement scheme
Scheme for NR: LS scheme
F. Loeffler et al, CQG 29, 115001 (2012)
[Figure: Level 0 and Level 1 grids]

Mesh refinement scheme
Parallelization limit: 200 x 200 x 200, 6th order finite difference (8 ghost points for two sides) processes
How about distributing the data on all levels and calculating them in parallel?

Parallel mesh level algorithm
PX scheme: distribute data on all levels to all processes; calculate in parallel

Mesh refinement scheme
Procs for lev0   procs for lev1   procs for lev2
run              run              run
wait             wait             run
wait             run              run
wait             wait             run
run              run              run
(time runs downward)
- Strong scaling property, because there is more data to distribute
- Resource wasting (Lx procs of LS) due to waiting!
- Calculation speed: 2 times faster!

Parallel mesh level algorithm
P2 scheme: distribute the data on the finest level to half of the processes, and distribute the data on the other levels, one level at a time, to the other half of the processes; the finest level and the other levels are calculated in parallel, while the other levels are calculated sequentially among themselves
[Figure labels: lev0, lev1, lev2]

Mesh refinement scheme
Procs for lower levels   procs for lev2
lev1                     run
lev0                     run
lev1                     run
wait                     run
lev1                     run
(time runs downward)
- Scaling property is weaker than PX
- Less waiting (2x procs LS)!
- Calculation speed: 2 times faster!

Comparison to LS scheme
More complicated case
[Figure: lev0, lev1 and lev2 grids, axes t and x]
Now the procs for the finest level have to wait!

GPU acceleration
- For systems biology: Yamazaki, Igarashi, Neural Networks, 2013
- For GW data analysis: Zhihui Du, et al, CQG 29, 235018 (2012)

Put RHS calculation to GPU
- For the AMSS-NCKU code, the RHS calculation takes 80% of the time
- The RHS function involves too many variables; even just passing their addresses is time consuming
- So pack these addresses and store them in constant memory (they are not passed again during the evolution), which saves shared memory at the same time

Put RHS calculation to GPU
- Keep the data on the GPU until the MPI data transfer between different processes
- Use the buffer point method to reduce the MPI transfers for RK4 from 4 times to only 1 time; this also reduces the number of data transfers between GPU and CPU

Put RHS calculation to GPU
Arrange shared memory
- Divide the RHS calculation into 8 parts, so that the memory requirement of each part fits in shared memory
- For one RHS calculation, copy the data from global memory to shared memory once and use shared memory most of the time

Put restrict-prolong to GPU
- After putting the RHS on the GPU, the most time consuming part is the Restrict-Prolong interpolation
- How to treat this part? The work is ongoing
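
The run/wait pattern of the PX scheme can be reproduced with a few lines of Python. This is only an illustrative sketch, not code from AMSS-NCKU: the function name px_schedule is hypothetical, and it assumes that each level takes two time steps per step of the next coarser level, which is what the run/wait table for three levels suggests.

```python
def px_schedule(num_levels):
    """Return the run/wait table for one coarse time step of the PX scheme.

    Assumption (not stated in the slides): every level is refined by a
    factor of 2 in time relative to the next coarser level, so level
    `lev` starts a step every 2**(num_levels - 1 - lev) fine-step slots.
    Row s of the table is the state of each process group in slot s;
    the last row is the first slot of the next coarse step.
    """
    slots = 2 ** (num_levels - 1)  # fine steps per coarse step
    table = []
    for s in range(slots + 1):
        row = []
        for lev in range(num_levels):
            period = 2 ** (num_levels - 1 - lev)
            row.append("run" if s % period == 0 else "wait")
        table.append(row)
    return table


# Print the table for three levels (lev0, lev1, lev2), time running downward.
for row in px_schedule(3):
    print("  ".join(f"{state:4s}" for state in row))
```

Counting the "wait" entries per column shows where the resource waste of PX comes from: under this assumption the group holding the coarsest level is idle for most of the slots.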
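
The idea behind the buffer point method can be illustrated with a toy problem. The sketch below is not the AMSS-NCKU implementation: it assumes a 1D advection equation u_t = -u_x, a 2nd-order centered stencil of radius 1 instead of the 6th-order stencil, and plain Python lists instead of MPI; all function names are hypothetical. It shows that copying a buffer of width (stencil radius) x (number of RK4 stages) once per step is enough for a subdomain to finish a whole RK4 step without further exchange, because the invalid region grows inward by one stencil radius per stage.

```python
import math

R = 1       # stencil radius of the toy 2nd-order centered difference (assumption)
STAGES = 4  # RK4 evaluates the right-hand side four times per step


def rhs_periodic(u, dx):
    """Reference RHS of u_t = -u_x on the full periodic grid."""
    n = len(u)
    return [-(u[(i + 1) % n] - u[i - 1]) / (2 * dx) for i in range(n)]


def rhs_local(u, dx):
    """RHS on a subdomain: the outermost R points have no neighbours and
    are left at 0.0, i.e. they become invalid."""
    n = len(u)
    out = [0.0] * n
    for i in range(R, n - R):
        out[i] = -(u[i + 1] - u[i - 1]) / (2 * dx)
    return out


def rk4_step(u, dt, dx, rhs):
    """One classical RK4 step using the given RHS function."""
    def add(a, k):
        return [ui + a * ki for ui, ki in zip(u, k)]
    k1 = rhs(u, dx)
    k2 = rhs(add(dt / 2, k1), dx)
    k3 = rhs(add(dt / 2, k2), dx)
    k4 = rhs(add(dt, k3), dx)
    return [ui + dt / 6 * (a + 2 * b + 2 * c + d)
            for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]


def step_subdomain(u_global, lo, hi, dt, dx, buf=R * STAGES):
    """Advance the points lo..hi-1 by one RK4 step after a single buffer
    copy of width `buf` on each side (this copy stands in for the one
    MPI exchange per step)."""
    n = len(u_global)
    local = [u_global[i % n] for i in range(lo - buf, hi + buf)]
    new_local = rk4_step(local, dt, dx, rhs_local)
    return new_local[buf:len(new_local) - buf]


n = 64
dx = 2 * math.pi / n
dt = 0.5 * dx
u0 = [math.sin(i * dx) for i in range(n)]
reference = rk4_step(u0, dt, dx, rhs_periodic)
owned = step_subdomain(u0, 16, 32, dt, dx)
print(max(abs(a - b) for a, b in zip(owned, reference[16:32])))
```

With a narrower buffer (for example buf=3) the outermost owned points pick up invalid data in the last stage, which is why the buffer width has to scale with the number of stages; the price is a larger message and some redundant computation in the buffer zone.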