




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Port AMSS-NCKU code to GPU Zhoujian Cao Academy of Mathematics and System Science, CAS Cowork with Zhihui Du, Steven Brandt, Frank Loeffler and Quan Yang 2013-8-72013 International School on Numerical Relativity and Gravitational Waves, Pohang KoreaOutlineMotivations from gravitational wave detectio
2、nNew parallel mesh refinement numerical schemeGPU acceleration for NRSummaryThe most stringent test of GRthe anomalous precession of theperihelion of Mercury (1915, v )Deflection of Starlight (1919, v )gravitational redshift (1965, v )gravitational time delayeffect (1968, v )EvidenceofGravitational
3、Waves (1978, v )frame-draggingeffect (2010, v )Direct gravitational wave detection (?, v1)GR = Newtonian Gravity + PN(v) + PN(v2) + Gravitational wave astronomySearch back to extremely early universe Hear the dark universe Gravitational wave and its detectionCategory of Black HolesSuper massive blac
4、k hole: M: 105109 MsunStellar massive black hole: M: 1-10s MsunIntermediate massive black hole: M: 10s105 Msun (mainly in globular cluster)Farrell, et al, Nature 460 (2009) 73; Feng, et al, New Astronomy Reviews 55 (2011) 166Category of Black Holes BinaryIMBHALIAXuefei Gong, et al, CQG 28, 094012 (2
5、011)1:10001:1Advanced LIGOAbadie, et al, PRD 85, 102004 (2012)IMBH and GW detectionData analysis and templateRef to Sang Hoon Ohs lectureTemplate model for BBH?Yi Pans talk, 2013Template model for BBHPN templates: for early stage of inspirallingEOBNR (effective one body model together with numerical
6、 relativity): for full inspiral + merger + ring down stage; works well for mass ratio less than 1:8 and extreme mass ratio BBH, high spinning, precession!But no reliable template for mass ratio 1:10 to 1:100From a given separation of the two BHs, when mass ratio increases the number of orbit increas
7、es quickly. This requires that the numerical simulation with full GR increases much consequently. In contrast to 1:1, 1:100 needs 10 times more computation cost.PN estimationComputational cost1:1, 9 days1:100, 20 daysLSSC cluster II, 128 CPUs, for last 2 orbits computational cost 1 to 20!Challenge o
8、f large mass BBH to NRCompared to 1:1, the computational cost of 1:100 BBH increase roughly 200 times!For typical simulation of 1:1 BBH, 14 days are needed. So by straight forward method to 1:100, roughly 1year is needed!Possible ways out1. Physical level: approximation method, such as self force fr
9、ame work (but still first order yet), 2. Numerical Algorithm level: implicit scheme R. Lau et al, PRD 84, 084023 (2011), combine Cauchy evolution to null evolution, 3. Computer level: improve scalability to use more CPUs, use GPU, Possible ways out1. Physical level: approximation method, such as sel
10、f force frame work (but still first order yet), 2. Numerical Algorithm level: implicit scheme R. Lau et al, PRD 84, 084023 (2011), combine Cauchy evolution to null evolution, 3. Computer level: improve scalability to use more CPUs, use GPU, Mesh refinement schemeHigh resolution mesh grids for region
11、 near BH, while low resolution mesh grids for far regionMesh refinement in CFDResult based on PARAMESHPARAMESHGrACEJASMINComparison of NR and CFDNR (only for BH): computational expensive on single grid point, but functions quite smooth few grid points (handrads), high order finite differenceCFD: com
12、putation on single point is cheap, but fluid dynamics quite complex (compare the lectures on HD) grid number is quite large (millions)Mesh refinement schemeScheme adopted by PARAMESHLevel 0Level 1Mesh refinement schemeScheme adopted by PARAMESHLevel 0Level 1txMesh refinement schemeScheme for NRLevel
13、 0Level 1Distribute data along one level to available processesMesh refinement schemeScheme for NRF. Loeffler et al, CQG 29, 115001 (2012)Level 0Level 1LS schemeMesh refinement schemeParallelization limit:200 x200 x2006th order finite difference (8 ghost points for two sides) processesHow about dist
14、ribute data on all levels and calculate them parallely?Parallel mesh level algorithmPX scheme: distribute data on all levels to all processes; calculate parallelyMesh refinement scheme Procs for lev0 procs for lev1 procs for lev2 run run run wait wait run wait run run wait wait run run run run Stron
15、g scalling property due to more data to distribute;Resource wasting (Lx procs of LS) due to waiting!Calculation speed: 2 times faster!timeParallel mesh level algorithmP2 scheme: distribute data on finest level to half processes and distribute data on other levels along the same level to another half
16、 processes; calculate parallely for finest level and other levels, while sequentially for other levelslev0lev2lev1Mesh refinement scheme Procs for lower levels procs for lev2 lev1 run lev0 run lev1 run wait run lev1 run Scalling property is weaker than PX;Less waiting (2x procs LS)!Calculation speed
17、: 2 times faster!timeComparison to LS schememore complicate casetxlev0lev1lev2 Now, procs for finest level have to wait!more complicate casetxlev0lev1lev2GPU accelerationFor system biology, Yamazaki, Igarashi, Neural Networks, 2013For GW data analysis, Zhihui Du, et al, CQG 29, 235018 (2012)Put RHS
18、calculation to GPUFor AMSS-NCKU code, time for RHS calculation 80%RHS function involves too many variables, even only transform their addresses are time consumingSo pack these addresses and store it in constant memory (do not transform any more during evolution), save shared memory at the same timeP
19、ut RHS calculation to GPUKeep the data on GPU till MPI data transfer between different processesUsing buffer point method to reduce MPI transfer for RK4 from 4 times to only 1 time; also reduce data transfer times between GPU and CPUPut RHS calculation to GPUArrange shared memoryDivide RHS calculati
20、on into 8 parts, let the memory requirement for each part can be satisfied with shared memoryFor one RHS calculation, copy data from global memory to shared memory once and use shared memory in most timePut restrict-prolong to GPUAfter put RHS to GPU, the most time consuming part is Restrict-Prolong interpolationHow to treat this part? The work is going on
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 小学生礼让课件
- 2025年苏州市初中地理中考地理及答案
- 室颤教学查房课件
- 场地使用权与客户满意认证合同
- 车间承包与环保设施建设协议
- 施工现场安全责任连带保证合同
- 电子产品典当销售合同
- 车辆借用保险责任免除与损害赔偿合同
- 字画典当贷款协议书
- 彩钢瓦屋面施工及屋顶绿化一体化合同样本
- 派出所应对校园突发事件应急预案
- 燃气管道防火防爆安全方案
- 网络安全漏洞挖掘与报告
- 埋地消防管渗漏整改工程施工方案
- 装饰装修施工人员安全知识培训考试试卷及答案
- 2023年上海市普通高中学业水平合格性考试地理试题及答案
- 宿舍消防安全培训课件
- 2024版小学一年级下册综合实践活动模拟试卷
- 江苏2024年江苏省美术馆招聘笔试历年典型考题及考点附答案解析
- 2023-2024学年浙江省杭州市小升初考试数学试卷含解析
- DZ∕T 0215-2020 矿产地质勘查规范 煤(正式版)
评论
0/150
提交评论