Challenges of large-mass-ratio binary black holes to numerical relativity (slides)

Port AMSS-NCKU code to GPU
Zhoujian Cao, Academy of Mathematics and Systems Science, CAS
Joint work with Zhihui Du, Steven Brandt, Frank Loeffler and Quan Yang
2013-8-7, 2013 International School on Numerical Relativity and Gravitational Waves, Pohang, Korea

Outline
Motivations from gravitational wave detection
New parallel mesh refinement numerical scheme
GPU acceleration for NR
Summary

The most stringent test of GR
The anomalous precession of the perihelion of Mercury (1915, v )
Deflection of starlight (1919, v )
Gravitational redshift (1965, v )
Gravitational time delay effect (1968, v )
Evidence of gravitational waves (1978, v )
Frame-dragging effect (2010, v )
Direct gravitational wave detection (?, v1)
GR = Newtonian gravity + PN(v) + PN(v^2) + ...

Gravitational wave astronomy
Search back to the extremely early universe
Hear the dark universe

Gravitational waves and their detection (figure)

Categories of black holes
Supermassive black hole: M ~ 10^5 to 10^9 Msun
Stellar-mass black hole: M ~ 1 to 10s of Msun
Intermediate-mass black hole (IMBH): M ~ 10s of Msun to 10^5 Msun (mainly in globular clusters)
Farrell et al., Nature 460 (2009) 73; Feng et al., New Astronomy Reviews 55 (2011) 166

Categories of black hole binaries (figure: IMBH binaries for ALIA, Xuefei Gong et al., CQG 28, 094012 (2011); mass ratios from 1:1000 to 1:1; Advanced LIGO, Abadie et al., PRD 85, 102004 (2012))

IMBH and GW detection
Data analysis and templates: see Sang Hoon Oh's lecture.

Template model for BBH? (Yi Pan's talk, 2013)

Template model for BBH
PN templates: for the early stage of the inspiral.
EOBNR (effective-one-body model together with numerical relativity): for the full inspiral + merger + ringdown; works well for mass ratios below 1:8 and for extreme-mass-ratio BBH, high spin, precession!
But there is no reliable template for mass ratios from 1:10 to 1:100.

Starting from a given separation of the two BHs, the number of orbits grows quickly as the mass ratio increases, so the cost of the full-GR numerical simulation grows accordingly. Compared with 1:1, a 1:100 binary needs 10 times more computation (PN estimate).
Computational cost: 1:1, 9 days; 1:100, 20 days (LSSC cluster II, 128 CPUs). For the last 2 orbits, the computational cost ratio is 1 to 20!

Challenge of large-mass-ratio BBH to NR
Compared with 1:1, the computational cost of a 1:100 BBH increases roughly 200 times (the factor of 10 from the PN estimate times the factor of 20 for the last 2 orbits)!
A typical 1:1 BBH simulation needs 14 days. So with a straightforward approach, a 1:100 simulation would need roughly 1 year!

Possible ways out
1. Physical level: approximation methods, such as the self-force framework (but still only first order so far).
2. Numerical algorithm level: implicit schemes (R. Lau et al., PRD 84, 084023 (2011)); combining Cauchy evolution with null evolution.
3. Computer level: improve scalability to use more CPUs; use GPUs.

Mesh refinement scheme
High-resolution mesh grids for the region near the BH, low-resolution mesh grids for the far region.

Mesh refinement in CFD
Result based on PARAMESH (figure). Frameworks: PARAMESH, GrACE, JASMIN.

Comparison of NR and CFD
NR (only for BH): computationally expensive per grid point, but the functions are quite smooth; few grid points (hundreds); high-order finite differences.
CFD: computation per grid point is cheap, but fluid dynamics is quite complex (compare the lectures on HD); the number of grid points is quite large (millions).

Mesh refinement scheme: scheme adopted by PARAMESH (figures: Level 0 and Level 1 grids; the same levels in the t-x plane)

Mesh refinement scheme: scheme for NR (figure: Level 0, Level 1)
Distribute the data of one level at a time to the available processes: the LS scheme (F. Loeffler et al., CQG 29, 115001 (2012)).
Parallelization limit: a 200 x 200 x 200 level with 6th-order finite differences (8 ghost points for the two sides) can only be split over a limited number of processes.
How about distributing the data of all levels and computing them in parallel?

Parallel mesh level algorithm
PX scheme: distribute the data of all levels to all processes; compute all levels in parallel.
(Figure: timeline for the procs for lev0, lev1 and lev2; per time slot: run/run/run, wait/wait/run, wait/run/run, wait/wait/run, run/run/run.)
Strong scaling, because there is more data to distribute.
Resources are wasted by waiting (Lx the processes of LS)!
Calculation speed: 2 times faster!

P2 scheme: distribute the data of the finest level to half of the processes, and distribute the data of each other level, one level at a time, to the other half; the finest level and the other levels are computed in parallel, while the other levels are computed sequentially among themselves.
(Figure: lev0, lev1, lev2; timeline for the procs for lower levels vs. the procs for lev2; per time slot: lev1/run, lev0/run, lev1/run, wait/run, lev1/run.)
Scaling is weaker than PX; less waiting (2x the processes of LS)!
Calculation speed: 2 times faster!

Comparison to the LS scheme
More complicated case (figure: t, x; lev0, lev1, lev2): now the procs for the finest level have to wait!

GPU acceleration
For systems biology: Yamazaki, Igarashi, Neural Networks, 2013.
For GW data analysis: Zhihui Du et al., CQG 29, 235018 (2012).

Put the RHS calculation on the GPU
In the AMSS-NCKU code, the RHS calculation takes 80% of the run time.
The RHS function involves so many variables that even transferring their addresses is time consuming.
So pack these addresses and store them in constant memory (they are no longer transferred during the evolution), which saves shared memory at the same time.

Put the RHS calculation on the GPU
Keep the data on the GPU until MPI data transfer between different processes is needed.
Use the buffer point method to reduce the MPI transfers for RK4 from 4 per time step to only 1; this also reduces the number of data transfers between GPU and CPU.

Put the RHS calculation on the GPU: arrange shared memory
Divide the RHS calculation into 8 parts, so that the memory requirement of each part can be satisfied by shared memory.
For one RHS calculation, copy the data from global memory to shared memory once and use shared memory most of the time.

Put restrict-prolong on the GPU
After the RHS is moved to the GPU, the most time-consuming part is the restrict-prolong interpolation. How to treat this part? The work is ongoing.
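The parallelization limit of the LS scheme can be made concrete with a small Python sketch. It assumes the 8 ghost points are split as 4 per side and that a 200 x 200 x 200 level is cut into k^3 equal cubic blocks, one per process; `ghost_fraction` is a hypothetical helper, not part of AMSS-NCKU. As the blocks shrink, a growing share of each process's points are ghost copies rather than useful work.

```python
# Toy model (assumption): one n^3 level split into k^3 equal cubic blocks,
# each padded with `ghost` points per side for the finite-difference stencil.
def ghost_fraction(n, k, ghost=4):
    """Fraction of one process's points that are ghost (copied) points."""
    if n % k:
        raise ValueError("n must be divisible by k")
    b = n // k                  # interior points per side of one block
    padded = b + 2 * ghost      # block size including the ghost layers
    return (padded ** 3 - b ** 3) / padded ** 3


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 20, 25):
        print(f"{k ** 3:6d} processes: block {200 // k:3d}^3, "
              f"ghost fraction {ghost_fraction(200, k):.3f}")
```

In this model, with 25^3 = 15,625 processes each block holds only 8 interior points per side and 87.5% of its points are ghosts, so adding processes to a single level quickly stops paying off.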
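The run/wait pattern in the PX timeline figure can be reproduced with a toy model. The sketch below assumes a time refinement factor of 2 between levels, one time slot per finest-level step, and one slot of work per level step on that level's own process group; `px_schedule` and `utilization` are hypothetical names, and the model ignores communication and interpolation costs.

```python
def px_schedule(levels, slots):
    """Run/wait table for the PX scheme in a toy subcycling model.

    Assumption: level `lev` advances once every 2**(levels-1-lev) slots,
    so the finest level runs every slot and coarser groups wait in between.
    """
    table = []
    for lev in range(levels):
        period = 2 ** (levels - 1 - lev)
        table.append(["run" if s % period == 0 else "wait" for s in range(slots)])
    return table


def utilization(table):
    """Fraction of (process group, time slot) pairs spent running."""
    cells = [c for row in table for c in row]
    return cells.count("run") / len(cells)


if __name__ == "__main__":
    for lev, row in enumerate(px_schedule(3, 5)):
        print(f"lev{lev}: " + " ".join(row))
    print(f"utilization over one coarse cycle: {utilization(px_schedule(3, 4)):.3f}")
```

Over one coarse cycle of 4 slots the three groups run in only 7 of the 12 group-slots; this idle time is the resource waste the slides attribute to PX.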
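One way to read the buffer point method: give each process ghost zones 4 times the stencil half-width, exchange them once per RK4 step, and let the valid region shrink by one stencil width per stage. The 1D sketch below assumes a periodic grid, the advection equation u_t = -u_x and a 2nd-order centred stencil (half-width 1), and simulates the "MPI exchange" by copying from a global list; all names are hypothetical, and AMSS-NCKU's actual stencil and buffer width may differ.

```python
import math

H_STENCIL = 1            # half-width of the centred stencil used here
BUFFER = 4 * H_STENCIL   # 4 RK4 stages x stencil half-width


def rk4_global(u, dt, h):
    """Reference RK4 step on the whole periodic grid for u_t = -u_x."""
    n = len(u)

    def rhs(v):
        return [-(v[(i + 1) % n] - v[(i - 1) % n]) / (2 * h) for i in range(n)]

    k1 = rhs(u)
    k2 = rhs([u[i] + 0.5 * dt * k1[i] for i in range(n)])
    k3 = rhs([u[i] + 0.5 * dt * k2[i] for i in range(n)])
    k4 = rhs([u[i] + dt * k3[i] for i in range(n)])
    return [u[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]


def rk4_patch(patch, dt, h):
    """RK4 step on a patch carrying BUFFER extra points per side.

    Each stage's valid region shrinks by H_STENCIL per side, so after the
    4 stages exactly the interior is valid: no exchange between stages.
    """
    n = len(patch)

    def rhs(v, shrink):
        out = [0.0] * n  # entries outside [shrink, n-shrink) never reach the interior
        for i in range(shrink, n - shrink):
            out[i] = -(v[i + 1] - v[i - 1]) / (2 * h)
        return out

    k1 = rhs(patch, 1 * H_STENCIL)
    k2 = rhs([patch[i] + 0.5 * dt * k1[i] for i in range(n)], 2 * H_STENCIL)
    k3 = rhs([patch[i] + 0.5 * dt * k2[i] for i in range(n)], 3 * H_STENCIL)
    k4 = rhs([patch[i] + dt * k3[i] for i in range(n)], 4 * H_STENCIL)
    return [patch[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(BUFFER, n - BUFFER)]


def rk4_decomposed(u, dt, h, nparts):
    """One RK4 step split over `nparts` patches with one exchange per step."""
    n = len(u)
    size = n // nparts  # assumes nparts divides n
    new_u = []
    for p in range(nparts):
        lo, hi = p * size, (p + 1) * size
        # The only "communication": copy BUFFER neighbour points per side.
        patch = [u[i % n] for i in range(lo - BUFFER, hi + BUFFER)]
        new_u.extend(rk4_patch(patch, dt, h))
    return new_u


if __name__ == "__main__":
    n = 40
    h = 1.0 / n
    u0 = [math.sin(2 * math.pi * i * h) for i in range(n)]
    ref, dec = u0, u0
    for _ in range(10):
        ref = rk4_global(ref, 0.5 * h, h)
        dec = rk4_decomposed(dec, 0.5 * h, h, nparts=4)
    print("max difference:", max(abs(a - b) for a, b in zip(ref, dec)))
```

The decomposed run matches the single-grid reference while exchanging ghost data once per step instead of once per stage; the price is 4 times wider ghost zones and some redundant stage computation near patch edges.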
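The reason for splitting the RHS into several parts can be illustrated with a rough capacity estimate. This sketch assumes each part loads whole double-precision tiles of (tile + 2*radius)^3 points for the variables it reads, that the variables are spread evenly over the parts, and a 48 KiB shared-memory budget per block; `parts_needed` and all numbers in the demo are hypothetical and are not taken from AMSS-NCKU.

```python
import math


def parts_needed(n_vars, tile, radius, shared_bytes=48 * 1024, bytes_per_value=8):
    """Minimum number of RHS parts so each part's input tiles fit in shared memory.

    Hypothetical model: every variable a part reads occupies one
    (tile + 2*radius)^3 tile of doubles in shared memory.
    """
    per_var = (tile + 2 * radius) ** 3 * bytes_per_value
    vars_per_part = shared_bytes // per_var
    if vars_per_part == 0:
        raise ValueError("a single variable's tile does not fit in shared memory")
    return math.ceil(n_vars / vars_per_part)


if __name__ == "__main__":
    # Hypothetical inputs: 24 evolved variables, stencil radius 3.
    for tile in (4, 6, 8):
        print(f"tile {tile}^3: {parts_needed(24, tile, radius=3)} parts")
```

Larger tiles amortize the halo better but leave room for fewer variables at a time, which is the trade-off that forces the RHS to be split.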
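To show what the restrict-prolong step does, here is a minimal 1D sketch with a refinement factor of 2 on vertex-centred grids: prolongation by linear interpolation and restriction by injection. The slides do not describe the operators AMSS-NCKU actually uses, which are likely higher order; `prolong` and `restrict` are hypothetical names used only for illustration.

```python
def prolong(coarse):
    """Coarse -> fine (factor 2): copy coincident points, linear midpoints."""
    fine = []
    for i in range(len(coarse) - 1):
        fine.append(coarse[i])
        fine.append(0.5 * (coarse[i] + coarse[i + 1]))
    fine.append(coarse[-1])
    return fine


def restrict(fine):
    """Fine -> coarse (factor 2): injection, keeping every other point."""
    return fine[::2]


if __name__ == "__main__":
    coarse = [1.0, 3.0, 5.0, 7.0, 9.0]   # samples of a linear function
    fine = prolong(coarse)
    print(fine)
    print(restrict(fine) == coarse)
```

Each fine point depends only on a few nearby coarse points, so both operators are data parallel and map naturally onto one GPU thread per output point.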
