MPI_Abort errors on different node counts for Cori KNL
Posted: Wed Jun 14, 2017 3:55 am
Hi, I am new to Qbox, and I am trying to run MD calculations on Cori KNL at NERSC on different numbers of nodes for benchmarking. I am using the Qbox module they provide, version 1.63.5. First, I do a ground-state calculation, which works fine most of the time, but on five nodes it gives an error:
and many more after that. My input file is:
and SnO.sys is
... it has 324 atoms.
It seems in the output file the <eigenvalue_sum> fluctuates a lot, grep-ing it gives
That is only the end of the grep output; there is much more before it. The output file itself ends with many lines of:
Subsequent MD calculations fail on every node count too. The input is:
And the end of the output is:
When I do this for a small cell of 4 atoms, everything works fine. Hopefully you will know how to fix this.
Thank you in advance for any help,
-Tyler Bishop
Code:
Rank 137 [Tue Jun 13 17:51:05 2017] [c11-4c0s0n2] application called MPI_Abort(MPI_COMM_WORLD, 2) - process 137
Code:
set nrowmax 64
SnO.sys
set ecut 20
set xc PBE
set wf_dyn PSDA
randomize_wf
set scf_tol 1.e-8
run -atomic_density 0 20 20
save gs.xml
Code:
set cell 68.39864845 0 0 0 62.19542973 0 0 0 30
species tin Sn_ONCV_PBE-1.1.xml
species oxygen O_ONCV_PBE-1.0.xml
atom Sn_000 tin 0.0000000000 3.4553016519 7.2387095421
atom Sn_001 tin 3.7999249137 0.0000000000 11.6731783381
atom O_000 oxygen 0.0000000000 0.0000000000 9.6417027165
atom O_001 oxygen 3.7999249137 3.4553016519 9.2701976498
Code:
<eigenvalue_sum> 137553.601065 </eigenvalue_sum>
<eigenvalue_sum> -780899.179023 </eigenvalue_sum>
<eigenvalue_sum> -28447.778683 </eigenvalue_sum>
<eigenvalue_sum> -39040.139854 </eigenvalue_sum>
<eigenvalue_sum> 134230.029446 </eigenvalue_sum>
<eigenvalue_sum> -4863865.300754 </eigenvalue_sum>
<eigenvalue_sum> -2070989.432315 </eigenvalue_sum>
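In case it helps with diagnosing this, here is a minimal sketch of how the same check could be scripted instead of using grep. It pulls every `<eigenvalue_sum>` value out of the output text and reports the first step that is non-finite or jumps sharply from the previous one. The function names and the `max_jump` threshold are my own assumptions for illustration, not anything defined by Qbox.

```python
import math
import re

# Matches lines such as "<eigenvalue_sum> -28447.778683 </eigenvalue_sum>".
EIG_RE = re.compile(r"<eigenvalue_sum>\s*(\S+)\s*</eigenvalue_sum>")


def eigenvalue_sums(text):
    """Return every <eigenvalue_sum> value found in the output text as a float."""
    # float() also accepts "inf", "nan" and "-nan", which can appear in a failed run.
    return [float(m.group(1)) for m in EIG_RE.finditer(text)]


def first_bad_step(values, max_jump=1.0e3):
    """Return the index of the first suspicious value, or None if there is none.

    A value is suspicious if it is not finite, or if it differs from the
    previous value by more than max_jump (a hypothetical threshold).
    """
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
        if i > 0 and abs(v - values[i - 1]) > max_jump:
            return i
    return None


# Three of the lines from the grep output above, used as sample input.
sample = """
<eigenvalue_sum> 137553.601065 </eigenvalue_sum>
<eigenvalue_sum> -780899.179023 </eigenvalue_sum>
<eigenvalue_sum> -28447.778683 </eigenvalue_sum>
"""

values = eigenvalue_sums(sample)
print(values)
print(first_bad_step(values))
```

To use it on a real run, the contents of the output file would be read into a string and passed to `eigenvalue_sums` in place of `sample`.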
Code:
DoubleMatrix::potrf, info=334
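As far as I understand, `potrf` is the LAPACK/ScaLAPACK name for the Cholesky factorization, and I am assuming that `DoubleMatrix::potrf` reports the `info` value of such a routine. In LAPACK, a positive `info = i` means the factorization stopped at pivot `i` because the matrix was not positive definite there. The pure-Python sketch below only mimics that convention on small matrices; `potrf_info` is a hypothetical name and this is not Qbox code.

```python
import math


def potrf_info(a):
    """Try a lower Cholesky factorization of a symmetric matrix (list of lists).

    Returns (low, info). info == 0 means success. info == i > 0 means the
    i-th pivot (1-based) was not positive, mimicking the LAPACK convention.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: diagonal entry minus the squares already factored out.
        ajj = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        # "not (ajj > 0.0)" is also true when ajj is NaN.
        if not (ajj > 0.0):
            return low, j + 1
        low[j][j] = math.sqrt(ajj)
        for i in range(j + 1, n):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s / low[j][j]
    return low, 0


# Positive definite matrix: factorization succeeds.
print(potrf_info([[4.0, 2.0], [2.0, 3.0]])[1])            # info = 0
# Indefinite matrix: the second pivot is negative.
print(potrf_info([[1.0, 2.0], [2.0, 1.0]])[1])            # info = 2
# NaN in the first diagonal entry: fails at the first pivot.
print(potrf_info([[float("nan"), 0.0], [0.0, 1.0]])[1])   # info = 1
```

If that reading is right, `info=334` would mean the matrix being factorized lost positive definiteness at pivot 334, and the `info=1` in the MD run would be consistent with a matrix that already contains NaN, which would match the `-nan` eigenvalue sum. I have not verified this against the Qbox source.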
Code:
set nrowmax 64
load gs.xml
set xc PBE
set wf_dyn PSDA
set scf_tol 1.e-6
set atoms_dyn MD
set dt 60
randomize_v 600
run 10 10
save md4.xml
Code:
<unit_cell_volume> 127622.500 </unit_cell_volume>
<econst> inf </econst>
<ekin_ion> 0.89641798 </ekin_ion>
<temp_ion> 582.46993560 </temp_ion>
total_electronic_charge: 3240.00000000
<eigenvalue_sum> -nan </eigenvalue_sum>
DoubleMatrix::potrf, info=1
...