Qbox hangs when running more than 8 mpi processes.
Posted: Thu Jan 31, 2013 7:06 pm
by jlow
Qbox Wizards,
I have built Qbox-1.56.2 on a Sandy Bridge cluster with an InfiniBand interconnect. I used the Intel compilers (13.1), MKL (11.0), fftw-2.1.5, mvapich2 (1.9) and xerces-2.8. My makefile is included in the attached zip file.
I was able to run all of the tests provided with the software on more than 8 mpi processes.
However, when I try to run my case on more than 8 mpi processes, Qbox hangs while reading the pseudopotential files. Qbox runs fine on 8 mpi processes.
The input for my case is in the attached zip file. Any help in resolving this issue would be appreciated.
Thanks,
John J. Low
Math and Computer Science
Argonne National Laboratory
Argonne, Illinois
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Thu Jan 31, 2013 11:57 pm
by fgygi
It seems that the input file "cristobolite.i" has CRLF line terminators. This is likely to confuse the Qbox line interpreter, which expects Unix ASCII text (although this may just be the result of cutting and pasting on a non-Unix machine).
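For anyone who wants to rule out CRLF terminators on the server side, here is a minimal Python sketch. The helper names (count_crlf, to_unix, fix_file) are made up for illustration and are not part of Qbox.

```python
from pathlib import Path


def count_crlf(data: bytes) -> int:
    """Return the number of CRLF line terminators found in data."""
    return data.count(b"\r\n")


def to_unix(data: bytes) -> bytes:
    """Convert CRLF terminators to LF; lines already ending in LF are unchanged."""
    return data.replace(b"\r\n", b"\n")


def fix_file(path: str) -> int:
    """Rewrite the file in place with LF terminators.

    Returns the number of CRLF terminators that were replaced.
    """
    p = Path(path)
    raw = p.read_bytes()
    n = count_crlf(raw)
    if n:
        p.write_bytes(to_unix(raw))
    return n
```

Calling fix_file("cristobolite.i") in the run directory would report how many CRLF terminators were found and converted; a return value of 0 means the file was already in Unix format.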
Also it seems that the name of the Na potential file has a typo in it.
After fixing these errors, I was able to run that script with 8 MPI tasks. It uses about 245 MB per task.
I could also run it on 16 tasks on an AMD cluster with Infiniband (4 tasks/node). The input and output files are attached (4 iterations only).
Could you attach the output file up to and including the point where it hangs?
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Fri Feb 01, 2013 8:46 pm
by jlow
fgygi,
Something must have gotten corrupted between the server, my Windows desktop and the Qbox list. I don't see any carriage returns or line feeds in my input files on the server.
My original input data is essentially the same as yours. I have a comment card in my input which is missing in your input.
I get the same error with your input when I run on more than 8 MPI processes. This case runs on 8 or fewer MPI processes.
I have attached all the files from a test with the input attached in your previous post.
Thanks for your help.
John
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Fri Feb 01, 2013 11:31 pm
by fgygi
I see that all 16 tasks are running on the same node in your test. What is the memory available on that node? It could be a problem with this calculation since it uses a large plane wave cutoff. However I would expect that this might cause a problem later in the execution, not when defining the species.
It appears that the hang occurs where Qbox uses the Xerces XML parser to read the species file. However, I don't see how this could fail on more than 8 tasks and work properly on 8 tasks or fewer.
Could you attach the output you get in the case where it works (with 8 tasks)?
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Sun Feb 03, 2013 6:23 pm
by jlow
Fgygi,
Each node has 16 cores (two eight-core Sandy Bridge processors) and 62 gigabytes of memory.
Are you suggesting I try to run eight cores per node on more than one node?
I have attached the output from a run which completed on 8 cores on one node.
John
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Sun Feb 03, 2013 11:03 pm
by fgygi
It seems that the attached file contains the output of the unsuccessful test on 16 cores.
Regarding memory usage, 62 GB is more than enough for this run (by a lot!).
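As a rough check of that statement, the arithmetic can be sketched in Python. This assumes the per-task footprint with 16 tasks is no larger than the 245 MB per task seen in the 8-task run, which should be an upper bound if the data is distributed over more tasks.

```python
MB_PER_TASK = 245     # per-task usage reported for the 8-task run
TASKS_PER_NODE = 16   # all 16 tasks were placed on one node
NODE_MEMORY_GB = 62   # memory per node, as reported


def node_usage_gb(mb_per_task: float, tasks: int) -> float:
    """Total memory used on one node, in GB (1 GB taken as 1024 MB)."""
    return mb_per_task * tasks / 1024


usage = node_usage_gb(MB_PER_TASK, TASKS_PER_NODE)
fraction = 100 * usage / NODE_MEMORY_GB
print(f"{usage:.1f} GB of {NODE_MEMORY_GB} GB ({fraction:.0f}%)")
```

Under that assumption the run needs under 4 GB on the node, a small fraction of the 62 GB available.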
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Mon Feb 04, 2013 8:15 pm
by jlow
Fgygi,
The attached file contains the output from a successful 8 core run.
I did not include the huge "sample" xml file because it takes too long to upload.
John
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Mon Feb 04, 2013 11:10 pm
by fgygi
John,
Thanks. I looked at the output and I can't see anything wrong with it. At this point I can only think that there could be a problem with the way Qbox was compiled. I enclose a makefile for my cluster on which I built with Intel icc, and used the MKL libraries, in case this could help identify a problem.
Francois
#-------------------------------------------------------------------------------
#
# pencil.mk
#
#-------------------------------------------------------------------------------
#
PLT=x86_64
#-------------------------------------------------------------------------------
MPIDIR=/usr/mpi/qlogic
XERCESCDIR=$(HOME)/software/xerces/xerces-c-src_2_8_0
PLTOBJECTS = readTSC.o
CXX=icc
LD=$(CXX)
PLTFLAGS += -DIA32 -DUSE_FFTW -D_LARGEFILE_SOURCE \
-D_FILE_OFFSET_BITS=64 -DUSE_MPI -DSCALAPACK -DADD_ \
-DAPP_NO_THREADS -DXML_USE_NO_THREADS -DUSE_XERCES
FFTWDIR=$(HOME)/software/fftw/x86_64/fftw-2.1.5/fftw
INCLUDE = -I$(MPIDIR)/include -I$(FFTWDIR) -I$(XERCESCDIR)/include
CXXFLAGS= -g -O3 -vec-report1 -D$(PLT) $(INCLUDE) $(PLTFLAGS) $(DFLAGS)
LIBPATH = -L$(MPIDIR)/lib64 -L$(FFTWDIR)/.libs -L$(XERCESCDIR)/lib
LIBS = $(PLIBS) \
-lmkl_intel_lp64 \
-lmkl_lapack95_lp64 -lmkl_sequential -lmkl_core \
-lirc -lifcore -lsvml \
-lmpich -lfftw -luuid $(XERCESCDIR)/lib/libxerces-c.a -lpthread
# Parallel libraries
PLIBS = -lmkl_scalapack_lp64 -lmkl_blacs_lp64
LDFLAGS = $(LIBPATH) $(LIBS)
#-------------------------------------------------------------------------------
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Wed Feb 06, 2013 8:16 pm
by jlow
Francois,
I get the same behavior whether I use your .mk file or my makefile. Qbox runs to completion on 8 or fewer processes.
On more than eight processes Qbox appears to be in an infinite loop while creating the first species and runs (generating no output) until I stop it with a <ctrl-c>.
The energies for this test (on 8 cores) computed on my server are different from yours.
Could you tell me which versions of the Intel compilers and MKL you are using?
I have attached results for my cristobalite test from qbox built with pencil.mk (your makefile).
John
Re: Qbox hangs when running more than 8 mpi processes.
Posted: Sat Feb 09, 2013 12:40 am
by fgygi
John,
The results in file gs1.r were obtained using 16 MPI tasks. The energies differ from your results obtained on 8 MPI tasks because the random initialization of the wave functions depends on the number of tasks, and after only 4 iterations the energy is far from converged. I have rerun the same input on 8 MPI tasks and got the exact same energies as in your run 8proc.log (see attached file gs5.tar). Of course, all energies, when converged to the ground state, are independent of the number of tasks.
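To illustrate why the starting point can depend on the task count, here is a toy Python model. It is not Qbox's actual initialization scheme: the per-rank seeding and the block distribution are assumptions made only for this sketch.

```python
import random


def initial_coefficients(n_coeffs: int, n_tasks: int) -> list:
    """Toy model: each task fills its own contiguous block of coefficients
    using a random generator seeded with its rank (hypothetical scheme)."""
    block = n_coeffs // n_tasks  # assumes n_coeffs is divisible by n_tasks
    coeffs = []
    for rank in range(n_tasks):
        rng = random.Random(rank)  # hypothetical per-rank seed
        coeffs.extend(rng.uniform(-1.0, 1.0) for _ in range(block))
    return coeffs


# Same task count: identical starting point on every run.
same = initial_coefficients(32, 8) == initial_coefficients(32, 8)
# Different task count: the assembled starting vector changes.
differ = initial_coefficients(32, 8) != initial_coefficients(32, 16)
print(same, differ)
```

In such a scheme a run is reproducible for a fixed number of tasks, but changing the number of tasks changes the initial guess, so unconverged energies after a few iterations need not agree.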
As a side comment, I note that this test uses PBE pseudopotentials but the input file does not specify the xc functional, which is therefore by default LDA. In order to get consistent physical quantities, make sure to add "set xc PBE" to the input file. Conversely, if you want to use LDA, you should use the LDA version of the pseudopotentials, and use the default xc value (LDA).
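A small Python sketch for checking which functional an input script ends up using. The helper name xc_setting is made up for illustration, and the parsing is simplified: it assumes one command per line and that "#" starts a comment.

```python
def xc_setting(script_text: str) -> str:
    """Return the xc functional selected by the last 'set xc' command in
    the script, or 'LDA' (the default mentioned above) if there is none."""
    xc = "LDA"
    for line in script_text.splitlines():
        # Drop anything after '#' (assumed comment marker), then split into words.
        words = line.split("#", 1)[0].split()
        if len(words) >= 3 and words[0] == "set" and words[1] == "xc":
            xc = words[2]
    return xc
```

Running it on the text of the input file and comparing the result with the functional used to generate the pseudopotentials would flag the mismatch described above.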
I am wondering about the possibility that there is a problem with your MPI setup. Which flavor of MPI do you use? Is there a file defining the nodes on which the program can run (i.e. "machinefile"), and possibly where a maximum number of tasks is defined?
I use icc 12.1.3 and MKL 10.3 update 9.
Francois