
OpenMPI bug?

14 years 7 months ago #4689 by JMB
OpenMPI bug? was created by JMB
Hello,

There is a test case for the parallel code, 'perf010b', posted here: < www.code-aster.org/forum2/viewtopic.php?id=13841 >
which I decided to run on a stock CAELinux2010.1 fresh install (plus regular Ubuntu repo updates) on a quad-core PC. When I run the job with no changes, using the *.export file provided (where the settings are ncpus=2, mpi_nbcpu=1, mpi_nbnoeud=1), I get a bizarre crash.
[code:1]
[ubuntu21:07836] *** Process received signal ***
[ubuntu21:07836] Signal: Segmentation fault (11)
[ubuntu21:07836] Signal code: Address not mapped (1)
[ubuntu21:07836] Failing at address: (nil)
[ubuntu21:07836] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7f1e9d3a98f0]
[ubuntu21:07836] [ 1] /usr/lib/libpython2.6.so.1.0(PyObject_Malloc+0x62) [0x7f1ea09469f2]
[ubuntu21:07836] [ 2] /usr/lib/libpython2.6.so.1.0(PyString_FromString+0x85) [0x7f1ea0950625]
[ubuntu21:07836] [ 3] /usr/lib/libpython2.6.so.1.0(PyString_InternFromString+0x9) [0x7f1ea0950a79]
[ubuntu21:07836] [ 4] /usr/lib/libpython2.6.so.1.0(PyObject_GetAttrString+0x38) [0x7f1ea0944988]
[ubuntu21:07836] [ 5] /usr/lib/libpython2.6.so.1.0(PyObject_CallMethod+0x76) [0x7f1ea0904b66]
[ubuntu21:07836] [ 6] ./asteru_mpi(utprin_+0x1bf) [0x4e9f8f]
.
.
.
[/code:1]
But if I change the job to use ncpus=1, then it is ENDED_OK. Can somebody explain what is going on? Thanks.
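In case it helps, the parallelism-related export parameters are roughly these (the '#' notes are just my annotations here, not part of the export syntax):
[code:1]
P ncpus 2          # as provided with the test case -> segfault above
P mpi_nbcpu 1
P mpi_nbnoeud 1

P ncpus 1          # with only this line changed -> ENDED_OK
[/code:1]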

Regards,
JMB
14 years 7 months ago #4690 by Joël Cugnoni
Replied by Joël Cugnoni on topic Re:OpenMPI bug?
Hi JMB

Actually there are 2 versions of Aster in CAELinux 2010:
- one is part of Salome-Meca and was compiled by EDF with the Intel compiler, with OpenMP support (ncpus>1) for shared-memory parallelism in the standard Mult_Front solver. But this version does not support the MPI Mumps or PETSc solvers.
This version is installed in /opt/SALOME-MECA*/aster.
To use it from a terminal, you will need to source the Salome-Meca environment:
[code:1]. /opt/SALOME-MECA*/envSalomeMeca.sh[/code:1]
(or something similar; I don't remember the exact path by heart)
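For example, from a terminal it would be something along these lines (the paths are approximate, and whether as_run ends up in the PATH depends on the environment script, so adapt to your install):
[code:1]
# approximate paths -- adapt to the actual install directory:
. /opt/SALOME-MECA-2010.1-x86_64/envSalomeMeca.sh
as_run my_study.export   # the as_run from the Salome-Meca aster tree
[/code:1]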

- the second was compiled with the GNU compilers and the ACML libraries, without OpenMP support (ncpus must be 1, no parallelism in Mult_Front) but with MPI support for the Mumps and PETSc solvers (mpi_nbcpu>=1).
This version is installed in /opt/aster101 and is the one that is available from Astk when you run it from the CAELinux menu.
The lack of OpenMP support is due to a bug in the GNU compiler when handling OpenMP shared arrays... so it is not easily fixable.

To make it short:
- if you want to use the parallel Mult_Front solver, you need OpenMP, so you must use the version from Salome-Meca;
- if you want to use the parallel Mumps or PETSc solvers, you need MPI, and thus the version that I compiled.

There is no reason to use both ncpus>1 and mpi_nbcpu>1 (or mpi_nbnoeud>1), because no solver benefits from both OpenMP and MPI.
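In the export file, the two sensible combinations would therefore look roughly like this (just a sketch of the parallelism parameters, here for a quad-core machine):
[code:1]
# Salome-Meca version (OpenMP, Mult_Front):
P ncpus 4
P mpi_nbcpu 1
P mpi_nbnoeud 1

# /opt/aster101 version (MPI, Mumps/PETSc):
P ncpus 1
P mpi_nbcpu 4
P mpi_nbnoeud 1
[/code:1]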

I hope that clarifies this complicated subject.

Joël Cugnoni - a.k.a admin
www.caelinux.com
14 years 7 months ago #4692 by Peter Halverson
Replied by Peter Halverson on topic Re:OpenMPI bug?
I had a question on this as well. I have noticed that the OpenMPI version works quite well; I am using the OpenMPI version, /opt/aster101/bin/as_run. However, it does not work as I had expected. If I set mpi_nbcpu = 2, two processors kick in and it solves much quicker than with one processor. However, this is without specifying SOLVEUR(METHODE='MUMPS'). In fact, when I add that keyword, Aster gives me an error message. I was under the impression that OpenMPI only worked with the PETSc and MUMPS solvers. Does the install change the default METHODE? If so, to which solver? MUMPS?
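For illustration, the keyword is added like this in the command file (the model, material and load names are just placeholders):
[code:1]
# sketch of the solver keyword -- MO, CHMAT, CHAR are placeholder concept names:
RESU = MECA_STATIQUE(MODELE=MO,
                     CHAM_MATER=CHMAT,
                     EXCIT=_F(CHARGE=CHAR),
                     SOLVEUR=_F(METHODE='MUMPS'),)
[/code:1]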
14 years 7 months ago #4694 by Joël Cugnoni
Replied by Joël Cugnoni on topic Re:OpenMPI bug?
Hi,

Normally the default solver should not be changed, so you should not see such an improvement in calculation speed with Mult_Front... strange.
Maybe the assembly phase is carried out in parallel?

Another thing that I forgot to mention: to use MUMPS you should select RENUM='PORD', as I did not manage to compile METIS for the parallel MUMPS solver... it is probably not as optimal, but it works.
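So the SOLVEUR keyword would look roughly like this (just a sketch of the fragment, inside MECA_STATIQUE, STAT_NON_LINE, etc.):
[code:1]
SOLVEUR=_F(METHODE='MUMPS',
           RENUM='PORD',),
[/code:1]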

Joel

Joël Cugnoni - a.k.a admin
www.caelinux.com
14 years 7 months ago #4695 by JMB
Replied by JMB on topic Re:OpenMPI bug?
Administrator wrote:

Hi JMB, actually there are 2 versions of Aster in CAELinux 2010:


Hi Admin,
Thanks for the clarifications. To summarize:

[code:1]
Parallel Code          OpenMPI (MPI)        OpenMP (shared memory)
=====================  ===================  ========================================
CodeAster location:    /opt/aster101/...    /opt/SALOME-MECA-2010.1-x86_64/aster/...
Compilers used:        GNU/ACML             Intel/OpenMP

Run using:             ASTK                 Open a terminal:
                                            source /opt/SALOME-MECA-2010.1-x86_64/envSalomeMeca.sh
                                            etc.

Compatible solver(s):  Mumps / PETSc        Mult_Front

CPUs (cores):          ncpus = 1            ncpus >= 1
MPI CPUs (cores):      mpi_nbcpu >= 1       mpi_nbcpu = 1
PCs (nodes):           mpi_nbnoeud >= 1     mpi_nbnoeud = 1

Cluster capable:       Yes                  No
MultiCore capable:     Yes                  Yes
[/code:1]

Let me know if I have interpreted your explanation correctly.

Regards,
JMB
14 years 7 months ago #4696 by JMB
Replied by JMB on topic Re:OpenMPI bug?
Hello admin,

Which means that in order to run a CodeAster job on a cluster of (separate) PCs, one would use the version compiled by you, using ASTK and setting ncpus = 1, mpi_nbcpu >= 1 and mpi_nbnoeud >= 1. Is that correct?

If a cluster has one single-core PC and another dual-core PC, then:
ncpus = 1, mpi_nbcpu = 3 and mpi_nbnoeud = 2. Is this correct?
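In other words, something like this in the export file, plus an OpenMPI hostfile listing the two machines (the hostfile name, hostnames and slot counts below are just my guesses at how it would be set up):
[code:1]
P ncpus 1
P mpi_nbcpu 3
P mpi_nbnoeud 2

# hostfile used by OpenMPI (hostnames are placeholders):
node-single   slots=1
node-dual     slots=2
[/code:1]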

Regards,
JMB