OpenMPI bug?
14 years 7 months ago #4689
by JMB
OpenMPI bug? was created by JMB
Hello,
There is a test case for the parallel code 'perf010b' posted here: < www.code-aster.org/forum2/viewtopic.php?id=13841 >
which I decided to run on a stock, fresh CAELinux 2010.1 install (plus the regular Ubuntu repository updates) on a quad-core PC. When I run the job unchanged, using the *.export file provided (with the settings ncpus=2; mpi_nbcpu=1; mpi_nbnoeud=1), I get a bizarre crash.
[code:1]
[ubuntu21:07836] *** Process received signal ***
[ubuntu21:07836] Signal: Segmentation fault (11)
[ubuntu21:07836] Signal code: Address not mapped (1)
[ubuntu21:07836] Failing at address: (nil)
[ubuntu21:07836] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7f1e9d3a98f0]
[ubuntu21:07836] [ 1] /usr/lib/libpython2.6.so.1.0(PyObject_Malloc+0x62) [0x7f1ea09469f2]
[ubuntu21:07836] [ 2] /usr/lib/libpython2.6.so.1.0(PyString_FromString+0x85) [0x7f1ea0950625]
[ubuntu21:07836] [ 3] /usr/lib/libpython2.6.so.1.0(PyString_InternFromString+0x9) [0x7f1ea0950a79]
[ubuntu21:07836] [ 4] /usr/lib/libpython2.6.so.1.0(PyObject_GetAttrString+0x38) [0x7f1ea0944988]
[ubuntu21:07836] [ 5] /usr/lib/libpython2.6.so.1.0(PyObject_CallMethod+0x76) [0x7f1ea0904b66]
[ubuntu21:07836] [ 6] ./asteru_mpi(utprin_+0x1bf) [0x4e9f8f]
.
.
.
[/code:1]
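For reference, the resource settings in the provided *.export file are along these lines (quoted from memory, so the exact layout may differ slightly from the file attached to that forum post):
[code:1]
P ncpus 2
P mpi_nbcpu 1
P mpi_nbnoeud 1
[/code:1]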
But if I change the job to use ncpus=1, then it finishes with ENDED_OK. Can somebody explain what is going on? Thanks.
Regards,
JMB
14 years 7 months ago #4690
by Joël Cugnoni
Replied by Joël Cugnoni on topic Re:OpenMPI bug?
Hi JMB,
Actually, there are two versions of Aster in CAELinux 2010:
- One is part of Salome-Meca and was compiled by EDF with the Intel compiler, with OpenMP support (nbcpu > 1) for shared-memory parallelism of the standard Mult_Front solver. However, this version does not support the MPI MUMPS or PETSc solvers.
This solver is installed in /opt/SALOME-MECA*/aster.
To use it from a terminal, you will need to source the Salome-Meca environment:
[code:1]. /opt/SALOME-MECA*/envSalomeMeca.sh[/code:1]
(or something similar; I don't remember the exact path by heart.)
- The second version was compiled with GNU compilers and the ACML libraries, without OpenMP support (nbcpu must be 1, so no parallelism of Mult_Front) but with MPI support for the MUMPS and PETSc solvers (mpi_nbcpu >= 1).
This version is installed in /opt/aster101 and is the one available from Astk when you run it from the CAELinux menu.
The lack of OpenMP support is due to a bug in the GNU compiler when handling OpenMP shared arrays... so it is not easily fixable.
To make it short:
- If you want to use the parallel Mult_Front solver, you need OpenMP, so you must use the version from Salome-Meca.
- If you want to use the parallel MUMPS or PETSc solvers, you need MPI, and thus the version that I compiled.
There is no reason to use both nbcpu > 1 and mpi_nbcpu > 1 (or mpi_nbnoeud > 1), because no solver benefits from both OpenMP and MPI.
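As a concrete example, the resource lines of the astk *.export file would look something like this in the two cases (just a sketch from memory; the comment lines are annotations, not part of the file):
[code:1]
# Salome-Meca version, OpenMP parallel Mult_Front:
P ncpus 2
P mpi_nbcpu 1
P mpi_nbnoeud 1

# /opt/aster101 version, MPI parallel MUMPS or PETSc:
P ncpus 1
P mpi_nbcpu 2
P mpi_nbnoeud 1
[/code:1]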
I hope this clarifies this complicated subject.
Joël Cugnoni - a.k.a admin
www.caelinux.com
14 years 7 months ago #4692
by Peter Halverson
Replied by Peter Halverson on topic Re:OpenMPI bug?
I had a question on this as well. I have noticed that the OpenMPI version works quite well; I am using the OpenMPI version at /opt/aster101/bin/as_run. However, it does not behave as I expected. If I set mpi_nbcpu to 2, two processors kick in and the job solves much faster than with one processor, yet this is without specifying SOLVEUR(METHODE='MUMPS'). In fact, when I add that keyword, Aster gives me an error message. I was under the impression that OpenMPI only worked with the PETSc and MUMPS solvers. Does the install change the default METHODE? If so, to which solver: MUMPS?
14 years 7 months ago #4694
by Joël Cugnoni
Replied by Joël Cugnoni on topic Re:OpenMPI bug?
Hi,
Normally the default solver should not be changed, so you should not see such an improvement in calculation speed with Mult_Front... strange.
Maybe the assembly phase is carried out in parallel?
Another thing that I forgot to mention: to use MUMPS you should select RENUM = 'PORD', as I did not manage to compile METIS for the parallel MUMPS solver. It is probably not as optimal, but it works.
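In the command file, that looks something like the following (only a sketch: the concept names MODELE, CHMAT and CHARG are placeholders from your own study, and I show MECA_STATIQUE, but the SOLVEUR keyword works the same way in STAT_NON_LINE and the other mechanical operators):
[code:1]
RESU = MECA_STATIQUE(MODELE=MODELE,
                     CHAM_MATER=CHMAT,
                     EXCIT=_F(CHARGE=CHARG),
                     SOLVEUR=_F(METHODE='MUMPS',
                                RENUM='PORD',),);
[/code:1]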
Joel
Joël Cugnoni - a.k.a admin
www.caelinux.com
14 years 7 months ago #4695
by JMB
Replied by JMB on topic Re:OpenMPI bug?
Administrator wrote: Hi JMB, actually there are 2 versions of Aster in CAELinux 2010: [...]
Hi Admin,
Thanks for the clarifications. To summarize:
[code:1]
Parallel Code          OpenMPI                OpenMP
=============          =======                ======
Parallelism:           distributed memory     shared memory
CodeAster location:    /opt/aster101/...      /opt/SALOME-MECA-2010.1-x86_64/aster/...
Compilers used:        GNU/ACML               Intel/OpenMP
Run using:             ASTK                   Open a terminal:
                                              source /opt/SALOME-MECA-2010.1-x86_64/envSalomeMeca.sh
                                              etc.
Compatible solver(s):  Mumps / PetSc          MultiFront
CPUs (cores):          ncpu = 1               ncpu >= 1
MPI CPUs (cores):      mpi_nbcpu >= 1         mpi_nbcpu = 1
PCs (nodes):           mpi_nbNoeud >= 1       mpi_nbNoeud = 1
Cluster capable:       Yes                    No
MultiCore capable:     Yes                    Yes
[/code:1]
Let me know if I have interpreted your explanation correctly.
Regards,
JMB
Post edited by: JMB, at: 2010/09/09 23:15
14 years 7 months ago #4696
by JMB
Replied by JMB on topic Re:OpenMPI bug?
Hello admin,
Which means that, in order to run a Code_Aster job on a cluster of (separate) PCs, one would use the version compiled by you, run it through ASTK, and set ncpu = 1, mpi_nbcpu >= 1 and mpi_nbNoeud >= 1. Is that correct?
And if a cluster has one single-core PC and one dual-core PC, then:
ncpu = 1, mpi_nbcpu = 3 and mpi_nbNoeud = 2. Is this correct?
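Put differently, I assume the *.export and the MPI host file would end up looking roughly like this (a sketch only; the host names are invented and the comment lines are annotations, not part of the files):
[code:1]
# in the *.export file:
P ncpus 1
P mpi_nbcpu 3
P mpi_nbnoeud 2

# in the host file that as_run hands to OpenMPI:
pc-single   slots=1
pc-dual     slots=2
[/code:1]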
Regards,
JMB