Caelinux cluster

13 years 7 months ago #4687 by JMB
Replied by JMB on topic Re:Caelinux cluster
Hello,

I have made progress loading CAELinux2010.1 on two PCs (ubuntu1, a quad-core, and ubuntu5, a single-core), but I am not fully sure of the ASTK options:

ncpus = ?
mpi_nbcpu = 4 (or 5?)
mpi_nbnoeud = 1 (2 causes a CA error)

How do I set up the ASTK GUI to use both PCs?

The ~/.astkrc/mpi_hostfile (which is supposed to override defaults):
[code:1]
ubuntu1 slots=4 max-slots=4
ubuntu5 slots=1 max-slots=1
[/code:1]

The /opt/aster101/etc/codeaster/aster-mpihosts:
[code:1]
ubuntu1 slots=4
ubuntu5 slots=1
[/code:1]
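
As a rough cross-check of the numbers to enter in ASTK, the two hostfiles above can be summed up with a few lines of Python. This is only a sketch: `summarize_hostfile` is a hypothetical helper, not part of as_run or ASTK, and it assumes the Open MPI-style `host slots=N` syntax shown above.

```python
def summarize_hostfile(text):
    """Return (number_of_hosts, total_slots) for an Open MPI-style hostfile.

    Hypothetical helper for illustration; not part of as_run/ASTK.
    """
    hosts = 0
    total_slots = 0
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumption: count one slot when none is given
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":  # "max-slots" is deliberately ignored here
                slots = int(value)
        hosts += 1
        total_slots += slots
    return hosts, total_slots


# Content of ~/.astkrc/mpi_hostfile as posted above.
HOSTFILE = """\
ubuntu1 slots=4 max-slots=4
ubuntu5 slots=1 max-slots=1
"""

nodes, cpus = summarize_hostfile(HOSTFILE)
print("nodes:", nodes, "total slots:", cpus)
```

For the files above this gives 2 nodes and 5 slots in total, which is where the candidate values mpi_nbnoeud = 2 and mpi_nbcpu = 5 come from.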

The ~/.astkrc/config file is:
[code:1]
# as_run : user preferences file

# You can override here all values of $ASTER_ROOT/etc/codeaster/asrun

# remote protocol used for shell commands
remote_shell_protocol : SSH

# remote protocol used to copy files and directories
remote_copy_protocol : SCP

# editor command
editor : nedit

# login on the development server
# (name/ip address is usually set in /etc/codeaster/asrun)
devel_server_user :

#per openmpi 1.3.2
#mpi_get_procid_cmd : /home/aster/procid

#mpich2
mpirun_cmd : mpiexec -machinefile %(mpi_hostfile)s -wdir %(wrkdir)s -n %(mpi_nbcpu)s %(program)s
#mpirun_cmd : mpiexec -n 2 %(program)s

mpi_hostfile : /home/ks/.astkrc/mpi_hostfile
[/code:1]
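
The `%(name)s` placeholders in `mpirun_cmd` look like Python's dictionary-based string formatting. Assuming as_run fills them in that way (an assumption, not checked against its source), the command it would end up launching can be previewed as below. The working directory and the choice of 5 processes are illustrative values only.

```python
# Template copied from the mpirun_cmd line in ~/.astkrc/config above.
MPIRUN_CMD = ("mpiexec -machinefile %(mpi_hostfile)s -wdir %(wrkdir)s "
              "-n %(mpi_nbcpu)s %(program)s")


def expand_mpirun_cmd(template, **values):
    """Fill the %(name)s placeholders using Python's dict-based % formatting."""
    return template % values


cmd = expand_mpirun_cmd(
    MPIRUN_CMD,
    mpi_hostfile="/home/ks/.astkrc/mpi_hostfile",
    wrkdir="/tmp/aster_work",  # hypothetical working directory
    mpi_nbcpu=5,               # illustrative: 4 slots on ubuntu1 + 1 on ubuntu5
    program="/opt/aster101/STA10.1/asteru_mpi",
)
print(cmd)
```

Running the printed command by hand is a quick way to separate MPI launch problems from ASTK configuration problems.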

So far I know this works:
[code:1]
mpirun -np 2 --host ubuntu1,ubuntu5 hostname
# ubuntu5
# ubuntu1
[/code:1]

Also, both of the following:
ubuntu5$: mpirun -np 4 /opt/aster101/STA10.1/asteru_mpi -c "print 'Hello World'"
ubuntu5$: mpirun --host ubuntu1 -np 4 /opt/aster101/STA10.1/asteru_mpi -c "print 'Hello World'"
work correctly, each displaying 'Hello World' 4 times...

But for CodeAster ASTK, if I use: ncpus = 2; mpi_nbcpu = 5; & mpi_nbnoeud = 2 I get:
<E>_INCORRECT_PARA Requested number of MPI nodes (2) is higher than the limit (1)
<E>_INCORRECT_PARA Requested number of MPI processors (5) is higher than the limit (4)
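
The two messages suggest that the requested values are compared against limits (1 node and 4 processors here) that are configured separately from the hostfile. A minimal sketch of such a comparison follows; `check_mpi_request` is a hypothetical name, and where as_run actually stores these limits is not shown in this thread.

```python
def check_mpi_request(nbnoeud, nbcpu, max_nbnoeud, max_nbcpu):
    """Return error strings for values above the limits (empty list if all fine).

    Hypothetical illustration of the check; not the actual as_run code.
    """
    errors = []
    if nbnoeud > max_nbnoeud:
        errors.append(
            "Requested number of MPI nodes (%d) is higher than the limit (%d)"
            % (nbnoeud, max_nbnoeud))
    if nbcpu > max_nbcpu:
        errors.append(
            "Requested number of MPI processors (%d) is higher than the limit (%d)"
            % (nbcpu, max_nbcpu))
    return errors


# Values from the failing run: mpi_nbnoeud = 2, mpi_nbcpu = 5; limits 1 and 4.
for msg in check_mpi_request(2, 5, max_nbnoeud=1, max_nbcpu=4):
    print("<E>_INCORRECT_PARA", msg)
```

If that reading is right, editing the hostfile alone would not be enough; the limits themselves would also have to be raised to 2 nodes and 5 processors.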

Am I overlooking something else in the configuration process? Am I starting a cluster job incorrectly?

Regards,
JMB

Post edited by: JMB, at: 2010/09/08 21:03
Post edited by: JMB, at: 2010/09/08 21:23


  • Filippo Monari
  • Topic Author
13 years 7 months ago #4702 by Filippo Monari
Replied by Filippo Monari on topic Re:Caelinux cluster
Hi, thank you for your help.
I have studied the problem a little (I'm not an expert in computer networks, but I want to do some experiments). I have seen some open source tools, such as the Condor package for Linux, that help to create a cluster.
My idea was to build a 3/4-node cluster, with a master node (a normal PC) and 2/3 slave nodes built only from a motherboard, RAM and CPU, which boot over the network, so I don't have to install CAELinux on each node.
Is it possible? Can you help me by suggesting any kind of useful guide?
Thanks again for your help.

13 years 7 months ago #4703 by JMB
Replied by JMB on topic Re:Caelinux cluster
Filippo Monari wrote:

My idea was to build a 3/4-node cluster, with a master node (a normal PC) and 2/3 slave nodes built only from a motherboard, RAM and CPU, which boot over the network, so I don't have to install CAELinux on each node. Is it possible?


Hello Filippo Monari,

Yes, it is possible. I did it several years ago, but do not remember the details. The basic requirement is that the slave PCs must be PXE-bootable (i.e. have a network card that supports PXE boot). That was the basis of ClusterKnoppix, which is no longer maintained.

The overall scheme is something like this:
Set up your master node as a DHCP server and install NetBoot (I think) so that it serves out the bootstrap loader, an initramfs, a kernel etc., or some similar concept. The slaves are set up as DHCP clients that get their IP address and bootstrap program from the master.
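
To make the boot chain a bit more concrete, here is a hedged sketch that renders a PXELINUX-style boot entry of the kind a master could serve to its slaves. PXELINUX itself is an assumption (the post does not name the bootloader), and every specific value (file names, the 192.168.1.1 address, the NFS path, the `caelinux` label) is hypothetical.

```python
def render_pxelinux_entry(kernel, initrd, nfs_server, nfs_root, label="caelinux"):
    """Build the text of a pxelinux.cfg/default file for a diskless NFS-root node.

    Illustrative only: all arguments are placeholders chosen by the caller.
    """
    # Kernel command line: mount the root filesystem over NFS, get the IP via DHCP.
    append = "initrd=%s root=/dev/nfs nfsroot=%s:%s ip=dhcp ro" % (
        initrd, nfs_server, nfs_root)
    lines = [
        "DEFAULT %s" % label,
        "LABEL %s" % label,
        "  KERNEL %s" % kernel,
        "  APPEND %s" % append,
    ]
    return "\n".join(lines) + "\n"


config = render_pxelinux_entry(
    kernel="vmlinuz",            # hypothetical kernel file name
    initrd="initrd.img",         # hypothetical initramfs file name
    nfs_server="192.168.1.1",    # hypothetical master node address
    nfs_root="/srv/caelinux",    # hypothetical exported root filesystem
)
print(config)
```

The slave's network card gets its address and the bootloader from the master, the bootloader reads an entry like this one, and the kernel then mounts its root filesystem from the master instead of a local disk.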

I think the Ubuntu website or other forums have HowTos on the subject, since it has been done for several years and is a fairly mature and robust process.

Another idea is to use a special Ubuntu LTSP (Linux Terminal Server Project) ISO that makes the process very easy. I tried that a few years ago and it works well.

Regards...

13 years 7 months ago #4704 by Joël Cugnoni
Replied by Joël Cugnoni on topic Re:Caelinux cluster
Hi

Actually, I prototyped this kind of centralized cluster last year.

I found that with the DRBL package it is relatively easy to set up a single-image cluster that boots from the network.

DRBL is a set of configuration scripts to set up netbooting and manage machine images for a cluster.

Search for Ubuntu DRBL on Google and it should point you to numerous docs on how to set it up.

The good point is that it lets you create a cluster starting from a standard install of CAELinux and thus does not require recreating the full distro manually (which is the case for many cluster environments like Rocks).

PS I am also prototyping a CAElinux cluster in the cloud on Amazon EC2... will let you know if it works well.

Joël Cugnoni - a.k.a admin
www.caelinux.com

13 years 7 months ago #4711 by Peter Halverson
Replied by Peter Halverson on topic Re:Caelinux cluster
Administrator wrote:

PS I am also prototyping a CAElinux cluster in the cloud on Amazon EC2... will let you know if it works well.


Please don't. That would make it very hard for me to justify having my own personal cluster :) . Actually, please let me know how it goes. The spot pricing of an EC2 cluster makes it very attractive.

13 years 7 months ago #4713 by Joël Cugnoni
Replied by Joël Cugnoni on topic Re:Caelinux cluster
Sorry.. I think that I did it yesterday evening ;-)

I have to say that it was quite tough to package the distro for EC2, as I had to start from a blank Ubuntu image and "restore" all the peculiarities of CAELinux. At the moment most codes work well in EC2, but I need to fix some problems with xhost / interactive follow-up.
Using the Neatx NX server, you can even get a full remote desktop in the cloud, with a pretty good reaction time...

The use case, in my opinion, would be to prepare small models to prototype the simulation cases on a laptop, then migrate the project to EC2, generate a fine mesh and run a full-featured simulation on hardware that you could only dream of buying (who can afford a dual-socket quad-core Xeon with 64 GB of RAM??)!

I also did a small benchmark to identify the exact performance of EC2 compute units (ECU), and found that one core of my Phenom II X4 at 3.4 GHz is approx. 3.5 ECU, so 1 ECU ≈ 1 GHz of a recent Opteron/Phenom core. The scaling with an increasing number of CPUs was not great in Aster compared to my quad-core PC, but that is probably because of an I/O bottleneck. I will try to find a solution to that problem. For CFD / segregated solvers like Elmer, OpenFOAM and Saturne, this won't be a problem anyway.
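
Spelled out, the rule of thumb above is just the ratio of the two benchmark figures. The sketch below uses only the numbers quoted in this post; `ecu_to_ghz` is a hypothetical helper, and the linear scaling it applies is a simplification.

```python
CORE_CLOCK_GHZ = 3.4  # clock of one Phenom core, from the benchmark above
CORE_ECU = 3.5        # measured ECU equivalent of that core, from the benchmark above

# How many GHz of a recent Opteron/Phenom core correspond to one ECU.
ghz_per_ecu = CORE_CLOCK_GHZ / CORE_ECU
print("1 ECU ~ %.2f GHz" % ghz_per_ecu)


def ecu_to_ghz(ecu):
    """Convert an ECU rating to equivalent GHz (assumes simple linear scaling)."""
    return ecu * ghz_per_ecu
```

The ratio comes out slightly below 1, which is why rounding to 1 ECU per GHz is a reasonable shorthand.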

Then I will try to develop some simple scripts + a Python GUI to set up a basic cluster on the fly from a master node.
I will let you know when it is ready for testing.

Joel

Joël Cugnoni - a.k.a admin
www.caelinux.com

