Questions tagged [cluster]

Discussion related to cluster mechanisms.

1 vote
0 answers
27 views

I have 2 servers (Debian 12) that use a storage disk (SD). Both see this SD as a device via fdisk. I have no details about the storage device itself or the connection type - for me it is just a ...
chris01's user avatar
  • 1,039
0 votes
0 answers
54 views

I’m looking for some guidance on expanding my Proxmox setup. Here’s my current setup and what I’m trying to achieve: Current Setup​ I have a dedicated OVH server running Proxmox. On this server, I ...
Zakaria Ait Yakoub's user avatar
0 votes
0 answers
34 views

I have a 10-node deployment that uses the Red Hat clustering software (pacemaker/corosync) to mount gfs2 and ensure high availability. The nodes are mail servers and use gfs2 to store users' data ...
brchelli26's user avatar
0 votes
0 answers
31 views

I am compiling a custom Linux kernel for a compute cluster. The cluster has been running kernel version 4.4.47 for the last 5 years. I need to upgrade the kernel to a more recent version. I've ...
Sâu's user avatar
  • 101
1 vote
1 answer
424 views

I have six Linux servers running RHEL 8.6 and need to ensure that one specific service is running on at least one and at most one of those six servers. Does systemd support something like this? If not, ...
The Programmer's user avatar
1 vote
0 answers
23 views

I'm working on an SGE Linux cluster, and beginners often run memory/resource-consuming tools on the login node instead of using qsub or qlogin ( https://gridscheduler.sourceforge.net/htmlman/htmlman1/...
Pierre's user avatar
  • 1,793
1 vote
1 answer
135 views

I recently switched to Slurm and am looking for a job submission tool that behaves similarly to qsub: it takes input through a pipe and prints the output to stdout. Example: for n in `seq 1 10`; do ...
LazyCat's user avatar
  • 188
0 votes
2 answers
560 views

I am trying to install Slurm on an Ubuntu PC. I followed the instructions given over here and did the following - sudo apt update -y sudo apt install slurmd slurmctld -y mkdir sudo /etc/slurm-...
desert_ranger's user avatar
1 vote
0 answers
65 views

I want to run a shell script on a compute cluster but I get an error because at some point it is looking for a module that does not exist since a major update on the cluster a few months ago. This ...
Seb's user avatar
  • 11
0 votes
1 answer
60 views

I have 3 VPS. Let's say master, slave1, slave2. Their specifications are identical. Processor: 1 CPU, Memory: 1GB, Disk: 10GB, Network: connected to each other over a LAN. I expect any arbitrary binary program (...
Muhammad Ikhwan Perwira's user avatar
1 vote
1 answer
177 views

I'm working on a project building a CPU cluster, and those servers and NFS storage (not a parallel file system) are going to be connected through HDR InfiniBand cables. In this architecture, can I get ...
Antenna_'s user avatar
1 vote
1 answer
56 views

I recently set up my own home cluster of 4 Raspberry Pi units, but I am having problems trying to benchmark all 4 units using Linpack. One node is the head node called rpislave1, it connects to the ...
AlexChan's user avatar
2 votes
1 answer
368 views

I am managing multiple GPU servers in our lab, which are mainly used for deep learning tasks. We would like these machines to share the same file system, so it is easier to switch between them. ...
x.y.z liu's user avatar
0 votes
0 answers
83 views

I'm trying to learn the basics of Linux clustering so I started designing a really humble cluster: 6 worker nodes (Libre Computer La Frite | Cortex-A53 @ 1.2 GHz | 1GB RAM) 1 master node (Raspberry ...
phreq's user avatar
  • 1
0 votes
1 answer
282 views

I have a small cluster (all nodes run Debian 10) and need to remove the internet connections of all slave nodes. The internet cable connection connects to a computer that acts as a firewall, then, ...
Carlos Andrés del Valle's user avatar
1 vote
1 answer
601 views

We have to move a very large data set (petabytes) from an HPC cluster to a storage server. We have a high-capacity communication link between the devices. However, the bottleneck seems to be a fast ...
Ikram Ullah's user avatar
-1 votes
1 answer
782 views

I would like to find out whether existing IBM AIX servers at different locations have clustering/HA features. Please let me know the steps to check. Thanks.
Nick eric adelee's user avatar
2 votes
1 answer
5k views

So, I am by no means a sysadmin but I need to use an existing SLURM installation to launch a sizable amount of jobs (around 5000). The cluster is composed of 1 node with 10 GPUs (with 8GB of memory ...
jacky la mouette's user avatar
0 votes
1 answer
421 views

I have a CentOS 7 Pacemaker cluster with GFS2 filesystems mounted. I'm fairly certain that vgchange -cy vg_name was NOT run during setup. I tried running vgchange --test -cy vg_name and it tells me ...
ex_submariner's user avatar
0 votes
0 answers
64 views

I'm building a small cluster with desktop computers that run Debian 11. I want a shared /home directory where all users' files are located. I know that the ideal way to do this is to have a master ...
Carlos Andrés del Valle's user avatar
0 votes
0 answers
335 views

I am in the process of assembling a small computing cluster. It will run Ubuntu/Slurm for job scheduling. Only the head node will be connected to the Internet; all others will be accessible from the local ...
FNS's user avatar
  • 11
1 vote
0 answers
121 views

I am assembling a simple computing cluster under Ubuntu 20.04 and Slurm as a job scheduler. The cluster will be primarily used for quantum-chemical calculations, so, as a rule, each job will run its ...
FNS's user avatar
  • 11
0 votes
1 answer
172 views

Basically, could we have a file system with odd byte size clusters? Why is everything even? Thanks
pushandpop's user avatar
  • 1,446
0 votes
0 answers
1k views

I get this block of information under my GPFS cluster information when I execute /usr/lpp/mmfs/bin/mmlscluster and I can't find documentation on what the Designation actually means. Does quorum-...
IceTea's user avatar
  • 121
0 votes
0 answers
529 views

I have a cluster with several login nodes and many compute nodes (call it the cluster). Then I have another server with a large shared storage (call it the storage). I need to be able to rsync (i.e. ...
Botond's user avatar
  • 135
-1 votes
1 answer
52 views

I want to submit a task that is interpreted by /bin/csh, which exists only on the master node. I have no root access, only sudo, which is limited to the master node. So I can't use sudo apt install ...
Zhihui's user avatar
  • 1
1 vote
0 answers
108 views

I recently received a Picocluster with five Jetson Nano boards running MicroK8S. The cluster has a built-in switch, which I know works as I can route my own network traffic through it just fine. All ...
Thijs van der Heijden's user avatar
1 vote
0 answers
182 views

I have a lab consisting of 3 machines connected with 2 10GbE links on 2 segregated networks. Each device has 100 TB of block storage connected to it. I want to use ATA over Ethernet to create a storage ...
Tim's user avatar
  • 111
0 votes
1 answer
1k views

Can someone please walk me through the step-by-step process of configuring an ocfs2 filesystem, starting from splitting an existing partition? When I tried, I saw the error below: mount.ocfs2: ...
Divija Gogineni's user avatar
0 votes
0 answers
47 views

I recently installed Anaconda (which includes a python3) locally in my account folder on a cluster with a dozen nodes (each node with several cores). I use it to install some package P that is used ...
xiaohuamao's user avatar
1 vote
1 answer
2k views

I'm trying to set up a new Debian 10 cluster with three instances. My stack is based on pacemaker, corosync, dlm, and lvmlockd with a GFS2 volume. All servers have access to the GFS2 volume but I can't ...
Me7e0r's user avatar
  • 11
1 vote
1 answer
2k views

I have an R script that processes multiple files, say file=1 to 50. I usually submit repeated jobs, say 5 times with 10 files each, by changing the number in the R script. So, how can I submit the 5 jobs at ...
b_takhel's user avatar
0 votes
1 answer
27 views

We have a Linux cluster in our organization and my data science team is developing a number of ML projects to be utilized by teams across the organization. To enable the teams to access the ML models, ...
kosmos's user avatar
  • 101
1 vote
1 answer
523 views

Migrating from Xen's xm to Xen's xl under control of libvirt, I wonder: Where does libvirt store the "originals" of VM configurations? I found that my PVM configurations are stored in /etc/...
U. Windl's user avatar
  • 1,775
2 votes
1 answer
1k views

My question is related to a python error, but I suspect that it is more a Linux question than a python one. Thus I post it first here. I am running a python script which does a calculation and then ...
Britzel's user avatar
  • 165
0 votes
0 answers
3k views

I am running a lot of jobs with qsub: some are running, some are waiting. Is there a way to cancel all the jobs for a given user which are queued/waiting without giving the individual job IDs?
user443699's user avatar
2 votes
1 answer
2k views

First of all, thank you in advance for your help. I hope the title makes sense. Basically, on the headnode the users' home directories (e.g. headnode:/home/eric) are NFS-shared and mounted to all the ...
Eric  Alemany's user avatar
0 votes
1 answer
2k views

I get this error from Pacemaker after I changed Apache from HTTP to HTTPS. Now my ocf::heartbeat:apache resource cannot find the status page. I generated SSL certificates separately for the 3 servers. Everything ...
Karippery's user avatar
2 votes
1 answer
2k views

I am working on a cluster running RHEL and I submit jobs using the following command. sbatch MyScript.sh The contents of MyScript.sh are as below. #!/bin/sh # .... # Other SBATCH related commands ...
Amit's user avatar
  • 123
1 vote
1 answer
303 views

When I compile xhpl I always get the error message: ./xhpl: error while loading shared libraries: libdgemm.so.1: cannot open shared object file: No such file or directory When I type ldd xhpl: linux-...
Tim Tonic's user avatar
1 vote
0 answers
612 views

For the life of me, I can't find a clear answer on how to start my NFS active / passive cluster. I have two nodes, node1 and node2 and followed the guide here: https://www.linuxtechi.com/configure-nfs-...
jasontt33's user avatar
0 votes
1 answer
4k views

I'm creating a cluster system using two ESXi hosts, with a CentOS 7 server on each. Going through I created the filesystem, and it mounts on node1. When I perform a standby or reboot from node01 to ...
markb's user avatar
  • 143
0 votes
2 answers
4k views

I am trying to clean up a server which had a PowerHA configuration. I have stopped the cluster (smitty clstop) and removed the resource groups. How do I remove the caavg_private properly? hdisk5 ...
RJ Gellangarin's user avatar
0 votes
1 answer
131 views

I have two node servers with SAN storage. Each node runs RHEL 6.9 with HA, and the partitions are mapped from the storage over fiber cables with clustered resources. The thing is when the two nodes ...
user65285's user avatar
0 votes
1 answer
360 views

I am trying to create an AFM relationship using the GPFS protocol. I am getting an error on the cache-side cluster. Steps on the home cluster: 1) Create a home cluster (cluster name - gpfs01). 2) Create a file ...
pratiksha chavan's user avatar
1 vote
0 answers
320 views

Trying to implement an OpenLDAP cluster, I already managed to set up the two backend LDAP servers in mirroring mode. The application (iRedMail) using the LDAP service is running on the same systems ...
arminV's user avatar
  • 11
0 votes
0 answers
986 views

I was running what was supposed to be a small make job on a small node in our cluster, but a sorting process seemed to be overwhelming the allocated RAM, so I killed it after 20 hours (same jobs on ...
GenesRus's user avatar
  • 101
0 votes
1 answer
106 views

I have a ccs cluster running on RHEL 6.4 where there is no luci service. I have added a filesystem resource to the cluster with the command below, but I need to move the resource inside a service group....
Ritesh Vishwakarma's user avatar
0 votes
1 answer
623 views

I have 3 machines in my local network running Manjaro. I am running Python scripts using dask, pandas, etc. which max out the CPU on the first machine, and I usually need to wait more than 30 min until ...
cmosig's user avatar
  • 123
0 votes
2 answers
208 views

Swarm manager nodes handle cluster management tasks such as: 1) Maintaining cluster state 2) Scheduling services 3) Serving swarm mode HTTP API endpoints You may execute any of the - docker ...
overexchange's user avatar
  • 1,606
