Transferring Data

From UMass GHPCC User Wiki

Transferring data into MGHPCC

There are a few different ways to get your data onto MGHPCC, and all of them use SSH. SCP (secure copy), which ships with SSH, copies files from one system to another over a secure link. You can also use other tools, such as rsync, that use SSH as the transport layer. rsync can be faster in some cases because it transfers only the data blocks that differ between the source and the destination. If you want to keep a directory on MGHPCC identical to a directory on your local system, rsync makes this easy.

Using scp

Transferring files with scp is pretty straightforward:

[mfk@ghpcc06 ~]$ scp localfile mfk@remotehost:

This copies the file localfile to the remote system remotehost, logging in as mfk. Because nothing follows the colon, the file lands in mfk's home directory on remotehost.
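A few common variations on this command are sketched below. The login, host, and paths are the placeholders from the example above, and remotefile is an illustrative name. The run helper and the DRY_RUN variable are hypothetical conveniences, not part of scp: by default they only print each command so it can be reviewed, and setting DRY_RUN=0 executes it.

```shell
#!/bin/sh
# Placeholders from the example above; substitute your own login and host.
REMOTE="mfk@remotehost"

# Hypothetical helper: print each command instead of running it unless
# DRY_RUN=0 is set, so the commands can be checked first.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy a file into a specific remote directory rather than the home directory.
run scp localfile "$REMOTE:/home/mfk/remotedirectory/"

# Copy a whole directory: -r recurses, -p preserves modification times and modes.
run scp -rp localdirectory "$REMOTE:/home/mfk/remotedirectory/"

# Copy in the other direction: fetch a remote file into the current directory.
run scp "$REMOTE:remotefile" .
```

A path after the colon that does not start with / is interpreted relative to the remote user's home directory.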

Using rsync

Transferring files with rsync is a bit more complicated but, as noted earlier, can improve transfer performance.

[mfk@ghpcc06 ~]$ rsync -av -e ssh localfile mfk@remotehost:

As above, we're using rsync to copy localfile to remotehost, connecting as user mfk. The additional flags tell rsync to use SSH for the connection (-e ssh) and to print verbose information about the transfer (-v). The -a (archive) option preserves file attributes, so the timestamp of the remote file will match that of the local file.

You can also use rsync to transfer directories recursively.

[mfk@ghpcc06 ~]$ rsync -av -e ssh /home/mfk/localdirectory mfk@remotehost:/home/mfk/remotedirectory/

The above command copies the directory localdirectory and its contents to remotehost and puts it in /home/mfk/remotedirectory. Thus the contents will be in /home/mfk/remotedirectory/localdirectory.
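Whether the directory itself or only its contents is copied depends on a trailing slash on the source path. The sketch below shows the difference using throwaway local directories so it can be tried without a remote host. All directory and file names in it are illustrative.

```shell
#!/bin/sh
# Build a throwaway source tree; every name here is illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/localdirectory" "$demo/with_dir" "$demo/contents_only"
echo "sample" > "$demo/localdirectory/data.txt"

# No trailing slash on the source: the directory itself is copied,
# giving with_dir/localdirectory/data.txt.
rsync -a "$demo/localdirectory" "$demo/with_dir/"

# Trailing slash on the source: only the contents are copied,
# giving contents_only/data.txt.
rsync -a "$demo/localdirectory/" "$demo/contents_only/"

# List what each form produced.
find "$demo/with_dir" "$demo/contents_only" -type f
```

The same rule applies when the destination is remote, such as mfk@remotehost:/home/mfk/remotedirectory/. Adding -n (--dry-run) to an rsync command lists what would be transferred without copying anything, which is a cheap way to check the result first.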

Transferring data from outside the UMass system

If you need to download data from an external site like the NIH, you should consider doing the download from a compute node in either an interactive or batch session. You can download data in parallel, but bottlenecks in the outbound connection mean that your download speeds may not increase much if you start multiple simultaneous downloads. You may download data directly on the head node (ghpcc06), but because it is shared with all MGHPCC users, this is discouraged. Please feel free to contact MGHPCC staff if you have questions about downloading data. We are happy to help make sure you can get your data quickly.
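As a rough sketch of a throttled parallel download, the function below fetches a list of URLs with a small cap on simultaneous transfers. It is meant to be run from inside an interactive or batch session on a compute node. The function name download_all, the FETCH override, and the file urls.txt are all hypothetical. It assumes wget is installed and that xargs supports the -P option, which the GNU and BSD versions do.

```shell
#!/bin/sh
# download_all FILE [JOBS]: fetch every URL listed in FILE (one per line),
# running at most JOBS downloads at once (default 2).
# FETCH is a hypothetical override hook; by default it is "wget -c", where
# -c resumes a partially downloaded file instead of starting over.
download_all() {
    url_file=$1
    jobs=${2:-2}
    # xargs -n 1 passes one URL per command; -P caps concurrent processes.
    # The FETCH expansion is left unquoted on purpose so that "wget -c"
    # splits into a command and its option.
    xargs -n 1 -P "$jobs" ${FETCH:-wget -c} < "$url_file"
}

# Example (urls.txt is a placeholder for a file of URLs you supply):
#   download_all urls.txt 2
```

Keeping the job count low fits the note above: past a few simultaneous downloads, the shared connection is likely to be the limit, so more processes add load without adding speed.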