A post-doc's point of view on bioinformatics, evolution, and microbial diversity, with an interest in cutting-edge computer tools that make them all a bit easier.
Wednesday, February 24, 2010
Using Aspera instead of FTP to download from NCBI
If you often download large amounts of data from NCBI using their FTP site, you might be interested to know that NCBI has recently started using the commercial software Aspera to improve download transfer speeds. This was announced in their August newsletter and at first applied only to the Short Read Archive (SRA). However, I recently found out that they are now making all of their data available this way.
Regarding your problem with the download not resuming when specifying -k2, I cannot see the error picture. It is possible that if you used fixed rate (no -Q) and a 200 Mbps target rate on a 35 Mbps connection, the severe overdrive would cause the transfer to fail rather than complete the resume checks.
That said, by adding -Q to engage adaptive rate control, the transfer session will not overdrive and there will be no problem in resuming transfers.
Regarding the permission denied error causing ascp to not continue transferring, that is actually not the behavior at all. Please drop me your log file (/var/log/messages) and I will take a look to determine the root cause.
I was able to zoom in on the picture you provided and can see the error. This sort of "Connection lost" error shortly into the transfer precisely means that one end of the transfer was not able to receive UDP traffic on the FASP port from the other end for 10 seconds (if the session has not been fully established) or for more than 60 seconds if the session has been fully established.
This error is terminating the connection and preventing the resume check and the progress of the transfer.
There is more than one possible root cause, and it can be determined from the transfer log files.
Would you please zip and upload the /var/log/messages file to demo.asperasoft.com? You can use Unix scp with user 'support' and password 'demoaspera'.
Please email me once available and I will review and reply asap.
I think many of these problems could have been avoided if NCBI or Aspera posted documentation for the ascp program. If the documentation is not freely available maybe there should be a warning that it is not for command line use?
I think many users would like to use the Aspera software to speed up their downloads, but for some reason NCBI is not providing much information about it or that it even exists. Maybe NCBI is still in a "test" phase of using your software?
Anyway, thanks for your help so far, I think we are making some forward progress.
I have uploaded my /var/log/messages file to your server, in hopes of you determining my connection issues.
I am glad we were able to get to the bottom of the three issues you were facing. Hopefully they can be of help to other users:
1. A rate limiting cap in your network on UDP traffic that strictly limits the transmission rate for Aspera FASP.
2. The use of the skip special files option at the command line made it appear as if the transfer was not making progress when in fact it was actually skipping multiple special files in a row.
3. The permission denied error caused by the source file permissions being propagated to the destination files, unless you request an alternate directory mask through the ascp configuration file.
Please let us know if you have any other questions. Hopefully you will be able to share your ongoing experiences with your readers.
12TB, that is a ton of data. I have been trying to just get all of GenBank downloaded (300GB) and have been failing miserably. I have tried FTP and also using Aspera (web plugin and ascp). Aspera is faster, but I will still lose my connection at some point. The problem with resuming either FTP or Aspera is that the file checking takes so long that often, by the time it starts to download again, I get another disconnect.
As to your specific error with Aspera, I have no idea since I don't work for them.
If you happen to have a recent download of GenBank (still compressed and untouched) please add it to BioTorrents. These disconnect issues would not be a problem if NCBI used BitTorrent to distribute their bigger datasets.
Thanks for this post. I have been having similar issues with getting data out of dbGaP. Unfortunately your post did not address what I needed, but it was informative and I did learn quite a bit about Aspera, which hopefully will help me in the future. More than anything, it was good to see that I am not the only person dealing with similar issues.
It is horrible that NCBI uses Aspera. My tax money is totally wasted.
Terrible interface, complicated command switches, no 'about' in the Firefox plug-in interface. It stays in status 'connecting' forever, with no explanation for any error.
It is now quite a bit after this original thread, but dealing with Aspera is still difficult.
I've been trying for about 3 weeks to get about 30TB of data from the Broad Inst to UC Irvine via Aspera using the Linux command-line ascp client. I tried convincing both the Broad and my client that they'd be better off buying 10 3TB disks and re-using them for such transfers, but no go.
The receive node is a 64-core node with a single 1Gb connection to the public Internet via CENIC, thru a pipe that is relatively quiet. It should be able to carry at least 80MB/s to our 10Gb backbone which runs to CENIC. It is running at about 60MB/s, most of it via ascp, thru 5 parallel ascp connections.
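As a rough sanity check on those numbers, the time such a transfer needs can be estimated from the sustained rate. This is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB), a constant rate, and no disconnects, and the function name `transfer_days` is purely illustrative.

```shell
# transfer_days SIZE_TB RATE_MB_PER_S
# Prints the estimated transfer time in days, to one decimal place.
transfer_days() {
    awk -v tb="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 86400 }'
}

# 30 TB at the observed aggregate rate of about 60 MB/s
transfer_days 30 60     # prints 5.8

# 30 TB at the theoretical ceiling of a 1 Gb/s link (1000 Mb/s / 8 = 125 MB/s)
transfer_days 30 125    # prints 2.8
```

Even at line rate the transfer would take days, so any time lost to disconnects and resume checks adds up quickly.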
Inspired by reading this blog (Thanks!) I tried a serial connection:
(This was canceled after I realized that to pick up the transfer again, it would have to read thru the 12 TB that I've already transferred. It appears to be doing something like an rsync, which may be the only way to get it to work, but that's a big chunk of disk activity.)
(and this one was started after the serial one was canceled - see above)
/root/.aspera/connect/bin/ascp -C 4:5 -O 33004 -l 1000m user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
Note that the option '-C 4:5 -O 33004' is identical between the last 2 commands. For some reason the obvious increment '-C 5:5 -O 33005' would not work - the ssh connection was refused. They all appear to be working, and all commands claim to be working on the same file 'MH0131639.bam', altho it's impossible to see what's happening to that file since ascp apparently defines the eventual size of the file and (at least on a gluster filesystem) it is always that size; it does not increment as data comes in:
The size of the file in the separate windows does not equal the size of the segments if that's what it is supposed to indicate: those sizes at this point are:
14GB + 44GB + 75GB + 101GB + 91GB (= 325GB), much larger than the total size of the file.
So, parts of it work. It has the above-mentioned parallel option that I've been trying for about a week, and the main problem is that (as others have noted) it keeps dropping the connection. Not only does it drop the connection, but the ascp process keeps running, so a monitor script cannot tell that the process has dropped (altho I recently discovered that if you add a few [Enter]s after the one for the password, it will drop out of the process so a monitor script can now tell that one process has stopped).
The transfer speed seems to be about 10MBytes/s per connection. Not bad, but not much better than my experience with bbcp, which is free, quite reliable, and whose author is quite responsive to suggestions. The maximum over all ascp connections is about 50-60 MBytes/s.
If it keeps working (and it often doesn't), it will transfer data, but I don't see much improvement over the transfer rates of bbcp or, better, gridFTP (and especially its consumer interface Globus Online, which can move data extraordinarily fast and with less fuss).
Why the Broad Institute and especially the NIH would use a commercial 'solution' that seems to still be in an alpha stage when there are free solutions that demonstrably work better, I'm not sure.
NCBI says Aspera is free, but the Aspera site asks for a subscription. Can you please direct me to where I can download Aspera (ascp) for free? Many thanks.
Check out SuperTCP (supertcp.com). They sound like Aspera, but can operate directly on TCP, so you don't have to totally change your workflow to get faster transfers!
Hi Morgan, I am a new user of Aspera, and thanks for your blog. I tried the command "ascp -QT -i ~/asperaweb_id_dsa.putty era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR345/SRR346368/SRR346368.fastq.gz" and the terminal told me "ascp:'ascp-license' could not be found, existing". I have copied the "ascp-license" file from .aspera/connect/etc to /usr/local/bin, but I don't know why the PC can't find this file. If you have any suggestions, please let me know. Thanks again.
Since I had a hard time figuring out how to download a list of files/databases using the --file-list flag, I thought I would post the command here, which may help some people.
If you want to download a bunch of files, write a file that lists the required databases/files with their paths, i.e.
No globbing. No sensible logs or error messages. Still difficult to get hold of. Switches still are overly complicated and not particularly powerful. What a shame that transferring data is still a thing bioinformatics struggles with.
50 comments:
Thanks for this. It worked well with the Aspera plug-in in Safari in OS/X, about 20X faster than ftp. I may try some experimentation with the command line version- I have some possible future applications where scripting would probably be useful.
@Cliff, Yes the speed difference is quite amazing. Let me know if you have better results with the command line program.
I just tried a 10 MB and a 2 GB download from the command line and they were both ok. I did use a slightly different command, with a bandwidth limit: -l 200M.
I found that in some NCBI documentation in the 1000genomes folder on the ftp site- there is an Aspera_Users.README file and an aspera_transfer_guide.pdf there.
By the way, for other Mac users, Aspera put my ascp file in:
/Applications/Aspera Connect.app/Contents/Resources/
It was actually inside the application file and you need to escape the space with a backslash to get the path to that. The .putty file was in the same directory.
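For scripting, the space in that path can be handled either with a backslash or by quoting. A minimal sketch, assuming the install location reported in the comment above (it may differ between Connect versions); the variable names `ASCP_DIR` and `ASCP` are only illustrative:

```shell
# Assumed location, taken from the comment above; adjust for your install.
ASCP_DIR="/Applications/Aspera Connect.app/Contents/Resources"
ASCP="$ASCP_DIR/ascp"

# Quoting the variable keeps the path as a single word despite the space.
# The backslash form is equivalent when typed directly at the prompt:
#   /Applications/Aspera\ Connect.app/Contents/Resources/ascp
if [ -x "$ASCP" ]; then
    echo "ascp found at: $ASCP"
else
    echo "ascp not found at: $ASCP"
fi
```

Always writing `"$ASCP"` with the double quotes in later commands avoids having to remember the backslash.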
If you type ascp without any argument you will see a small help output. To resume transfers use -k2. Also, the -l argument probably needs to be set to something your network and storage are capable of. A 7200rpm disk will probably maintain 150-200Mbps, so you could try -l 200m
Yes, I was aware that typing ascp without any argument would show me all options. However, there is no information in the output to suggest that -k2 can be used to resume a download (any chance John works for Aspera?). In fact, I just tested that option and it didn't work for me at all.
I have edited my initial blog post to show the options for ascp.
In regards to the bandwidth limit, I had also previously tried that but I still got the same connection error. To be clear, my speeds were not even getting close to 200M (max was 35M) so setting that option seems like it would not have any effect. Also, who are you expecting would be able to have bandwidth speeds this high to NCBI?
I do work for Aspera. To get a complete response to what your problem is (there are a lot of potential issues) you should contact support@asperasoft.com, open a ticket and let them know you are doing transfers with NCBI. They will ask for your log files, which contain metrics we can look at to find out what the problems are. If you want to isolate log files (normally written to syslog) to send to support, the -L switch can be used to specify a log directory. For example:
$ ascp -i /path/to/private_key -QT -l 35m -k2 -L /tmp/aspera user@host:ncbifiles /data
The /tmp/aspera directory will need to exist, but after your transfer you will see some log files.
As for the -k2, that option specifies how to deal with partial files. The support team can also look at the logs to see why this did not work.
I have some ideas what is going on, but the support team is very helpful and the log files will help us identify the exact cause.
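Since the log directory has to exist before the transfer starts, a script can create it up front. A small sketch using the /tmp/aspera location from the example above; the variable name `LOG_DIR` is illustrative, and nothing here invokes ascp itself:

```shell
# Log location from the example above; any writable directory should work.
LOG_DIR="/tmp/aspera"

# The directory passed to -L must already exist, so create it first.
# mkdir -p succeeds quietly if the directory is already there.
mkdir -p "$LOG_DIR"

# After a transfer run with "-L $LOG_DIR", list whatever logs were written.
ls -l "$LOG_DIR"
```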
I have made an additional update to my original blog post explaining the reasons that Aspera gave me for my connection problems.
If others have experiences with the software it would be great to hear your opinions.
Morgan,
Your usage missed a fundamental capability in our software. The 'ascp' command line binary MUST have a -Q option in place to use the adaptive rate control. Without it, the software is using a fixed rate mode, without regard for the available bandwidth, and hence the potential for disconnecting your own transfers due to overdrive of connection. This is the sole reason for the disconnects you experienced and the fluctuating speed.
If you use the -Q option, the target transfer rate (-l) does not need to be adjusted at all. This was your primary problem.
I suppose we should make this the default in the 'ascp' command line; it is the default in the standalone products, but not in the command line.
Morgan, additionally, we are also releasing our 2.6 version of the software with disk-based rate control enabled. This is important for very high speed transfers where the network bottleneck speed exceeds the disk throughput - it basically extends the congestion avoidance of the adaptive rate control through the disk. At the speeds you tested, this is not important, but for other NCBI users it is.
I am taking time to reply on this because it is extremely important that users understand how to properly use the software, to realize its intended performance.
Thanks,
Michelle Munson
President, Aspera, Inc.
Michelle
Morgan,
I would like to have you run the following command to demonstrate our points:
ascp -QT -l 200M -k2 -i ../etc/asperaweb_id_dsa.putty anonftp@ftp-private.ncbi.nlm.nih.gov:/source_directory /destination_directory/
This will do the following:
- Use Adaptive rate control and automatically adjust the transmission rate to the available bandwidth, which is around 35 Mbps REGARDLESS of the target rate (-l 200M, 200 Mbps). You were using Fixed rate control (no -Q) before, which fixed the transmission rate at 200 Mbps.
Adaptive rate control is the automatic adjustment to available bandwidth that you mentioned 'ascp' should do.
- If you interrupt the transfer and restart it, it will resume the transfer from the point of interruption (-k2 flag).
The default usage is to not resume, to overwrite files at the destination. This is documented in the command line usage, 'man ascp'.
Thank you,
Michelle
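To keep the options discussed above in one place, the command can be assembled from variables and printed before it is run. This is only a dry-run sketch: it prints the command rather than executing it, the paths are the placeholders from the command above, and `build_ascp_cmd` is a hypothetical helper name.

```shell
# Placeholder values copied from the command above.
KEY="../etc/asperaweb_id_dsa.putty"
SRC="anonftp@ftp-private.ncbi.nlm.nih.gov:/source_directory"
DEST="/destination_directory/"
RATE_CAP="200M"

# build_ascp_cmd RATE_CAP KEY SRC DEST
#   -Q   adaptive rate control (adjusts to the available bandwidth)
#   -T   disable encryption
#   -l   target rate; with -Q this acts as an upper bound
#   -k2  resume interrupted transfers instead of overwriting
#   -i   private key file used to authenticate
# Note: this prints a plain string, so it is not safe for paths with spaces.
build_ascp_cmd() {
    printf 'ascp -QT -l %s -k2 -i %s %s %s\n' "$1" "$2" "$3" "$4"
}

build_ascp_cmd "$RATE_CAP" "$KEY" "$SRC" "$DEST"
```

Printing the command first makes it easy to check that -Q and -k2 are actually present before starting a long transfer.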
Hi Michelle,
Thanks for your comments. You are the third person from your company to contact me about how to fix my problem. I am surprised that your support person didn't mention that I should use the -Q option?!
You suggest looking at the man page, but there is no man page documentation included with the "Aspera Connect" package. There may be a man page with the "Aspera Client", but this is not available to download by the general public (username & password is requested).
Also, the -k2 option only works if the -l option is used.
Considering the large number of users that will eventually be using the ascp command line program it would be great if many of these options were default or better explained.
As a further comment, there seems to be some limitations with downloading directories with the command line. The files start to download, but then get a permission denied error on hitting the file ".a.swp". That is fine since I don't want that file, but ascp will error and not continue onto the next file.
Also, it seems that I can't use a "*" to only get some of the files in a directory, which I can do with FTP.
Morgan,
Ric Mackie in our support team did in fact tell you to use the -Q flag when downloading to engage adaptive rate control.
Regarding the man page inclusion with 'ascp', we do not include the man page with the Connect client installer because it is intended to be used as a browser plug-in application. You are using the contained 'ascp' binary (which is fine) but that is typically done when a user has installed our desktop client package, Aspera client. This package (rpm or deb) includes the man page.
Regarding the permission denied errors when transferring these files, ascp uses the native file system permissions to determine if it can read or write files. Are you certain that the user account context in which the 'ascp' process is running has access to read and/or write the special files in question (on the source or destination, whichever applies)?
Regarding support of glob matches (*), great point. We are actually adding support for that in our 2.6 release series. I completely agree ascp should support it.
Regarding -k2 depending on the use of the -l option, I haven't noticed that myself, but if -l is not specified at the command line, ascp uses a default target rate of 10 Mbps (which limits the transfer rate to 10 Mbps) on top of the automatically adapted rate -- probably not what you want. You will want to include a "-l" in your command line options that is as high as you would ever want the transfer to be, e.g. -l 200M in your case.
I realize there are a number of command line options, but it is worth learning them to get the results you want. Users of our GUI client products don't have to worry about any of these -- they are built in to the application.
Thanks for all of your feedback.
Michelle
Morgan,
'ascp' does in fact resume a directory download even if no -l is specified, for example try the following:
ascp -TQ -k2 asperaweb@demo.asperasoft.com:aspera-test-dir-large .
Terminate the transfer part-way through.
Then repeat the command. The transfer will pick up from where it left off. Any files previously transferred are "skipped" and the transfer resumes from within the file where it was interrupted.
Michelle
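Because resuming amounts to repeating the same command, the repetition can be scripted. The sketch below is a generic retry loop, not something provided by Aspera: `retry` is a hypothetical helper, and it assumes the wrapped command exits with a non-zero status when the transfer is interrupted, which this thread does not confirm for ascp.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Success: stop retrying.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report and fail.
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (not run here): repeat the -k2 command above until it exits cleanly.
# retry 5 30 ascp -TQ -k2 asperaweb@demo.asperasoft.com:aspera-test-dir-large .
```

With -k2 in the wrapped command, each new attempt would pick up from the files already transferred rather than starting over.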
Ric Mackie in our support team did in fact tell you to use the -Q flag when downloading to engage adaptive rate control.
No, he didn't. He told me to make sure I set the -l limit to the max bandwidth or just below to make sure the packets don't get backed up due to me not having enough bandwidth. No mention of -Q.
Regarding the man page inclusion with 'ascp', we do not include the man page with the Connect client installer because it is intended to be used as a browser plug-in application. You are using the contained 'ascp' binary (which is fine) but that is typically done when a user has installed our desktop client package, Aspera client. This package (rpm or deb) includes the man page.
So why not allow everyone to download this client? Would have saved me quite a bit of time.
Regarding the permission denied errors when transferring these files, ascp uses the native file system permissions to determine if it can read or write files. Are you certain that the user account context in which the 'ascp' process is running has access to read and/or write the special files in question (on the source or destination, whichever applies)?
Yes, I probably don't have permission to read that particular file. That is fine. The problem is that ascp seems to have problems continuing on afterwards.
Regarding -k2 depending on the use of the -l option, I haven't noticed that myself, but if -l is not specified at the command line, ascp uses a default target rate of 10 Mbps (which limits the transfer rate to 10 Mbps) on top of the automatically adapted rate -- probably not what you want. You will want to include a "-l" in your command line options that is as high as you would ever want the transfer to be, e.g. -l 200M in your case.
Actually, it was suggested that the setting of -l 200M was the reason -k2 wasn't working.
I realize there are a number of command line options, but it is worth learning them to get the results you want. Users of our GUI client products don't have to worry about any of these -- they are built in to the application.
The whole point of me making this blog post was to provide information about how to use Aspera to get data from NCBI. Considering there is no documentation anywhere about the command line program, I thought the default settings would at least allow me to download a few files without problems.
'ascp' does in fact resume a directory download even if no -l is specified, for example try the following:
ascp -TQ -k2 asperaweb@demo.asperasoft.com:aspera-test-dir-large .
Terminate the transfer part-way through.
Then repeat the command. The transfer will pick up from where it left off. Any files previously transferred are "skipped" and the transfer resumes from within the file where it was interrupted.
This did not work for me. I have attached a picture to my blog to show my problem.
Morgan,
In reply, not necessarily in order ....
Ric Mackie feels very concerned because he did tell you to use the -Q option, but the important point is simply to note the option for your usage going forward.
Regarding the main Aspera Client, it is available for use but only with a purchased license, whereas the Aspera Connect included on NCBI's site is freely distributed as part of the NCBI license. Some NCBI partners use the main Aspera Client in lieu of Connect.
The Connect web client is intended to be used as a browser plug-in, not from the command line, and its documentation is geared accordingly: http://download.asperasoft.com/download/docs/connect/2.3/aspera-connect-linux.html
Regarding documentation of the 'ascp' command line usage, you will find complete documentation for download on our web site for all products for which the 'ascp' command line is an intended usage.
See, for example, Aspera Client ascp command line usage:
http://download.asperasoft.com/download/docs/scp_client/2.5/aspera-client-unix.html#ascp-usage
NCBI has a login and password to access this. Here is a temporary login and password you can use:
Login: tmpmarsw
Password: a8mbnu88
Last year we also provided this to some of the folks at NCBI to publish as part of the FAQ. Please feel free to add this info for your users.
Regarding your problem with the download not resuming when specifying -k2, I cannot see the error picture. It is possible that if you used a fixed rate (no -Q) and a 200 Mbps target rate on a 35 Mbps connection, the severe overdrive would cause the transfer to fail rather than complete the resume checks.
That said, by adding -Q to engage adaptive rate control, the transfer session will not overdrive and there will be no problem in resuming transfers.
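For anyone who wants to script this, a minimal sketch of a retry wrapper follows. It is not part of ascp or of Aspera's documentation. The helper name `retry` is made up, and it assumes ascp exits with a non-zero status when a transfer fails, which may not hold if the process hangs instead of exiting.

```shell
# Re-run a command until it succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example, commented out so that nothing is transferred by accident.
# The -l 30M value is an assumption chosen to stay below a 35 Mbps connection:
# retry 5 60 ascp -TQ -k2 -l 30M asperaweb@demo.asperasoft.com:aspera-test-dir-large .
```

Because -k2 is passed on every attempt, each retry should go through the resume checks rather than starting over.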
Regarding the permission denied error causing ascp to not continue transferring, that is actually not the behavior at all. Please drop me your log file (/var/log/messages) and I will take a look to determine the root cause.
Thanks,
Michelle
Morgan,
I was able to zoom in on the picture you provided and can see the error. This sort of "Connection lost" error shortly into the transfer precisely means that one end of the transfer was not able to receive UDP traffic on the FASP port from the other end for 10 seconds (if the session has not been fully established) or for more than 60 seconds if the session has been fully established.
This error is terminating the connection and preventing the resume check and the progress of the transfer.
There is more than one possible root cause, and it can be determined from the transfer log files.
Would you please zip and upload the /var/log/messages file to demo.asperasoft.com? You can use Unix scp with user 'support' and password 'demoaspera'.
Please email me once available and I will review and reply asap.
Thanks,
Michelle
Michelle,
I think many of these problems could have been avoided if NCBI or Aspera posted documentation for the ascp program. If the documentation is not freely available maybe there should be a warning that it is not for command line use?
I think many users would like to use the Aspera software to speed up their downloads, but for some reason NCBI is not providing much information about it or that it even exists. Maybe NCBI is still in a "test" phase of using your software?
Anyway, thanks for your help so far, I think we are making some forward progress.
I have uploaded my /var/log/messages file to your server, in hopes of you determining my connection issues.
Morgan,
I am glad we were able to get to the bottom of the three issues you were facing. Hopefully they can be of help to other users:
1. A rate limiting cap in your network on UDP traffic that strictly limits the transmission rate for Aspera FASP.
2. The use of the skip special files option at the command line made it appear as if the transfer was not making progress when in fact it was actually skipping multiple special files in a row.
3. The permission denied error caused by the source file permissions being propagated to the destination files, unless you request an alternate directory mask through the ascp configuration file.
Please let us know if you have any other questions. Hopefully you will be able to share your ongoing experiences with your readers.
Michelle
About a week ago, I started a 12 TB (many files) download from NCBI using ascp with the following options:
-QTr -l 300M -k 1
For a couple of days it was downloading at about 2 GB/min. After about 5.5 TB, however, I got timeout errors, and the process stopped.
A day later, after reading this blog, I restarted the download using the -k 2 option.
ascp immediately began deleting incomplete files so that I went from 5.5 TB down to 2.4 TB.
It doesn't seem to be adding any new files now, and I'm seeing many of the following errors in the messages log:
ERR rex_add: rex buffer full
Any help anyone could provide would be greatly appreciated.
Hi Alex,
12TB, that is a ton of data. I have been trying to just get all of GenBank downloaded (300GB) and have been failing miserably. I have tried FTP and also Aspera (web plug-in and ascp). Aspera is faster, but I still lose my connection at some point. The problem with resuming either FTP or Aspera is that the file checking takes so long that, by the time it starts to download again, I often get another disconnect.
As to your specific error with Aspera, I have no idea since I don't work for them.
If you happen to have a recent download of GenBank (still compressed and untouched) please add it to BioTorrents. These disconnect issues would not be a problem if NCBI used BitTorrent to distribute their bigger datasets.
Thanks for this post, I have been having similar issues with getting data out of dbGAP. Unfortunately your post did not help address what I needed but it was informative and I did learn quite a bit about aspera which hopefully will help me in the future. More than anything, it was good to see that I am not the only person dealing with similar issues.
It is horrible that NCBI uses aspera. My tax money is wasted totally.
Terrible interface, complicated command switches, no 'about' in the Firefox plug-in interface. It stays in the 'connecting' status forever, with no explanation for any error.
Learn something from wget please!!!
I just gave ascp a spin and it worked on first try. At least 20 times faster than wget, wow!
On linux installation was trivial and I'm using the command line tool as follows:
/home/www/.aspera/connect/bin/ascp -QT -l200M -i /home/www/.aspera/connect/etc/asperaweb_id_dsa.putty anonftp@ftp-private.ncbi.nlm.nih.gov:/blast/db/nr.*.tar.gz ./
Haven't tried the plug-in and probably never will.
Worked great on first try. At least 20 times faster than wget. I'm using the command line tool as follows:
/home/www/.aspera/connect/bin/ascp -QT -l200M -i /home/www/.aspera/connect/etc/asperaweb_id_dsa.putty anonftp@ftp-private.ncbi.nlm.nih.gov:/blast/db/nr.*.tar.gz ./
This really makes a difference!
This link contains vital information that will assist you to resolve all outstanding issues with ASCP and Aspera line of products.
http://www.asperasoft.com/en/company/luke_4/Luke_4
In what ways is Aspera the same as Bittorrent? What are the architectural similarities?
Hi Michelle
Great post. I was getting 'Session Stop (Error: Session data transfer timeout (server), Peer Error: Session'
Applied your suggestion with the -TQ and -k2 flags
ascp -i "//home/.ssh/id_dsa" -TQ -k2 // ://
and it worked a treat no errors :-)
Thanks
Zahid Ali
(Pearson UK)
It is now quite a bit after this original thread, but dealing with Aspera is still difficult.
I've been trying for about 3 weeks to get about 30TB of data from the Broad Inst to UC Irvine via Aspera using the Linux command-line ascp client. I tried convincing both the Broad and my client that they'd be better off buying 10 3TB disks and re-using them for such transfers, but no go.
The receive node is a 64-core node with a single 1Gb connection to the public Internet via CENIC, through a pipe that is relatively quiet. It should be able to carry at least 80MB/s to our 10Gb backbone, which runs to CENIC. It is running at about 60MB/s, most of it via ascp, through 5 parallel ascp connections.
Inspired by reading this blog (Thanks!) I tried a serial connection:
/root/.aspera/connect/bin/ascp -O 33005 -QT -l 500M -k2 user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
(This was canceled after I realized that, to pick up the transfer again, it would have to read through the 12 TB that I've already transferred. It appears to be doing something like an rsync, which may be the only way to get it to work, but that's a big chunk of disk activity.)
The 4 parallel transfers are:
/root/.aspera/connect/bin/ascp -C 1:5 -O 33001 -l 1000m user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
/root/.aspera/connect/bin/ascp -C 2:5 -O 33002 -l 1000m user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
/root/.aspera/connect/bin/ascp -C 3:5 -O 33003 -l 1000m user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
/root/.aspera/connect/bin/ascp -C 4:5 -O 33004 -l 1000m user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
(and this one was started after the serial one was canceled - see above)
/root/.aspera/connect/bin/ascp -C 4:5 -O 33004 -l 1000m user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
Note that the option '-C 4:5 -O 33004' is identical in the last 2 commands. For some reason the obvious increment '-C 5:5 -O 33005' would not work: the ssh connection was refused. They all appear to be working, and all commands claim to be working on the same file, 'MH0131639.bam', although it's impossible to see what's happening to that file, since ascp apparently defines the eventual size of the file up front and (at least on a gluster filesystem) it is always that size; it does not increment as data comes in:
1101 $ ls -lh MH0131639.bam
-rw-r--r-- 1 root root 147G Apr 30 16:55 MH0131639.bam
The size of the file in the separate windows does not equal the size of the segments if that's what it is supposed to indicate: those sizes at this point are:
14GB + 44GB + 75GB + 101GB + 91GB (= 325GB, much larger than the total size of the file).
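The numbered sessions listed above follow a regular pattern, so they can be generated by a loop. The sketch below only prints the command lines rather than running them. `print_sessions` is a made-up helper, and the port numbering (base port plus session number) simply mirrors the commands shown in this comment.

```shell
# Print one ascp command line per session, mirroring the pattern
# "-C <n>:<count> -O <base_port + n>" used in the commands above.
# Nothing is executed; review the output before running any of it.
print_sessions() {
    local count=$1 base_port=$2 src=$3 dest=$4
    local i=1
    while [ "$i" -le "$count" ]; do
        printf 'ascp -C %d:%d -O %d -l 1000m %s %s\n' \
            "$i" "$count" "$((base_port + i))" "$src" "$dest"
        i=$((i + 1))
    done
}

print_sessions 5 33000 user@xxx.broadinstitute.org:/xxx /xxx/yyy/broad
```

Generating the lines this way at least rules out copy-and-paste slips such as two sessions sharing the same '-C 4:5 -O 33004' pair, though it does nothing about the server refusing the fifth connection.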
So, parts of it work. It has the above-mentioned parallel option, which I've been trying for about a week, and the main problem is that (as others have noted) it keeps dropping the connection. Not only does it drop the connection, but the ascp process keeps running, so a monitor script cannot tell that the process has dropped. (I recently discovered that if you add a few [Enter]s after the one for the password, it will drop out of the process, so a monitor script can now tell that one process has stopped.)
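One possible way for a monitor script to notice a hung transfer, even while the ascp process is still alive, is to watch for file activity instead. This is only a sketch: `is_stalled` is a made-up helper, it relies on `find -mmin` (available in GNU and BSD find), and it assumes that modification times advance while data is being written, which I have not verified on a gluster filesystem.

```shell
# Succeed (exit status 0) if no file under DIR has been modified
# within the last MINUTES minutes, i.e. the transfer looks stalled.
is_stalled() {
    local dir=$1 minutes=$2
    [ -z "$(find "$dir" -type f -mmin "-$minutes" 2>/dev/null | head -n 1)" ]
}

# Example for a cron job or watch loop (commented out):
# if is_stalled /xxx/yyy/broad 10; then
#     echo "no file activity for 10 minutes; ascp may have dropped" >&2
# fi
```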
The transfer speed seems to be about 10MBytes/s per connection. Not bad, but not much better than my experience with bbcp, which is free, quite reliable, and whose author is quite responsive to suggestions. The maximum over all ascp connections is about 50-60 MBytes/s.
If it keeps working (and it often doesn't), it will transfer data, but I don't see much improvement over the transfer rates of bbcp or, better, gridFTP (and especially its consumer interface Globus Online, which can move data extraordinarily fast and with less fuss).
Why the Broad Institute and especially the NIH would use a commercial 'solution' that seems to still be in an alpha stage when there are free solutions that demonstrably work better, I'm not sure.
I have incorporated a short segment about Aspera's ascp into my "How to transfer large amounts of data via network."
Corrections happily accepted.
OK..
how about this link?
A good alternative to Aspera is a tiny open source program called xc. It is at http://github.com/speedops/xc
NCBI says Aspera is free, but the Aspera site asks for a subscription. Can you please direct me to where I can download Aspera (ascp) for free?
Many thanks
Check out SuperTCP (supertcp.com). They sound like Aspera, but can operate directly on TCP, so you don't have to totally change your workflow to get faster transfers!
Hi Morgan
I am a new user of Aspera, and thanks for your blog.
I tried to run the command "ascp -QT -i ~/asperaweb_id_dsa.putty era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR345/SRR346368/SRR346368.fastq.gz" and the terminal told me "ascp: 'ascp-license' could not be found, exiting."
But I have copied the "ascp-license" file from .aspera/connect/etc to /usr/local/bin, so I don't know why the PC can't find this file.
If you have any suggestions, please let me know.
Thanks again.
##DOWNLOADING BUNCH OF FILES FROM NCBI
Since I had a hard time figuring out how to download a list of files/databases using the --file-list flag, I thought I would post the command here, as it may help some people.
If you want to download a bunch of files, write a file that lists the required databases/files with their paths, e.g.
/blast/db/env_nt.04.tar.gz
/blast/db/env_nt.05.tar.gz
/blast/db/env_nt.06.tar.gz
and then use the following command to download all these files using one command.
~/.aspera/connect/bin/ascp -l640M -T -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh --user=anonftp --host=ftp.ncbi.nlm.nih.gov --mode=recv --file-list=filelist DestinationFolderwithPATH
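For a long numbered series, the list file itself can be generated rather than typed by hand. A small sketch follows; `make_filelist` is a made-up helper, and it assumes the two-digit, zero-padded volume numbering seen in the example paths above.

```shell
# Print one remote path per line for volumes FIRST..LAST of a database.
# Pass plain integers (4, not 04); the zero padding is added by printf.
make_filelist() {
    local prefix=$1 first=$2 last=$3
    local i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s.%02d.tar.gz\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Prints the three paths from the example; redirect to a file
# (e.g. "> filelist") to use the result with --file-list.
make_filelist /blast/db/env_nt 4 6
```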
I hope this is helpful.
Muhammad
It's 2016 and Aspera still sucks.
No globbing. No sensible logs or error messages. Still difficult to get hold of. Switches still are overly complicated and not particularly powerful. What a shame that transferring data is still a thing bioinformatics struggles with.
For a much better command line with included automatic resume and proper configuration file use:
https://github.com/IBM/aspera-cli