
Why Computational Biology Needs Cloud Computing


The growing fields of computational biology and bioinformatics show that in modern biology and biotech, science and computation go hand in hand. Driven by ever larger volumes of data and ever more complex subject matter, these disciplines now depend on computational tools for their scientific success. In this article, we discuss some current trends in computational biology and bioinformatics and the techniques used to push the frontier of science forward. We also look at the growing demands placed on publications in the wake of big-data computation, such as the ready availability of reproducible results independent of working environment. One thing unites these trends: the opportunity that cloud computing and cloud-native platforms offer to simplify, speed up, and enhance research and collaboration in the computational domain. To understand future trends and opportunities in computational biology, it is important to understand what cloud computing can offer.

As a recent article in Nature Methods nicely summarises, computation and biology form an ever closer partnership in modern science. Already in the late 1990s and early 2000s, distributed computing at home, illustrated by the millennial favourite Folding@home, was used to study processes of protein folding and movement. This initiative pioneered the use of distributed computational power for biomedical simulation studies. With the advent of more readily available high-performance computing (HPC), especially combined with parallel computing and better software support for computational biology, such techniques have really taken off. The ability to run simulations and complex models in a time- and cost-effective manner, without needing access to an actual supercomputer (which is generally hard to obtain), is now indispensable.

Quantum computing also offers new perspectives. It is still early days, and quantum computers are not necessarily better than regular HPC at standard arithmetical tasks, which has led to some skepticism about the ‘quantum hype’. Even so, some researchers in computational biology are finding real-world applications where quantum computation can tackle problems that scale exponentially on classical HPC setups. Recent studies applying quantum computing methods to complex protein design problems point to the potential of this still-nascent technology for computational biology.

Another increasingly valued cluster of methods is machine learning. So-called ‘deep learning’ – the use of artificial neural networks to make algorithmic analysis more ‘intelligent’, loosely imitating the brain’s learning processes – has already proven as fruitful in computational biology and bioinformatics as it has in many other fields. Deep-learning approaches have found particular resonance in medicine, for example in improving fMRI brain scanning and medical image diagnosis. A substantial increase in statistical power in the analysis of Alzheimer’s disease using an autoencoder technique is another example from recent years. It is no secret that machine learning is a field with a very rapid pace of technological change and enormous R&D investment, which is likely to keep paying dividends in future biological and biomedical science.
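To make the autoencoder idea concrete, here is a minimal sketch in plain NumPy: synthetic ‘expression-like’ profiles are compressed into a three-dimensional code and reconstructed, with both weight matrices trained by gradient descent. This illustrates the technique only; the studies mentioned above use deep-learning frameworks and real biological data, and every name and dimension here is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples x 50 features driven by 3 hidden factors,
# standing in for (say) gene-expression profiles.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

n_in, n_hidden = X.shape[1], 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))  # decoder weights

lr = 0.01
losses = []
for _ in range(200):
    code = np.tanh(X @ W_enc)      # encoder: compress 50 dims -> 3
    recon = code @ W_dec           # decoder: reconstruct 50 dims
    err = recon - X                # reconstruction error
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean squared error through both layers.
    grad_dec = code.T @ err / len(X)
    grad_code = err @ W_dec.T * (1 - code ** 2)  # tanh derivative
    grad_enc = X.T @ grad_code / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The low-dimensional `code` learned this way is what gives such methods their statistical power: downstream analysis runs on a compact representation rather than on thousands of noisy raw features.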

New reproducibility requirements in computational biology

Of course, as we have pointed out before, with big data comes great responsibility. This is reflected in scientific publication requirements: journals and academic institutions increasingly demand more from researchers when it comes to data availability, data sharing, and reproducibility of results. With data volumes growing ever larger and algorithms becoming more complex and harder to assess, referees and colleagues need new ways to verify the integrity and validity of scientific claims.

This trend has been under way for some time: already in 2007 the aforementioned Nature Methods, to name one example, began requiring publication of the software and underlying algorithms used in the development of new methods, including in computational biology. Since 2014, it has published explicit, extensive guidelines for publishing the software and source code used in submissions, especially where these are an important component of the novel results. Likewise, a journal like PLOS Computational Biology imposes extensive software- and code-sharing requirements for publication, and many other journals act similarly.

These requirements are not limited to the demands of journal editors, however. Practical problems faced by many scientific institutions call for collaboration on standards for scientific methods. In microscopy, for example, reproducibility issues have led to calls for norms covering both hardware and software: researchers should make available essential contextual information on everything from lens illumination to file systems, code, and metadata in their microscopy analyses. Similarly, bioinformatics has developed elaborate standards for the formatting and publication of computational data, from model encoding to project metadata, as part of a pan-European effort to aid standardisation.

COMBINE standards and related efforts
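As a toy illustration of what such standards ask for in practice, the sketch below records checksums of input files, analysis parameters, and environment details in a JSON manifest that can travel with the results. The function names and fields are hypothetical, invented for this example rather than taken from any official standard or tool.

```python
import hashlib
import json
import platform
import sys
import tempfile

def file_checksum(path):
    """SHA-256 of an input file, so referees can verify they analyse the same data."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(inputs, params):
    """Bundle the contextual information that should accompany the results."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": params,
        "input_checksums": {path: file_checksum(path) for path in inputs},
    }

# Demo with a throwaway input file standing in for real experimental data.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("gene,count\nTP53,42\n")
    data_path = f.name

manifest = build_manifest([data_path], {"normalisation": "tpm", "threshold": 0.05})
print(json.dumps(manifest, indent=2))
```

Even a manifest this simple lets a referee confirm that the data they received is byte-for-byte identical to the data the authors analysed, and under which parameters.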

The promise of cloud computing

So how does cloud computing help with all this? From the viewpoint of research in computational biology, cloud computing can be seen as the natural heir to the promise of distributed computing power illustrated by the early example of getting individuals to do protein folding on their home computers. But instead of relying on the limited capabilities of home PCs, it derives its strength from the vast arrays of high capacity data centres and server fleets supported by some of the largest technology companies.

That this is already paying off in research terms is news to no one. To stick with protein folding, a striking recent example is the success of the AlphaFold project, especially AlphaFold2, in that very field. Developed by the team at DeepMind, the technology relies on the power of Google’s huge computing clusters and their proprietary tensor processing units. The result was a public database of predicted protein structures, a large share of them predicted with high confidence, covering 98.5% of the human proteome.

Of course, not everyone has access to the kind of resources the DeepMind team could draw on. But even at a more everyday scale, cloud computing proves valuable for research in computational biology and bioinformatics. Pipelines for data acquisition, storage, and analysis in biomedical research routinely rely on cloud computing to enable the large-scale data processing required to make headway. Big “omics” data analysis, such as the Cancer Genome Atlas project, relies on cloud technology to give individual researchers or small teams access to large databases and the ability to perform meaningful computation on them. Software-as-a-service (SaaS) and platform-as-a-service (PaaS) providers are essential in enabling researchers at any institution to make use of the vast quantities of data now becoming digitally available and, most importantly, to experiment and investigate computationally in a reasonably fast and cost-effective manner.

A bioinformatics “omics” architecture – courtesy of the authors
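A small illustration of the streaming access pattern such pipelines rely on: processing records as they arrive rather than loading a whole dataset into memory, which is also how sharded files are typically read from cloud object storage. The sketch below is hypothetical (names and table layout invented for the example) and computes per-gene mean counts from a (gene, sample, count) table.

```python
import csv
import io

def streaming_gene_means(handle):
    """Compute per-gene mean counts from a (gene, sample, count) TSV stream,
    keeping only running totals in memory rather than the whole table."""
    totals, counts = {}, {}
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    for gene, _sample, count in reader:
        totals[gene] = totals.get(gene, 0.0) + float(count)
        counts[gene] = counts.get(gene, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

# Tiny in-memory stand-in for a many-gigabyte counts file.
data = "gene\tsample\tcount\nTP53\ts1\t10\nTP53\ts2\t14\nBRCA1\ts1\t3\n"
means = streaming_gene_means(io.StringIO(data))
print(means)  # {'TP53': 12.0, 'BRCA1': 3.0}
```

Because memory use is bounded by the number of genes rather than the file size, the same function works whether the handle wraps a local test string or a multi-gigabyte object streamed from cloud storage.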

But the benefits of cloud computing are not limited to the research itself. It can also prove its worth in meeting the new requirements for sharing and distributing research artefacts such as algorithms and software. A cloud-based platform, for example, can make it easier to apply effective version control to data, prevent project chaos, and meet reproducibility requirements. The right PaaS offering provides the means to share data, code, and even whole application states quickly and easily, while saving time and effort on configuring working environments, making reproducibility and productivity work together in computational fields. Fortunately, such a platform is available: Nuvolos, the integrated cloud-native platform for collaborative computational workspaces.

Solving computational challenges with Nuvolos

Nuvolos is a platform as a service made by computational scientists for computational scientists, designed specifically to harness the power of cloud computing for scientific fields like computational biology. It boosts research by providing scalable computing and storage resources, meaning your research project can grow as needed to meet any HPC challenge. Moreover, Nuvolos offers a wide-ranging suite of best-in-class, ready-to-use tools. There is no need to waste time configuring and maintaining libraries, dependencies, and virtual machines for your favourite computational analysis tools: Nuvolos lets you run them directly in the browser. In short, with Nuvolos you can have your code, data, and experiments up and running in a matter of minutes, saving substantial time and effort otherwise spent wrestling with hardware and software maintenance. What’s more, Nuvolos can support any application that runs on Linux, so even specialised computational biology tools are no problem.

But Nuvolos also enables collaboration and reproducibility for teams large and small in a single workspace. Using the platform, you can share data, code, and applications both within a team and with any number of outside referees or colleagues in just a few mouse clicks. Smooth invite-based team onboarding and built-in version control also support the needs of researchers confronted with the new standards in collaboration and publishing discussed above.

On top of that, Nuvolos raises the bar for research platforms even higher by supporting not just research, collaboration, and reproducibility, but the academic teaching workflow as well. Nuvolos has features tailor-made for setting up courses, inviting students, uploading and distributing coursework (including a video library for remote lectures), and even running applications in the students’ working environments so you can help them with their work and troubleshoot problems. Because students run applications directly in the browser from any device, they are guaranteed to work in the same environment, avoiding incompatibilities and discrepancies between libraries, operating systems, and so forth. You can even mark coursework and set deadlines directly within the platform.

Nuvolos illustrates what cloud computing, combined with a PaaS offering designed for the needs of computational scientists, can deliver. Intrigued? Give our free trial a run today.