Tagged: large-scale data management



  • In 2003 the US-led Human Genome Project completed the first sequence of the human genome, and the US became the preeminent nation in genomics
  • This could change
  • World power and influence have moved East
  • China has invested heavily in genomic technologies and established itself as a significant competitive force in precision medicine
  • Ownership of intellectual property and knowhow is key to driving national wealth 

The global competition to translate genomic data into personal medical therapies



Professor Dame Sally Davies, England’s Chief Medical Officer, is right. (Genomics) “has the potential to change medicine forever. . . . The age of precision medicine is now, and the NHS must act fast to keep its place at the forefront of global science.”
It is doubtful whether the UK will be able to maintain its place as a global frontrunner in genomics and personalized medicine. It is even doubtful whether the US, which led the first sequencing of the human genome and became preeminent in genomic research, will be able to maintain its position. China, with its well-funded strategy to become the world’s leader in genomics and targeted therapies, is likely to overtake the UK and the US in the next decade.
This Commentary is in 2 parts. Part 1 provides a brief description of the global scientific competition between nation states to turn genomic data into medical benefits. China’s rise, which is described, could have significant implications for the future ownership of medical innovations, data protection, and bio-security. Part 2, which follows in 2 weeks, describes some of the ethical, privacy, human capital and economic challenges associated with transforming genomic data into effective personal therapies.
Turning genomic data into medical benefits
Turning genomic data into medical benefits is very demanding. It requires a committed government willing and able to spend billions; a deep understanding of the relationship between genes and physiological traits; next generation sequencing technologies; artificial intelligence (AI) systems to identify patterns in petabytes (1 petabyte is equivalent to 1m gigabytes) of complex data; world-class bio-informaticians, who are in short supply; comprehensive and sophisticated bio depositories; a living bio bank; a secure data center; digitization, synthesis and editing platforms; and petabytes of genomic, clinical, and personal data. Before describing how the UK, US and China are endeavoring to transform genomic data into personal medicine, let us refresh our understanding of genomics.

Genomics, the Human Genome Project and epigenetics
It is widely understood that your genes are responsible for passing specific features or diseases from one generation to the next via DNA, and genetics is the study of the way this is done. However, it is less widely known that your genes are influenced by environmental and other factors. Scientists have demonstrated that inherited genes are not static, and lifestyles and environmental factors can precipitate a chemical reaction within your body that could permanently alter the way your genes react. This environmentally triggered gene expression, or epigenetic imprint, can be bad, such as a disease; or good, such as a tolerant predisposition. Epigenetics is still developing as an area of research, but it has demonstrated that preventing and managing disease is as much to do with lifestyles and the environment, as it is to do with inherited genes and drugs. If environmental exposure can trigger a chemical change in your genes that results in the onset of disease, then scientists might be able to pharmacologically manipulate the same mechanisms in order to reverse the disease.
DNA is constantly subject to mutations, which can lead to missing or malformed proteins, and that can lead to disease. You all start your lives with some mutations, which are inherited from your parents, and are called germ-line mutations. However, you can also acquire mutations during your lifetime. Some happen during cell division, when DNA gets duplicated; others are caused when environmental factors, including UV radiation, chemicals, and viruses, damage DNA.

You have a complete set of genes in almost every healthy cell in your body. One set of all these genes (plus the DNA between them) is called a genome. The genome is the collection of 20,000 genes, including 3.2bn letters of DNA, which make up an individual. We all share about 99.8% of the genome. The secrets of your individuality, and also of the diseases you are prone to, lie in the other 0.2%, which is about 3 or 4m letters of DNA. The genome is known as ‘the blueprint of life’, and genomics is the study of the whole genome, and how it works. Whole genome sequencing (WGS) is the process of determining the complete DNA sequence of an organism's genome at a point in time.
‘The Human Genome Project’ officially began in 1990 as an international research effort to determine a complete and accurate sequence of the 3bn DNA base pairs that make up the human genome, and to find all of the estimated 20,000 to 25,000 human genes. The project was completed in April 2003. This first sequencing of the human genome took 13 years and cost some US$3bn. Today, it takes a couple of days to sequence a genome, and costs range from US$260 for targeted sequencing to some US$4,000 for WGS. Despite the rapidly improving capacity to read, sequence and edit the information contained in the human genome, we still do not understand most of the genome’s functions and how they impact our physiology and health.

Roger Kornberg explains the importance of genomics
Roger Kornberg, Professor of Structural Biology at Stanford University, and 2006 Nobel Laureate for Chemistry, explains the significance of sequencing the human genome, “The determination of the human genome sequence and the associated activity called genomics; and the purposes for which they may be put for medical uses, takes several forms. The knowledge of the sequence enables us to identify every component of the body responsible for all of the processes of life. In particular, to identify any component that is either defective or whose activity we may adjust to address a problem or a condition. So the human genome sequence makes available to us the entire array of potential targets for drug development. . . . . The second way in which the sequence and the associated science of genomics play an important role is in regard to individual variations. Not every human genome sequence is the same. There is a wide variation, which in the first instance is manifest in our different appearances and capabilities. But it goes far deeper because it is also reflected in our different responses to invasion by microorganisms, to the development of cancer and to our susceptibility to disease in general. It will ultimately be possible, by analyzing individual genome sequences to construct a profile of such susceptibilities for every individual, a profile of the response to pharmaceuticals for every individual, and thus to tailor medicines to the needs of individuals.” See video below.
UK’s endeavors to transform genomic data into personal therapies

In 2013 the UK government set up Genomics England, a company charged with sequencing 100,000 whole genomes by 2017. In 2014, the government announced a £78m deal with Illumina, a US sequencing company, to provide Genomics England with next generation whole genome sequencing services. At the same time the Wellcome Trust invested £27m in a state-of-the-art sequencing hub to enable Genomics England to become part of the Wellcome Trust’s Genome Campus in Hinxton, near Cambridge, England. In 2015, the UK government pledged £215m to Genomics England.
DNA testing and cancer
“DNA sequencing is simply the process of reading the code that is in any organism . . . It’s essentially a technology that allows us to extract DNA from a cell, or many cells, pass it through a sophisticated machine and read out the sequence for that organism or individual,” says David Bowtell, Professor and Head of the Cancer Genomics and Genetics Program at the Peter MacCallum Cancer Centre, Melbourne, Australia. “DNA testing has become increasingly widespread because advances in technology have made the opportunity to sequence the DNA of individuals affordable and rapid . . . DNA testing in the context of cancer can be useful to identify a genetic risk of cancer, and to help clinicians make therapeutic decisions for someone who has cancer,” says Bowtell; see videos below.

What is DNA sequencing?

What are the advantages of a person having a DNA test?

Need for National Genome Board
Despite significant investments by the UK government, Professor Davies, England’s Chief Medical Officer, complained in her 2017 Annual Report that genomic testing in the UK is like a “cottage industry” and recommended setting up a new National Genome Board tasked with making whole genome sequencing (WGS) standard practice in the NHS across cancer care, as well as some other areas of medicine, within the next 5 years.
USA’s endeavors to transform genomic data into personal therapies

In early 2015 President Obama announced plans to launch a US$215m public-private precision medicine initiative, involving the health records and DNA of 1m people, to leverage advances in genomics, accelerate biomedical discoveries, and yield more personalized medical treatments for patients. A White House spokesperson described this as “a game changer that holds the potential to revolutionize how we approach health in the US and around the world.”

Data management challenges
The American plan did not seek to create a single bio-bank, but instead chose a distributed approach that combines data from over 200 large on-going health studies, which together involve some 2m people. Interoperability, the ability of computer systems or software to exchange and make use of information stored in such diverse medical records and numerous gene databases, presents a significant challenge for the US plan. According to Bowtell, “Data sharing is widespread in an ethically appropriate way between research institutions and clinical groups. The main obstacles to more effective sharing of information are the very substantial informatics challenges. Often health systems have their own particular ways of coding information, which are not cross compatible between different jurisdictions. Hospitals are limited in their ability to capture information because it takes time and effort. Often information that could be useful to researchers, and ultimately to patients, is lost, just because the data are not being systematically collected.” See video below.
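The coding problem Bowtell describes can be illustrated with a minimal sketch. It assumes two hypothetical hospitals that record the same conditions under different local codes. The site names, codes, and mapping tables below are invented for illustration and are not taken from any real health system or standard. The sketch translates each record into a shared vocabulary and sets aside anything it cannot translate, which is the information that would otherwise be lost to researchers.

```python
from dataclasses import dataclass

# Hypothetical per-site code maps (invented for illustration). Real systems
# would rely on curated terminology mappings, not hand-written dictionaries.
SITE_CODE_MAPS = {
    "hospital_a": {"ONC-17": "breast_cancer", "END-02": "type_2_diabetes"},
    "hospital_b": {"C/BR": "breast_cancer", "DM2": "type_2_diabetes"},
}


@dataclass(frozen=True)
class Record:
    site: str        # which health system produced the record
    patient_id: str  # local patient identifier
    local_code: str  # diagnosis code in the site's own scheme


def harmonize(records):
    """Split records into (mapped, unmapped) using the per-site code maps."""
    mapped, unmapped = [], []
    for rec in records:
        common = SITE_CODE_MAPS.get(rec.site, {}).get(rec.local_code)
        if common is None:
            unmapped.append(rec)  # no known translation: flag for review
        else:
            mapped.append((rec.site, rec.patient_id, common))
    return mapped, unmapped


records = [
    Record("hospital_a", "a-001", "ONC-17"),
    Record("hospital_b", "b-417", "C/BR"),
    Record("hospital_b", "b-902", "XYZ"),  # code missing from the map
]
mapped, unmapped = harmonize(records)
print(mapped)
print(unmapped)
```

Here the first two records, although coded differently, resolve to the same condition and can be pooled, while the third is held back rather than silently dropped. Any real programme would need far richer mappings and governance than this sketch suggests.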
China’s endeavors to transform genomic data into personal therapies

In 2016, the Chinese government launched a US$9bn, 15-year endeavor aimed at turning China into a global scientific leader by harnessing computing and AI technologies for interpreting genomic and health data. This positions China to eclipse similar UK and US initiatives.

Virtuous circle
Transforming genomic data into medical therapies is more than a numbers race. Chinese scientists are gaining access to ever growing amounts of human genomic data, and developing the machine-learning capabilities required to transform these data into sophisticated diagnostics and therapeutics, which are expected to drive the economy of the future. The more genomic data a nation has, the better its potential clinical outcomes. The better a nation’s clinical outcomes, the more data a nation can collect. The more data a nation collects, the more talent a nation attracts. The more talent a nation attracts, the better its clinical outcomes.

The Beijing Genomics Institute
In 2010 China became the global leader in DNA sequencing because of one company: the Beijing Genomics Institute (BGI), which was created in 1999 as a non-governmental independent research institute, then affiliated to the Chinese Academy of Sciences, in order to participate in the Human Genome Project as China's representative. In 2010, BGI received US$1.5bn from the China Development Bank, and established branches in the US and Europe. In 2011 BGI employed 4,000 scientists and technicians. While BGI has had a chequered history, today it is one of the world’s most comprehensive and sophisticated bio depositories.

The China National GeneBank
In 2016 BGI-Shenzhen established the China National GeneBank (CNGB) on a 47,500sq.m site. This is the first national gene bank to integrate a large-scale bio-repository and a genomic database, with a goal of enabling breakthroughs in human health research. The gene-bank is supported by BGI’s high-throughput sequencing and bio-informatics capacity, and will not only provide a repository for biological collection, but more importantly, it is expected to develop a novel platform to further understand genomic mechanisms of life. During the first phase of its development the CNGB will have saved more than 10m bio-samples, and have storage capacity for 20 petabytes (20m gigabytes) of data, which are expected to increase to 500 petabytes in the second phase of its development. The CNGB represents the new generation of a genetic resource repository, bioinformatics database, knowledge database and a tool library, “to systematically store, read, understand, write, and apply genetic data,” says Mei Yonghong, its Director.

Whole-genome sequencing for $100
The CNGB could also help to bring down the cost of genomic sequencing. It is currently possible to sequence an individual's entire genome for under US$1,000, but the CNGB aims to reduce the price to US$152. Meanwhile, researchers at Complete Genomics, a US company acquired by BGI in 2013, which has developed and commercialized a DNA sequencing platform for human genome sequencing and analysis, are pushing the technology further to enable whole-genome sequencing for US$100 per sample. China's share of the world's sequencing capacity is estimated to be between 20% and 30%, which is lower than when BGI was in its heyday, but expected to increase fast. “Sequencing capacity is rising rapidly everywhere, but it's rising more rapidly in China than anywhere else,” says Richard Daly, CEO, DNAnexus, a US company, which supplies cloud platforms for large-scale genomics data.

The intersection of genomics and AI
Making sense of 1m human genomes is a major challenge, says Professor Jian Wang, former BGI President and co-founder, who has started another company called iCarbonX. Also based in Shenzhen, the company is at the intersection of genomics and AI. iCarbonX has raised more than US$600m, and plans to collect genomic data from more than 1m people, and complement these data with other biological information including changes in levels of proteins and metabolites. This is expected to allow iCarbonX to develop a new digital ecosystem, comprised of billions of connections between huge amounts of individuals’ biological, medical, behavioural and psychological data in order to understand how their genes interact and mutate, how diseases and aging manifest themselves in cells over time, how everyday lifestyle choices affect morbidity, and how these personal susceptibilities play a role in a wide range of treatments.

iCarbonX is expected to gather data from brain imaging, biosensors, and smart toilets, which will allow real-time monitoring of urine and faeces. The Company’s goal is to be able to study the evolution of our genome as we age and design personalized health predictions such as susceptibilities to diseases and tailored treatment options. iCarbonX’s endeavours are expected to dwarf efforts by US Internet giants at the intersection of genomics and AI.

Ethical challenges

China’s single-minded objective to turn its knowhow and experience of genome sequencing into personal targeted medical therapies has made it a significant global competitive force in life sciences. However, precision medicine’s potential to revolutionize how we treat diseases brings with it moral and ethical obligations. For personal therapies to be effective, it is important that genomic data are complemented with clinical and other personal data. This combination of data is as personal as personal information gets. The tested individual and their family could be harmed if genomic information from testing is misused. Reconciling therapy and privacy is important, because privacy issues concerning patients' genomic data can slow or derail the progression of novel personal therapies to prevent and manage intractable diseases. The stakes are high in terms of biosecurity, as genomic research is both therapeutic and a strategic element of national security. While it is crucial to leverage genomic data for future health, economic and biodefense capital, these data will also have to be appropriately managed and protected. Part 2 of this Commentary dives into these challenges a little deeper, and describes some of China’s competitive advantages in the race to become the world’s preeminent nation in genomics and precision medicine.

Despite the endeavours of the UK and US to remain at the forefront of the international competition to transform genomic data into personalized medical therapies for some of the world’s most common and intractable diseases, it seems reasonable to assume that China is on the cusp of becoming the dominant nation in novel personalized treatments. However, China’s determination to assume the global frontrunner position in genomic science might have blunted its concerns for some of the ethical issues that surround the life sciences. To the extent that this is the case, the future of humanity might well differ significantly from the generally accepted western vision.

Yike Guo

Professor of Computing Science

Yike Guo is a Professor of Computing Science in the Department of Computing at Imperial College London. He leads the Discovery Science Group in the department, as well as being the founding Director of the Data Science Institute at Imperial College.

Professor Guo also holds the position of CTO of the tranSMART Foundation, a global open source community using and developing data sharing and analytics technology for translational medicine.

Professor Guo received a first-class honours degree in Computing Science from Tsinghua University, China, in 1985 and received his PhD in Computational Logic from Imperial College in 1993 under the supervision of Professor John Darlington.

He founded InforSense, a healthcare intelligence company, and served as CEO for several years before the company's merger with IDBS, a global advanced R&D software provider, in 2009. He has been working on technology and platforms for scientific data analysis since the mid-1990s, and his research focuses on knowledge discovery, data mining and large-scale data management.

He has contributed to numerous major research projects including: the UK EPSRC platform project, Discovery Net; the Wellcome Trust-funded Biological Atlas of Insulin Resistance (BAIR); and the European Commission U-BIOPRED project. He is currently the Principal Investigator of the European Innovative Medicines Initiative (IMI) eTRIKS project, a €23M project that is building a cloud-based informatics platform for clinico-genomic medical research, in which tranSMART is a core component, and co-Investigator of Digital City Exchange, a £5.9M research programme exploring ways to digitally link utilities and services within smart cities.

Professor Guo has published over 200 articles, papers and reports. Projects he has contributed to have been internationally recognised, including winning the “Most Innovative Data Intensive Application Award” at the Supercomputing 2002 conference for Discovery Net, and the Bio-IT World "Best Practices Award" for U-BIOPRED in 2014. He is a Senior Member of the IEEE and is a Fellow of the British Computer Society.
