The vast majority of the DNA letters in people’s genomes are identical, but a small fraction of those letters varies. Genomic variation refers to DNA sequence differences among individuals or populations. Some variants influence biological function (such as a mutation that causes a genetic disease), while others have no biological effects.
Pangenome Worldwide
The new pangenome reference is a collection of different genomes against which to compare an individual genome sequence. Like a map of the subway system, the pangenome graph has many possible routes for a sequence to take, represented by the different colors.
Image by NHGRI Image Gallery/Credit: Darryl Leja, NHGRI
What Is Human Genomic Variation?
The vast majority of the DNA letters in people’s genomes are identical, but a small fraction of those letters varies. This genomic variation accounts for some of the differences among people, including important aspects of their health and susceptibility to diseases.
The Big Picture
Genomic variation reflects the differences between a person’s DNA and other people’s DNA.
There are multiple types of variants in human genomes, ranging from small differences to large differences.
A very small subset of genomic variants contributes to human health and disease.
Researchers create reference human genome sequences to help detect genomic variants in each sequenced human genome.
On average, a person’s genome sequence is ~99.6% identical to a reference human genome sequence; that person’s set of genomic variants accounts for the ~0.4% difference.
The human pangenome is a more comprehensive framework that aims to account for genomic variation across human populations, thereby reducing biases that can come with the use of a single reference human genome sequence.
Source: National Human Genome Research Institute (NHGRI)
Human Genome Reference Sequence
New grants totaling approximately $29.5 million will enable scientists from two centers to generate and maintain the most comprehensive reference sequence of the human genome. The two centers will work with international collaborators and develop a multi-genome reference sequence that is as universal and complete as possible. Known as a ‘pan-genome,’ the more-complete reference sequence will represent 350 genomes from the human population. Over time, researchers hope that the pan-genome will reflect all human diversity, enabling analyses of any human DNA sequence.
Image by NHGRI Image Gallery; Credit: Ernesto del Aguila III, NHGRI.
What Is the Human Genome?
One copy of the human genome contains about 3 billion nucleotides, which are distributed among 23 chromosomes. Most human cells have two copies of the human genome, with one copy inherited from each parent. Cells containing two copies of each chromosome are called “diploid.” Most mammals are diploid, but some organisms have either one set or more than two sets of each chromosome.
Source: National Human Genome Research Institute (NHGRI)
Mouse model for Down Syndrome
Humans and mice have very similar genomes, but the chromosomes that make up those genomes do not precisely align across those two species. For example, many of the genes found on human chromosome 21 are found on mouse chromosomes 16 and 17. A new NIH study investigates an enhanced mouse model of Down syndrome with an extra mini-chromosome, which contains over a hundred genes from mouse chromosome 16 attached to the centromere region of mouse chromosome 17.
Image by NHGRI Image Gallery/Credit: Darryl Leja, NHGRI.
How Do People’s Genomes Vary?
People’s genomes are far more similar to each other than they are different. It is frequently stated that any two people’s genomes are ~99.9% identical to one another. This percentage is based on the finding that, on average, a single-nucleotide difference exists between two people’s genomes once every 1,000 nucleotides or so.
However, this is an oversimplification because it accounts only for single-nucleotide differences. In reality, any two people’s genomes are, on average, ~99.6% identical and ~0.4% different. The latter percentage reflects both single-nucleotide differences and differences that involve multiple nucleotides.
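The relationship between how often a difference occurs and the quoted percent identity can be checked with a little arithmetic. The sketch below is illustrative only: the function name is made up for this example, and the inputs are round figures (one difference per 1,000 nucleotides, and 0.4% of nucleotides differing overall) rather than measured values.

```python
def percent_identical(differing_nucleotides: float, total_nucleotides: float) -> float:
    """Return the percentage of nucleotide positions that are the same."""
    return 100.0 * (1.0 - differing_nucleotides / total_nucleotides)


# Assumed round figure: one single-nucleotide difference per 1,000 nucleotides.
snv_only = percent_identical(1, 1_000)

# Assumed round figure: 0.4 of every 100 nucleotides differ once
# multi-nucleotide variants are counted as well.
all_variants = percent_identical(0.4, 100)

print(f"Single-nucleotide differences only: ~{snv_only:.1f}% identical")
print(f"All variant types: ~{all_variants:.1f}% identical")
```

The same function works with any pair of counts, so the estimate can be revisited as the underlying figures are refined.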
The differences among human genomes are called genomic variants. A person’s set of genomic variants is part of what makes them unique. Other factors (such as diet, environment, lifestyle and social context) also contribute to a person’s uniqueness. Most genomic variants have no influence on the functioning of a person’s genome, but a small subset of variants do have an impact. For example, some genomic variants influence physical characteristics, like eye color and height; others influence health conditions or how a person responds to certain medications.
Source: National Human Genome Research Institute (NHGRI)
Example changes to the genome: copy number variants
Image by NHS National Genetics and Genomics Education Centre
What Are the Different Types of Genomic Variants?
There are multiple types of genomic variants.
The smallest genomic variants are single-nucleotide variants (SNVs). Each SNV reflects a difference in a single nucleotide (or letter). For a given SNV, the DNA letter at that genomic position might be a C in one person but a T in another person. SNVs are the most common type of genomic variation. A subtype of SNVs is called a single-nucleotide polymorphism (SNP; pronounced “snip”). To be considered a SNP, a SNV must be present in at least 1% of the human population. As such, SNV is a more general term that includes both relatively common (such as SNPs) and rare single-nucleotide differences. For simplicity, we refer to all single-nucleotide differences as SNVs, regardless of their relative frequency.
Another group of small genomic variants are insertions and deletions (often referred to as “indels”). Insertion/deletion variants reflect extra or missing DNA nucleotides in the genome, respectively, and typically involve fewer than 50 nucleotides. Insertion/deletion variants are less frequent than SNVs but can sometimes have a larger impact on health and disease (e.g., by disrupting the function of a gene that encodes an important protein).
Tandem repeats (also known as microsatellites) are among the most common types of insertion/deletion variants. Tandem repeats are short stretches of nucleotides that are repeated multiple times and are highly variable among people. Different chromosomes can vary in the number of times such short nucleotide stretches are repeated, ranging from a few times to hundreds of times. Historically, tandem repeats have been used for building maps of the human genome and for DNA profiling in forensics applications.
Genomic variation also extends beyond small stretches of nucleotides to larger chromosomal regions. These large-scale genomic differences are called structural variants and involve at least 50 nucleotides and as many as thousands of nucleotides that have been inserted, deleted, inverted or moved from one part of the genome to another. Tandem repeats that contain more than 50 nucleotides are considered structural variants; in fact, such large tandem repeats account for nearly half of the structural variants present in human genomes. When a structural variant reflects differences in the total number of nucleotides involved, it is called a copy-number variant (CNV). Note that CNVs are distinguished from other structural variants, such as inversions and translocations, because the latter types often do not involve a difference in the total number of nucleotides.
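The size and frequency thresholds described above can be expressed as a small classifier. This is a simplified sketch, not how variant-calling software works: the function names are hypothetical, the alleles are made-up strings, and only allele length is considered, so inversions and translocations (which may not change the number of nucleotides) are not recognized.

```python
def classify_variant(ref: str, alt: str, sv_threshold: int = 50) -> str:
    """Label a variant by comparing the reference and alternate alleles.

    Only allele lengths are compared, which is a deliberate simplification.
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    size = abs(len(ref) - len(alt))
    if size == 0:
        return "other"  # same length but not a single-letter change
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    if size >= sv_threshold:
        return f"structural variant ({kind})"
    return f"indel ({kind})"


def is_snp(population_frequency: float, cutoff: float = 0.01) -> bool:
    """A SNV present in at least 1% of the population counts as a SNP."""
    return population_frequency >= cutoff


print(classify_variant("C", "T"))              # SNV
print(classify_variant("A", "ACGT"))           # indel (insertion)
print(classify_variant("A" + "G" * 60, "A"))   # structural variant (deletion)
print(is_snp(0.05), is_snp(0.001))             # True False
```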
Source: National Human Genome Research Institute (NHGRI)
Additional Materials (1)
Single Nucleotide Polymorphisms (SNPs)
Single nucleotide polymorphisms (SNPs) are a type of polymorphism involving variation of a single base pair.
Image by National Human Genome Research Institute (NHGRI)
Human Chromosomes
Normal human chromosomes visualized by spectral karyotyping (SKY).
This image was originally submitted to the 2015 NCI Cancer Close Up project and is part of that collection.
See also https://visualsonline.cancer.gov/closeup.
Image by NCI Center for Cancer Research / Thomas Ried
How Are Genomic Variants Detected?
Researchers use different approaches to detect genomic variants.
Some methods are designed to only detect known genomic variants. For example, it might be important to know if a person has inherited a particular genomic variant that is relevant to their health or healthcare; for that, researchers can perform a specific DNA test to determine whether a person has that genomic variant. With other methods, researchers can analyze a person’s DNA for the presence of a large number of known genomic variants; for example, a type of test called a microarray can detect hundreds of thousands of SNVs at once.
A more comprehensive way to detect genomic variants, including those that might not yet be known, is to perform genome sequencing. Multiple methods are now available for sequencing human genomes, each only requiring a small sample of blood, hair or cheek cells from which DNA is isolated. Today, sequencing a human genome usually costs less than $1,000, which is over a million-fold less expensive than a couple of decades ago.
When a person’s genome is sequenced, both copies of each pair of chromosomes (one from each parent) are sequenced at the same time. In most routine situations (e.g., when a genome is sequenced for a medical purpose), it is not readily possible to determine the parent of origin for each detected genomic variant. This is because a human genome is not sequenced one chromosome at a time; rather, the process involves breaking up the entire genome and then piecing back together small stretches of the sequenced DNA, like a jigsaw puzzle. Without additional analyses, that process does not provide information about the parental origin of each genomic variant. A “reference” human genome sequence is also critical to sequencing a human genome and detecting genomic variants.
Source: National Human Genome Research Institute (NHGRI)
Original Reference Genome Sequence
Around 70% of the reference human genome sequence came from one person who was then identified with the label African American in the mid 1990s. Comparison to other individuals has revealed African, European, Admixed American, East Asian and South Asian ancestry, which shows the limits of such labels. The remaining 30% came from 19 individuals of mostly European ancestry.
Image by NHGRI Image Gallery/Credit: Darryl Leja, NHGRI
What Is a Reference Genome Sequence?
Detecting variants in a person’s genome sequence usually requires an existing sequence for comparison: a “reference.” A reference human genome sequence is an established, high-quality and well-accepted sequence of a human genome (i.e., one sequence for each of the 23 human chromosomes). A reference human genome sequence is not an actual sequence from one individual but is pieced together from the sequences of multiple people. The important feature of a reference human genome sequence is that it depicts one assigned nucleotide for every position across the human genome. In this regard, a reference human genome sequence only represents the sequence of one copy of each chromosome, whereas a person’s genome sequence contains the sequences of both copies of each chromosome (i.e., people are diploid).
Of course, the human population contains an immense amount of variation among genomes. A given reference human genome sequence is not intended to capture or represent that diversity; rather, it serves as an important data-analysis tool. Specifically, each newly generated genome sequence from a person is directly compared to a given reference human genome sequence, allowing for the detection of all differences (variants) between the person’s genome and the reference genome. In short, comparison of a person’s genome sequence to a reference genome sequence allows for the cataloguing of all genomic variants in that person’s genome.
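The idea of comparing a person’s sequence with a reference can be illustrated with a toy example. This sketch assumes two short, made-up sequences that are already aligned and of equal length, which real genome data is not; it reports only single-nucleotide differences and does not handle insertions, deletions or structural variants. The function name is hypothetical.

```python
def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List single-nucleotide differences between two aligned sequences.

    Returns (position, reference_base, sample_base) tuples with
    1-based positions. Both sequences must have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


reference = "GATTACAGATTACA"  # made-up reference sequence
sample = "GATTGCAGATCACA"     # made-up sample sequence

for pos, ref_base, sample_base in find_snvs(reference, sample):
    print(f"position {pos}: {ref_base} in reference, {sample_base} in sample")
```

For these two strings the function reports differences at positions 5 and 11.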
Source: National Human Genome Research Institute (NHGRI)
Genomic Data
Schematic of data servers for genomic data.
Image by Ernesto Del Aguila III, NHGRI
What Is the Inventory of Genomic Variants in a Typical Human Genome Sequence?
What does a typical human genome sequence look like with respect to the variants that it contains? On average, compared to a reference human genome, a person’s ~6 billion-nucleotide genome sequence will have:
~5,000,000 SNVs that involve ~5,000,000 nucleotides
~600,000 insertion/deletion variants that involve ~2,000,000 nucleotides
~25,000 structural variants that involve >20,000,000 nucleotides.
These numbers represent current estimated averages and will likely change as more human genomes are sequenced to completion. Furthermore, multiple approaches can be used to calculate how the different types of genomic variants account for the differences among genomes. In fact, calculating the total number of nucleotides involved in structural variants in each human genome is quite complicated and remains an active area of research.
That means that, on average, the complete set of genomic variants in each person’s genome involves ~27,000,000 nucleotides (among the ~6,000,000,000 nucleotides in their genome). Those ~27,000,000 nucleotides reflect some type of difference at those positions in the DNA, together accounting for ~0.4% of the person’s complete genome. In other words, when accounting for the full inventory of genomic variants, a typical person’s genome sequence is ~99.6% identical to (or ~0.4% different from) a reference human genome sequence (or even another person’s genome sequence).
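The totals above can be reproduced by adding up the inventory. The sketch below uses the rounded estimates from the list; the figure for structural variants is given in the text only as a lower bound (>20,000,000), so the result is approximate.

```python
# Nucleotides involved in each variant type (rounded estimates from the text).
inventory = {
    "SNVs": 5_000_000,
    "insertion/deletion variants": 2_000_000,
    "structural variants": 20_000_000,  # stated as a lower bound
}
genome_size = 6_000_000_000  # nucleotides in a person's diploid genome

total_variant_nucleotides = sum(inventory.values())
percent_different = 100 * total_variant_nucleotides / genome_size

print(f"{total_variant_nucleotides:,} nucleotides involved in variants")
print(f"about {percent_different:.2f}% of the genome")
```

With these inputs the total is 27,000,000 nucleotides, or about 0.45% of the genome, which the text quotes as roughly 0.4%.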
Source: National Human Genome Research Institute (NHGRI)
Infographic: Why do we need a new pangenome reference?
The original human genome reference sequence was generated by the Human Genome Project in 2003. While this reference sequence has been regularly updated as researchers fixed errors and filled in missing regions of the genome, it only reflected data generated from about 20 people. Most of that first human genome reference sequence was just from one person.
Image by NHGRI Image Gallery/Credit: Julia Fekecs, NHGRI.
What Is a Pangenome Reference Sequence and How Is It Used?
Reference human genome sequences are invaluable for detecting genomic variants in each newly sequenced human genome. But the currently available reference human genome sequences do not accurately represent the genomic diversity of the human population, and that lack of diversity can introduce biases when analyzing some people’s genome sequences.
To address this deficiency, researchers are working to generate a more complete set of reference human genome sequences that better reflects all of humanity. A “pangenome” is a collection of genome sequences from multiple individuals that better represents the genomic diversity of the species. The human pangenome reference sequence will provide a better tool for comparing genome sequences from people all over the world in an effort to detect and characterize genomic variants more completely, including those with important roles in health and disease.
Source: National Human Genome Research Institute (NHGRI)
Additional Materials (1)
Pangenome Tube Map
The new pangenome reference is a collection of different genomes against which to compare an individual genome sequence. Like a map of the subway system, the pangenome graph has many possible routes for a sequence to take, represented by the different colors.
The detouring paths at the top of the image represent single nucleotide variants (SNVs), which are single-letter differences. The yellow path that loops around itself and repeats the same nucleotides represents a duplication variant. The pink path that loops counterclockwise and follows the nucleotide sequence backwards represents an inversion variant. At the bottom, the green and dark blue paths miss the C nucleotide in their routes and represent a deletion variant. The light blue path, which has extra nucleotides in its route, represents an insertion variant.
Image by NHGRI Image Gallery/Credit: Darryl Leja, NHGRI
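The subway-map idea in the caption can be sketched as a tiny graph in which each node holds a piece of DNA and each genome is a route through the nodes. Everything here is made up for illustration (node numbers, sequences and path names), and this is not the data structure used by any particular pangenome tool. Inversions are left out because they would require reading a node backwards.

```python
# Toy graph: each node holds a short piece of DNA (made-up sequences).
nodes = {1: "GAT", 2: "T", 3: "C", 4: "ACA", 5: "GG", 6: "TTA"}

# Each path is one genome's route through the graph, like a subway line.
paths = {
    "reference": [1, 2, 4, 6],
    "snv": [1, 3, 4, 6],             # takes node 3 instead of node 2
    "deletion": [1, 2, 6],           # skips node 4
    "insertion": [1, 2, 4, 5, 6],    # passes through the extra node 5
    "duplication": [1, 2, 4, 4, 6],  # visits node 4 twice
}


def spell(path: list[int]) -> str:
    """Join the DNA pieces along a path into one sequence."""
    return "".join(nodes[node_id] for node_id in path)


for name, path in paths.items():
    print(f"{name:12s} {spell(path)}")
```

Where two paths share nodes, the genomes match; where they take different nodes, they differ, which is how a single graph can hold many versions of the same region.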
HLA-A Gene from Pangenome tiled longer
A Sequence Tube Map rendering of the highly variable HLA-A gene on chromosome 6. The boxes with DNA letters on them are pieces of DNA that might occur somewhere in a copy of the HLA-A gene in a person's genome. Each colored line connects up the pieces of DNA in order and spells out a particular version of the gene. Thicker lines represent more common versions of the gene. Where the lines run together, different versions of the HLA-A gene match, and where the lines diverge, they differ. This particular gene is much more diverse and variable than most other genes are, because of its role in the immune system. This view is from the Human Pangenome Reference Consortium's Minigraph-Cactus pangenome graph version 1.0.
Image by NHGRI Image Gallery/Credit: Adam M. Novak, Ph.D., University of California Santa Cruz
Why Does Genome Variation Matter?
Genomic variation drives evolution and serves to expand biodiversity. This is true of humans as well as for plants, animals and other organisms. Such diversity keeps populations healthy and is fundamental to natural selection, an evolutionary process by which organisms adapt to changing environments.
Human genomic variation is also highly relevant in the field of medicine. Only a small fraction of genomic variants affects human health. In some cases, genomic variants directly cause diseases (such as cystic fibrosis and sickle cell disease). In other cases, the effects of genomic variants are more subtle (such as in hypertension and diabetes, where a genomic variant might contribute to a person’s overall risk of developing the condition). Healthcare professionals are increasingly learning how to use information about patients’ genomic variants to manage their medical care, an approach known as genomic medicine.
Looking forward
Genomics is constantly evolving and advancing, including the regular development of new experimental technologies and data-analysis methods. Such advances are readily applicable to ongoing efforts to develop new and more inclusive sets of reference human genome sequences and improve the ability to detect all variants in each newly sequenced human genome. The long-term goal is to have sufficient knowledge about genomic variation in all human populations, bringing equity in the benefits of genomic medicine.
Source: National Human Genome Research Institute (NHGRI)