13.0: Classification of Cancers - Biology

Cancers can be classified based on the tissues in which they originate. Sarcomas are cancers that originate in mesodermal tissues, such as bone or muscle, and cancers arising in glandular tissues (e.g., breast, prostate) are classified as adenocarcinomas. Carcinomas originate in epithelial cells (both inside the body and on its surface) and are the most common type of cancer (~85%). Each of these classifications may be further subdivided. For example, squamous cell carcinoma (SCC), basal cell carcinoma (BCC), and melanoma are all types of skin cancer, originating respectively in the squamous cells, basal cells, and melanocytes of the skin.

Blood cancer in a nutshell: Classification and investigation

September marks International Blood Cancer Awareness Month, an annual campaign to raise awareness and support of blood cancers. In celebration of this and to help increase general knowledge and understanding of such diseases, we have gone back to basics with an overview on the classification and laboratory investigation of blood cancers.

Blood cancers are a diverse group of disorders which occur when there is neoplastic proliferation of a malignant blood cell in the haemopoietic system. The clinical presentation of blood cancers is variable and highly dependent on the site and severity of the disease.

Blood cancer classification

Blood cancers are classified as myeloid (related to bone marrow) or lymphoid (related to tissue that produces lymphocytes and antibodies) disorders based on the haemopoietic lineage in which the abnormality occurred. They can be further categorised into one of the following groups:

Leukaemia is a cancer where malignant haemopoietic cells are found within the bone marrow and peripheral blood. Leukaemia can be classified into myeloid or lymphoid, and acute or chronic based on the rate of onset of the disease. In acute leukaemia, there is abnormal proliferation of blasts (poorly differentiated immature cells) and maturational arrest can occur. In contrast, chronic leukaemia is not associated with blasts, since malignant cells continue to mature during haemopoiesis. There are four main categories of leukaemia: AML (acute myeloid leukaemia), ALL (acute lymphoid leukaemia), CML (chronic myeloid leukaemia) and CLL (chronic lymphoid leukaemia).
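The four categories above are simply the combinations of two independent axes, rate of onset and lineage. A minimal sketch of that scheme follows; the function and variable names are illustrative and not part of any clinical standard.

```python
from itertools import product

# The two axes described in the text.
ONSET = ("acute", "chronic")
LINEAGE = ("myeloid", "lymphoid")

def leukaemia_label(onset: str, lineage: str) -> str:
    """Build the usual abbreviation, e.g. ('acute', 'myeloid') -> 'AML'."""
    if onset not in ONSET or lineage not in LINEAGE:
        raise ValueError(f"unknown onset/lineage: {onset!r}, {lineage!r}")
    # First letter of onset, first letter of lineage, then L for leukaemia.
    return f"{onset[0]}{lineage[0]}L".upper()

# Enumerate every onset/lineage combination.
ALL_CATEGORIES = [leukaemia_label(o, l) for o, l in product(ONSET, LINEAGE)]
```

Enumerating both axes yields exactly the four main categories named in the text (AML, ALL, CML and CLL).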

Lymphomas affect the lymphoid lineage and are chronic malignancies. Malignant lymphoid cells generally accumulate in, and are restricted to, lymphoid organs, forming a static lymphomatous tumour. If the affected lymphoid organ becomes overwhelmed by accumulating malignant cells, the cells can infiltrate sites outside the lymphoid tissue, such as peripheral blood and bone marrow. Lymphomas are broadly categorised into Hodgkin lymphoma (HL) and non-Hodgkin lymphoma (NHL) based on whether the malignant tumour contains Reed-Sternberg cells (abnormal lymphocytes associated only with HL).

Myelodysplastic syndromes are myeloid malignancies in which immature precursor myeloid cells accumulate in the bone marrow. As a consequence, the death of these immature cells occurs before they can mature into effector cells. This causes ineffective haemopoiesis and consequently a reduction in the red cells, white cells and platelets.

Multiple myeloma is a lymphoid malignancy in which there is abnormal proliferation of malignant plasma cells. In health, plasma cells are required for antibody production. In multiple myeloma, malignant plasma cells over-synthesise and secrete excessive quantities of monoclonal immunoglobulins called paraproteins. At excessive concentrations in the plasma, paraproteins can cause severe tissue damage to a number of organs.

Myeloproliferative disorders are myeloid abnormalities in which there is abnormal overproduction of myeloid cells in the bone marrow, most commonly causing erythrocytosis, thrombocytosis, neutrophilia and basophilia. The main myeloproliferative disorders are CML, polycythaemia vera, myelofibrosis and essential thrombocythaemia.

Investigation of blood cancers

The role of a pathology laboratory is fundamental in investigating blood cancer. It is important to accurately diagnose malignancies since treatment strategies vary enormously and failure to correctly diagnose a patient could have a major impact on their prognosis. There are many different tests and analytical techniques that can be employed in the investigation of a suspected blood cancer, including the following:

A full blood count (FBC) provides data on red cell, white cell and platelet indices. These parameters are essential since they detail the number of each cell line and help determine bone marrow involvement. In some cases, the FBC results are one of the first indications of a suspected blood cancer and it can be as a result of FBC analysis that further investigations are requested.

Morphological examination of peripheral blood and/or bone marrow helps identify the shape, size, characteristics, cellular inclusions and maturity of the blood cells. Particular morphological features are distinctly associated with certain types of blood cancer and therefore analysis of morphology is pivotal in aiding a diagnosis or indicating further investigations.

Immunophenotyping analyzes the antigenic markers expressed on cell surfaces; these markers are called cluster of differentiation (CD) markers. Specific markers associated with each cell lineage allow cellular populations to be identified. In health, certain CD markers should be present. In malignancy, however, there can be loss of expression, over-expression, or aberrant expression of markers that should not be present in a healthy individual. By comparing results with the normal expected expression, healthy and malignant cell populations can be distinguished.
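The comparison against expected expression can be pictured as a set difference. The sketch below is only an illustration: the marker names are placeholders rather than a real diagnostic panel, and it models presence or absence only, not over-expression, which would need intensity data.

```python
def compare_to_expected(expected: set[str], observed: set[str]) -> dict[str, list[str]]:
    """Flag markers that are missing or unexpectedly present in a cell population."""
    return {
        "lost": sorted(expected - observed),      # expected in health, not detected
        "aberrant": sorted(observed - expected),  # detected, not expected for this lineage
    }

# Placeholder marker names for illustration only (hypothetical, not a real panel).
expected_markers = {"CD_a", "CD_b", "CD_c"}
observed_markers = {"CD_a", "CD_c", "CD_x"}
result = compare_to_expected(expected_markers, observed_markers)
```

Here `result` reports `CD_b` as lost and `CD_x` as aberrant, the kind of deviation from the expected profile that would prompt further investigation.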

Cytogenetics studies cells at the chromosomal level, since some blood cancers are caused by a chromosomal mutation. By analyzing a patient’s karyotype and chromosomal structure, genetic mutations can be identified.

Histological examination of a biopsy (for example, of a lymph node) can reveal tissue architecture, including the type of cells present. This helps to determine the presence and/or spread of a malignancy.

Due to the heterogeneous nature of blood cancers, a multidisciplinary laboratory approach is often necessary for diagnosis. Hence, to aid a diagnosis, all laboratory results must be interpreted in conjunction with the clinical presentation.

Innovations and advancements are rapidly progressing in the field of blood cancers with the aim of ultimately helping to reduce the prevalence of such diseases and improve the prognosis for the individuals affected.

By Body Part/System

Cancers are also often separated by the organs or organ systems in which they arise.

Central Nervous System Cancers

Central nervous system cancers include those that originate in tissues of either the brain or the spinal cord. Cancers that spread to the brain are not considered brain cancers, but rather brain metastases, and are far more common than primary brain cancers.  

Cancers that commonly spread to the brain include lung cancer, breast cancer, and melanoma. Unlike tumors in other regions of the body, brain cancers do not often spread outside of the brain.

Overall, the incidence of brain cancer has been increasing in recent years.  

Head and Neck Cancers

Head and neck cancers can affect any region of the head and neck, from the tongue to the vocal cords. In the past, these cancers were most commonly seen in people who were both heavy drinkers and smokers. In recent years, however, human papillomavirus (HPV) has become an important cause of these cancers, with close to 10,000 people developing HPV-related head and neck cancers each year in the United States alone.

  • Oral cancer: Roughly 60-70% of all head and neck cancers are oral cancers. These cancers may involve the mouth, tongue, tonsils, throat (pharynx), and the nasal passageways.
  • Laryngeal cancer (cancer of the vocal cords)

Breast Cancers

Many people are aware that breast cancer is an all-too-common cancer in women, but it's important to point out that men get breast cancer also. Approximately 1 in 100 breast cancers occur in men.   The most common type of breast cancer is ductal carcinoma.

Since most breast cancers are carcinomas, they may sometimes be detected before they have become invasive. This is considered carcinoma in situ, or stage 0 breast cancer. Breast cancer stages 1 through 4 are invasive stages of the disease. You may hear these more specific names:

  • Ductal carcinoma in situ (DCIS) and lobular carcinoma in situ (LCIS): Carcinoma in situ is the earliest stage at which breast cancer can be detected and is considered stage 0. These cancers have not yet penetrated through the basement membrane and are considered non-invasive. They are most often detected when a biopsy is done for an abnormality on a screening mammogram.
  • Invasive breast cancer (both ductal and lobular): Once a breast cancer penetrates through the basement membrane, it is considered invasive.
  • Inflammatory breast cancer: Inflammatory breast cancer, in contrast to other breast cancers, does not usually present as a lump. Rather, the early stages of the disease look like redness and a rash on the breast.
  • Male breast cancer: When breast cancer occurs in men, it is more likely that there is a genetic component. A family history of breast cancer should prompt a discussion with your doctor.

It can be frightening to hear that you have an "invasive" cancer, but this does not mean that your cancer has spread. Even stage 1 is referred to in this way based on the appearance of the tumor under a microscope.

Respiratory Cancers

Cancers of the lung and bronchial tubes are the leading cause of cancer deaths in both men and women in the United States.   While smoking is a risk factor for these diseases, lung cancer occurs in never-smokers as well. In fact, lung cancer in these individuals is the sixth leading cause of cancer deaths in the United States.

Lung cancer is decreasing overall, likely related to a decrease in smoking. But it is increasing in young adults, especially young, never-smoking women. The reason is not understood at this time. Types you may hear about include:

  • Non-small cell lung cancer: Subtypes of non-small cell lung cancer (responsible for around 80-85% of lung cancers) include lung adenocarcinoma, squamous cell carcinoma of the lungs, and large cell lung cancer.
  • Small cell lung cancer: Small cell lung cancer accounts for around 15% of lung cancers and is more likely to occur in people who have smoked.
  • Mesothelioma: Mesothelioma is a cancer of the pleural mesothelium, the lining surrounding the lungs. It is strongly linked with exposure to asbestos.

Digestive System Cancers

Digestive tract cancers may occur anywhere from the mouth to the anus. Most of these cancers are adenocarcinomas, with squamous cell carcinomas occurring in the upper esophagus and most distant portion of the anus. Types include:

  • Esophageal cancer: The most common form of esophageal cancer has changed in recent years. Whereas squamous cell esophageal cancer (often related to smoking and drinking) was once the most common form of the disease, it has been surpassed by esophageal adenocarcinoma (often related to long-standing acid reflux).
  • Stomach cancer: Stomach cancer is uncommon in the United States, but is a common type of cancer worldwide.
  • Pancreatic cancer: Pancreatic cancer is less common than some other cancers, but is the fourth most common cause of cancer-related deaths in both men and women. It is most often diagnosed in the later stages of the disease, when surgery is unfortunately no longer possible.
  • Liver cancer: Cancer metastatic to the liver is much more common than primary liver cancer. Risk factors for liver cancer include alcohol abuse and chronic infections with hepatitis B or C.
  • Colon cancer: Colon cancer is often referred to as colorectal cancer and includes both cancers of the rectum and the upper colon. It is the third leading cause of cancer deaths in both men and women.
  • Anal cancer: Anal cancer differs from colon cancer both in treatments and causes. Infection with HPV now causes the majority of anal cancers.  

Urinary System Cancers

The genitourinary system involves the kidneys, the bladder, the tubes connecting the kidneys and bladder (called the ureters), and the urethra (the passageway out from the bladder). This system also includes structures such as the prostate gland. Types include:

  • Kidney cancer: The most common types of kidney cancer include renal cell carcinoma (around 90% of cases), transitional cell carcinoma, and Wilms' tumor in children.
  • Bladder cancer: Roughly half of bladder cancers are caused by tobacco exposure. Those who work with dyes and paints are also at higher risk.
  • Prostate cancer: Prostate cancer is the second leading cause of cancer death in men, but now has a very high five-year survival rate.

Reproductive System Cancers

Reproductive organ cancers may occur in men and women. Ovarian cancer is the fifth most common cause of cancer deaths in women and, though curable in the early stages, is often diagnosed when it has already spread.

Endocrine Cancers

The endocrine system is a series of glands that produce hormones, so cancers of these glands may cause symptoms of over- or underproduction of these hormones. Most endocrine cancers, with the exception of thyroid cancer, are fairly rare. A combination of different endocrine cancers may run in families and is referred to as multiple endocrine neoplasia, or MEN.

The incidence of thyroid cancer is increasing in the United States more than any other cancer. Thankfully, the survival rate for many of these cancers is high.

Bone and Soft Tissue Cancers

In contrast to primary bone and soft tissue cancers, which are uncommon, cancer that is metastatic to bone is common. Bone cancer, either primary or metastatic, often presents with symptoms of pain or of a pathologic fracture (a fracture that occurs in a bone weakened by the presence of tumor).

Blood-Related Cancers

Blood-related cancers include both those involving blood cells and those involving solid tissue of the immune system, such as lymph nodes. The risk factors for blood-related cancers differ somewhat from those for solid cancers in that environmental exposures as well as viruses (such as the Epstein-Barr virus, which causes mononucleosis) play a significant role. These are the most common cancers in children.

Skin Cancers

Skin cancers are often separated into two primary groups: melanoma and non-melanoma. While non-melanoma skin cancers are much more common, melanomas are responsible for most skin cancer deaths.  

Essay on Tumor Suppressor Genes

Read this essay to examine the nature of tumor suppressor genes and the ways in which their loss can lead to cancer. Also learn about the roles played by all the types of gene mutations, along with non-mutational changes, in converting normal cells into cancer cells.

Roles in Cell Proliferation and Cell Death:

By definition, tumor suppressors are genes whose loss or inactivation can lead to cancer, a condition characterized by increased cell proliferation and decreased cell death. It is therefore logical to suspect that the normal function of a tumor suppressor gene would be the opposite—namely, to inhibit cell proliferation or promote cell death—and so the loss of such functions would cause increased cell proliferation or decreased cell death.

i. Cell Fusion Experiments Provided the First Evidence for the Existence of Tumor Suppressor Genes:

The first indication that cells might contain genes whose loss is associated with the development of cancer came from experiments using a technique called cell fusion. In 1960, a research team in Paris headed by Georges Barski discovered that cells of two different types grown in culture will occasionally fuse together to form hybrid cells containing the chromosomes of both original cell types.

Shortly thereafter Henry Harris reported that cell fusion can be artificially induced by treating cells with inactivated forms of a particular type of virus called Sendai virus. Treatment with the virus causes the plasma membranes of two cells to fuse with each other, creating a combined cell in which the nuclei of the two original cells share the same cytoplasm.

When the cell subsequently divides, the two separate nuclei break down and a single new nucleus is formed that contains chromosomes derived from both of the original cells. Such a cell, containing a nucleus with chromosomes derived from two different cells, is called a hybrid cell.

Experiments in which cancer cells were fused with normal cells provided some important early insights into the genetic basis for the abnormal behavior of cancer cells. Based on our current understanding of oncogenes, you might expect that the hybrid cells created by fusing cancer cells with normal cells would have acquired oncogenes from the original cancer cell and would therefore exhibit uncontrolled proliferation, just like a cancer cell.

In fact, that is not what usually happens. The fusion of cancer cells with normal cells almost always yields hybrid cells that initially behave like the normal parent and do not form tumors (Figure 1). Such results, first reported in the late 1960s, provided the earliest evidence that normal cells contain genes that can suppress tumor growth and reestablish normal controls on cell proliferation.

Although fusing cancer cells with normal cells generally yields hybrid cells that lack the ability to form tumors, it does not mean that these cells are normal.

When they are allowed to grow for extended periods in culture, the hybrid cells often revert to the malignant, uncontrolled behavior of the original cancer cells. Reversion to malignant behavior is associated with the loss of certain chromosomes, suggesting that these particular chromosomes contain genes that had been suppressing the ability to form tumors. Such observations eventually led to the naming of the lost genes as “tumor suppressor genes.”

As long as hybrid cells retain both sets of original chromosomes—that is, chromosomes derived from both the cancer cells and the normal cells—the ability to form tumors is suppressed. Tumor suppression is even observed when the original cancer cells possess an oncogene, such as a mutant RAS gene, that is actively expressed in the hybrid cells.

This means that tumor suppressor genes located in the chromosomes of normal cells are able to overcome the effects of a RAS oncogene present in a cancer cell chromosome. The ability to form tumors only reappears after the hybrid cell loses a chromosome containing a critical tumor suppressor gene.

ii. Studies of Inherited Chromosomal Defects and Loss of Heterozygosity have Led to the Identification of Several Dozen Tumor Suppressor Genes:

Although cell fusion experiments provided early evidence for the existence of tumor suppressor genes, identifying these genes did not turn out to be a simple task. By definition, the existence of a tumor suppressor gene only becomes evident after its function has been lost. How do scientists go about identifying something whose very existence is unknown until it disappears?

One approach is based on the fact that defects in tumor suppressors are responsible for several hereditary cancer syndromes. Members of cancer-prone families often inherit a defective tumor suppressor gene from one parent, thereby elevating their cancer risk because a single mutation in the other copy of that tumor suppressor gene can then lead to cancer.

Microscopic examination of cells obtained from individuals in such families sometimes reveals the existence of gross chromosomal defects. For example, certain individuals with familial retinoblastoma exhibit a deleted segment in a specific region of one copy of chromosome 13, not just in cancer cells but in all cells of the body.

To determine whether a tumor suppressor is located in the region that has undergone deletion, scientists have simply examined retinoblastoma cells to see which gene has become mutated in the comparable region of the second copy of chromosome 13.

The loss of tumor suppressor genes is not restricted to hereditary cancers. These genes may also be lost or inactivated through random mutations that strike a particular target tissue, leading to the mutation or loss of both copies of the same gene.

You might think that the most straightforward way for that to happen would be through two independent mutations randomly occurring in sequence. However, the mutation rate for any given gene is about one in a million per cell division, so the chance of two independent mutations affecting two copies of the same gene is extremely remote.

After a single copy of a tumor suppressor gene has undergone mutation, a more efficient approach for disrupting the remaining normal copy is through a phenomenon known as loss of heterozygosity, so named because the initial state, in which one abnormal and one normal gene copy are present, is called the heterozygous state.

Getting rid of the remaining normal copy therefore causes the heterozygous state to be lost. Loss of heterozygosity is more common than you might expect: whereas individual gene mutations arise at a rate of about one in a million per gene per cell division, loss of heterozygosity is as frequent as once in a thousand cell divisions and tends to affect large regions of DNA encompassing hundreds of different genes.
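To make the comparison concrete, the two routes to losing both copies can be put side by side using the rates quoted above. This is a rough sketch under a simplifying assumption that the two events are independent and their per-division rates simply multiply; it ignores the number of cells and divisions involved, and whether a given loss of heterozygosity event actually covers the gene in question.

```python
from fractions import Fraction

# Rates quoted in the text (per cell division).
POINT_MUTATION_RATE = Fraction(1, 1_000_000)  # per gene
LOH_RATE = Fraction(1, 1_000)

# Route 1: two independent point mutations, one in each copy of the gene.
two_hits = POINT_MUTATION_RATE * POINT_MUTATION_RATE

# Route 2: one point mutation, then loss of heterozygosity removes the normal copy.
hit_then_loh = POINT_MUTATION_RATE * LOH_RATE

# How much more likely route 2 is than route 1 under these assumptions.
fold_advantage = hit_then_loh / two_hits
```

Under these assumptions the second route is about a thousand times more likely than the first, which is why loss of heterozygosity is described as the more efficient path.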

Figure 2 illustrates several ways in which loss of heterozygosity may arise. In one mechanism, called mitotic nondisjunction, the two duplicated copies of a given chromosome fail to separate (disjoin) at the time of mitosis, so both copies go to one daughter cell and the other daughter cell receives no copies.

As seen in Figure 2b, the latter cell will no longer be heterozygous for any genes contained on the missing chromosome. A second mechanism involves mitotic recombination, in which homologous chromosomes exchange DNA sequences when they line up during the process of mitosis. Figure 2c shows how such an exchange could lead to loss of heterozygosity.

A third mechanism, called gene conversion, occurs when the DNA molecules from two homologous chromosomes line up next to each other and copy base sequence information from one to the other.

In this way, a DNA region that was originally present in two different versions in the two members of a homologous pair of chromosomes can be made identical by copying DNA sequence information from one chromosome to the other chromosome (Figure 2d).

The existence of the preceding mechanisms means that if a cell happens to acquire a random mutation that inactivates one copy of a tumor suppressor gene, loss of heterozygosity might either replace the normal copy with the defective version or remove it entirely. Loss of heterozygosity usually affects hundreds of neighboring genes simultaneously, making it relatively easy to detect.

You simply analyze a large number of known genes, searching for those that are present in two different versions in the normal cells of a cancer patient but are present in only one version in the same person’s cancer cells. When genes exhibiting this behavior are detected, it is likely that they lie near a tumor suppressor gene whose loss of heterozygosity is actually responsible for the cancerous growth.
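The search described above amounts to comparing genotypes at many markers between a patient's normal and cancer cells. A minimal sketch, assuming each marker's genotype is recorded as a pair of alleles; the marker names and genotypes here are invented for illustration.

```python
def find_loh_markers(normal: dict[str, tuple[str, str]],
                     tumor: dict[str, tuple[str, str]]) -> list[str]:
    """Return markers that are heterozygous in normal cells but homozygous in tumor cells."""
    loh = []
    for marker, (a, b) in normal.items():
        if a == b or marker not in tumor:
            continue  # uninformative: already homozygous, or not typed in the tumor
        t1, t2 = tumor[marker]
        if t1 == t2:
            loh.append(marker)  # only one version remains in the cancer cells
    return loh

# Hypothetical genotypes (illustrative only).
normal_cells = {"m1": ("A", "a"), "m2": ("B", "B"), "m3": ("C", "c"), "m4": ("D", "d")}
tumor_cells = {"m1": ("A", "a"), "m2": ("B", "B"), "m3": ("c", "c"), "m4": ("d", "d")}
loh_markers = find_loh_markers(normal_cells, tumor_cells)
```

In this toy example `m3` and `m4` show loss of heterozygosity, so a tumor suppressor gene would be suspected somewhere near those markers. Marker `m2` is uninformative because it was never heterozygous in the normal cells.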

Geneticists have performed thousands of searches looking for chromosomal regions that exhibit loss of heterozygosity in cancer cells. This approach, along with the study of chromosomal defects associated with hereditary cancer syndromes, has led to the identification of several dozen tumor suppressor genes.

iii. The RB Tumor Suppressor Gene Produces a Protein that Restrains Passage through the Restriction Point:

The first tumor suppressor gene to be isolated and characterized was the RB gene. The protein produced by the RB gene, called the Rb protein (or simply Rb), restrains cell proliferation in the absence of growth factors. The Rb protein normally exerts this action by halting the cell cycle at the restriction point.

In cells that have been exposed to an appropriate growth factor, however, signaling pathways trigger the production of Cdk-cyclin complexes that catalyze the phosphorylation of Rb. Phosphorylated Rb can no longer exert its inhibitory effects and so the cells are free to pass through the restriction point and into S phase.

The molecular mechanism by which Rb exerts this control over the restriction point is summarized in Figure 3. Prior to phosphorylation, Rb binds to the E2F transcription factor, a protein that (in the absence of bound Rb) activates the transcription of genes coding for enzymes and other proteins required for initiating DNA replication.

As long as the Rb protein remains bound to E2F, the E2F molecule is inactive and these genes stay silent, thereby preventing cells from entering into S phase. However, in a cell that has been stimulated to divide (e.g., by the addition of growth factors), the activation of growth signaling pathways leads to the production of Cdk-cyclin complexes that catalyze the phosphorylation of Rb. Phosphorylation abolishes the ability of Rb to bind to E2F, thus allowing E2F to activate the transcription of genes whose products are required for entry into S phase.

Because the normal purpose of Rb is to halt the cell cycle in the absence of growth factors, RB mutations that lead to the loss or inactivation of the Rb protein remove this restraining influence on the cell cycle and lead to excessive proliferation. Such mutations leading to a loss of Rb function are observed in some hereditary as well as environmentally caused forms of cancer. Certain cancer viruses also disrupt Rb function.

For example, the human papillomavirus (HPV) has an oncogene that codes for the E7 oncoprotein, which binds to Rb. When bound to E7, the Rb protein cannot perform its normal function of restraining passage through the restriction point, and cell proliferation therefore proceeds unchecked, even in the absence of growth factors. Cancers triggered by a loss of Rb function can thus arise in two fundamentally different ways: through mutations that delete or disrupt both copies of the RB gene, and through the action of viral oncoproteins that bind to and inactivate the Rb protein.
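The control logic of this section can be condensed into a toy Boolean model. It is a deliberately simplified sketch: real cells respond in a graded rather than all-or-nothing way, and the function and parameter names are illustrative only.

```python
def enters_s_phase(growth_factor: bool,
                   rb_functional: bool = True,
                   e7_present: bool = False) -> bool:
    """Toy model of the restriction point as described in the text."""
    # Growth-factor signaling produces Cdk-cyclin, which phosphorylates Rb.
    rb_phosphorylated = growth_factor
    # Rb holds E2F inactive only if it is functional, unphosphorylated, and not bound by E7.
    rb_binds_e2f = rb_functional and not rb_phosphorylated and not e7_present
    # Free E2F activates the genes needed for entry into S phase.
    return not rb_binds_e2f
```

In this model a normal cell without growth factors stays at the restriction point, while a cell with loss of Rb function or with E7 present proceeds into S phase regardless of growth factors, mirroring the two routes to cancer described above.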

iv. The p53 Tumor Suppressor Gene Produces a Protein that Prevents Cells with Damaged DNA from Proliferating:

Since the discovery of the RB gene in the mid-1980s, dozens of additional tumor suppressor genes have been identified (Table 1). One of the most important is the p53 gene (also called TP53 in humans), which produces the p53 protein. The p53 gene is mutated in a broad spectrum of tumor types, and almost half of the nearly ten million people diagnosed with cancer worldwide each year will have p53 mutations, making it the most commonly mutated gene in human cancers (Figure 4).

The p53 protein is sometimes called the “guardian of the genome” because of the central role that it plays in protecting cells from the effects of DNA damage. Figure 5 illustrates how this function is performed.

When cells are exposed to DNA-damaging agents, such as ionizing radiation or toxic chemicals, the damaged DNA triggers the activation of an enzyme called ATM kinase, which catalyzes the phosphorylation of p53 and several other target proteins. Phosphorylation of p53 by the ATM kinase prevents it from interacting with Mdm2, a protein that would otherwise mark p53 for destruction by linking it to a small protein called ubiquitin.

Mdm2 is one of numerous proteins in the cell, called ubiquitin ligases, that attach ubiquitin molecules to a specific set of proteins. As shown in Figure 6, the normal function of ubiquitin is to direct molecules to the proteasome, the cell’s main protein destruction machine.

After p53 has been phosphorylated by ATM in response to DNA damage, the Mdm2 ubiquitin ligase can no longer attach ubiquitin chains to p53. As a result, the p53 protein accumulates in cells containing damaged DNA rather than being degraded by the ubiquitin-mediated proteasome pathway.

The accumulating p53 in turn activates two types of events: cell cycle arrest and cell death. Both responses are based on the ability of p53 to act as a transcription factor that binds to DNA and activates specific genes. Among the targeted genes is the gene coding for the p21 protein, a member of a class of molecules called Cdk inhibitors because they block the activity of Cdk-cyclin complexes.

The p21 protein inhibits the Cdk-cyclin complex that would normally phosphorylate Rb, thereby halting the cell cycle at the restriction point and providing time for the DNA damage to be repaired. At the same time, p53 also activates the production of DNA repair enzymes.

If the damage cannot be successfully corrected, p53 then activates genes that produce proteins involved in triggering cell death by apoptosis. A key protein in this pathway, called Puma (“p53 up-regulated modulator of apoptosis”), promotes apoptosis by binding to and inactivating the Bcl2 protein, a normally occurring inhibitor of apoptosis.

By triggering cell cycle arrest or cell death in response to DNA damage, the p53 protein prevents genetically altered cells from proliferating and passing the damage on to future cell generations. Mutations that disrupt p53 function therefore increase cancer risk because they permit cells with damaged DNA to survive and reproduce.

For example, individuals who inherit a mutant p53 gene from one parent have an elevated risk of developing cancer because they only require one additional mutation to inactivate the second copy of the gene. This high-risk hereditary condition is called the Li-Fraumeni syndrome.

Most p53 mutations, however, are not inherited; they are caused by exposure to DNA-damaging chemicals and radiation. To cite but two examples, carcinogenic chemicals in tobacco smoke have been found to trigger point mutations in the p53 gene of lung cells, and the ultraviolet radiation in sunlight has been shown to cause p53 mutations in skin cells.

When exposure to carcinogenic chemicals or radiation creates mutations in the p53 gene, you might expect that both copies of the gene would need to be inactivated before functional p53 protein would be lost. In some cases, however, mutation of one copy of the p53 gene may be sufficient to disrupt the p53 protein, even when the other copy of the gene is normal.

The apparent explanation is that the p53 molecule is constructed from four protein chains bound together to form a tetramer. As shown in Figure 7, the presence of even one mutant chain in such a tetramer can be enough to prevent the p53 protein from functioning normally. When a mutation in one copy of the p53 gene causes the p53 protein to be inactivated in this way, even in the presence of a normal copy of the gene, it is called a dominant negative mutation.
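The dominant negative effect can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a cell with one mutant and one normal copy of the gene produces both chains in equal amounts, that chains assemble into tetramers at random, and that a single mutant chain inactivates the whole tetramer. None of these simplifications comes from the text above, and the function name is hypothetical.

```python
from fractions import Fraction


def fraction_functional_tetramers(mutant_share: Fraction, chains: int = 4) -> Fraction:
    """Fraction of tetramers that contain no mutant chain.

    Assumes chains are drawn independently at random, so the chance that
    all `chains` subunits are normal is (1 - mutant_share) ** chains.
    """
    return (1 - mutant_share) ** chains


# One mutant copy and one normal copy, expressed equally (an assumption).
print(fraction_functional_tetramers(Fraction(1, 2)))  # 1/16
```

Under these assumptions only one tetramer in sixteen would be built entirely from normal chains, which suggests how a single mutant copy could disable most of the p53 protein in a cell.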

Mutating the p53 gene is not the only mechanism for disrupting p53 function; the p53 protein can also be targeted directly by certain viruses. For example, human papillomavirus, whose E7 oncoprotein inactivates the Rb protein, produces another molecule, called the E6 oncoprotein, which binds to and targets the p53 protein for destruction.

The ability of human papillomavirus to cause cancer is therefore linked to its capacity to block the action of proteins produced by both the RB and p53 tumor suppressor genes.

iii. The APC Tumor Suppressor Gene Codes for a Protein that Inhibits the Wnt Signaling Pathway:

The next tumor suppressor to be discussed is, like the p53 gene, a frequent target for cancer-causing mutations; in this case, however, cancers arise mainly in one organ, namely the colon. The gene in question, called the APC gene, is the tumor suppressor that is defective in familial adenomatous polyposis. Individuals with this condition inherit a defective APC gene that causes thousands of polyps to grow in the colon and imparts a nearly 100% risk of developing colon cancer for individuals who live to the age of 60.

Although familial adenomatous polyposis is quite rare, accounting for less than 1% of all colon cancers, APC mutations are also associated with the more common forms of colon cancer that arise in people with no family history of the disease. In fact, recent studies suggest that roughly two-thirds of all colon cancers involve APC mutations.

The APC gene codes for a protein involved in the Wnt pathway, a signaling mechanism that plays a prominent role in activating cell proliferation during embryonic development. As shown in Figure 8, the central component of the Wnt pathway is a protein called β-catenin. Normally, β-catenin is prevented from functioning by a multi-protein destruction complex that consists of the APC protein combined with the proteins axin and glycogen synthase kinase 3 (GSK3).

When assembled in such an APC-axin-GSK3 complex, GSK3 catalyzes the phosphorylation of β-catenin. The phosphorylated β-catenin then becomes a target for a ubiquitin ligase that attaches it to ubiquitin, thereby marking the phosphorylated β-catenin for degradation by proteasomes. The net result is a low concentration of β-catenin, which makes the Wnt pathway inactive.

The Wnt pathway is turned on by signaling molecules called Wnt proteins, which bind to and activate cell surface Wnt receptors. The activated receptors stimulate a group of proteins that inhibit the axin-APC-GSK3 destruction complex and thereby prevent the degradation of β-catenin. The accumulating β-catenin then enters the nucleus and interacts with transcription factors that activate a variety of genes, including some that stimulate cell proliferation.

Mutations causing abnormal activation of the Wnt pathway have been detected in numerous cancers. Most of them are loss-of-function mutations in the APC gene that are either inherited or, more commonly, triggered by environmental carcinogens. The resulting absence of functional APC protein prevents the axin-APC-GSK3 complex from assembling, and β-catenin therefore accumulates, locking the Wnt pathway in the on position and sending the cell a persistent signal to divide.

iv. The PTEN Tumor Suppressor Gene Codes for a Protein that Inhibits the PI3K-Akt Signaling Pathway:

Cell proliferation is controlled through an interconnected network of pathways with numerous branches and shared components. A good example is provided by growth factors that activate the Ras-MAPK pathway. When a growth factor binds to a receptor that activates Ras-MAPK signaling, the receptor usually activates several other pathways at the same time.

One of these additional pathways, called the PI3K-Akt pathway, involves an enzyme called phosphatidylinositol 3-kinase (abbreviated as PI 3-kinase or PI3K). As shown in Figure 9, PI 3-kinase undergoes activation when it binds to phosphorylated tyrosines found in receptors that have been stimulated by growth factor binding.

A similar mechanism is involved in triggering the Ras-MAPK pathway. PI 3-kinase then catalyzes the addition of a phosphate group to a plasma membrane lipid called PIP2 (phosphatidylinositol-4,5-bisphosphate), which converts PIP2 into PIP3 (phosphatidylinositol-3,4,5-trisphosphate).

PIP3 in turn recruits protein kinases to the inner surface of the plasma membrane, leading to phosphorylation and activation of a protein kinase called Akt. Through its ability to catalyze the phosphorylation of several key target proteins, Akt suppresses apoptosis and inhibits cell cycle arrest. The net effect of the PI3K-Akt signaling pathway is therefore to promote cell survival and proliferation.

Dysfunctions in PI3K-Akt signaling have been detected in a number of different cancers. For example, AKT gene amplification occurs in some ovarian and pancreatic cancers, and a v-akt oncogene coding for a mutant Akt protein is present in an animal retrovirus that causes thymus cancers in mice. In such cases, excessive production or activity of the Akt protein leads to hyperactivity of the PI3K-Akt pathway and hence an enhancement of cell proliferation and survival.

Conversely, inhibitors of PI3K-Akt signaling can function as tumor suppressors. A prominent example is PTEN, an enzyme that removes a phosphate group from PIP3 and thus abolishes its ability to activate Akt. In cells that are not being stimulated by growth factors, the intracellular concentration of PIP3 is kept low by the action of PTEN, and the PI3K-Akt pathway is therefore inactive.

When loss-of-function mutations disrupt the ability to produce PTEN, the cell cannot degrade PIP3 efficiently and its concentration rises. The accumulating PIP3 in turn activates Akt, thereby leading to enhanced cell proliferation and survival (even in the absence of growth factors). Mutations that reduce PTEN activity are found in up to 50% of prostate cancers and glioblastomas, 35% of uterine endometrial cancers, and to varying extents in ovarian, breast, liver, lung, kidney, thyroid, and lymphoid cancers.

v. Some Tumor Suppressor Genes Code for Components of the TGFβ-Smad Signaling Pathway:

Growth factors are usually thought of as being molecules that stimulate cell proliferation, but some growth factors have the opposite effect: they inhibit cell proliferation. An example is transforming growth factor β (TGFβ), a protein that may either stimulate or inhibit cell proliferation, depending on the cell type and context. TGFβ is especially relevant for tumor development because it is a potent inhibitor of epithelial cell proliferation, and roughly 90% of human cancers are carcinomas, that is, cancers of epithelial origin.

TGFβ exerts its inhibitory effects on cell proliferation through the TGFβ-Smad pathway illustrated in Figure 10. The first step in this pathway is the binding of TGFβ to a cell surface receptor. Like many other growth factor receptors, the receptors for TGFβ catalyze protein phosphorylation reactions, although in this case the amino acids serine and threonine rather than tyrosine are phosphorylated.

TGFβ binds to two types of receptors, called type I and type II receptors, located on the surface of its target cells. Upon binding of TGFβ, type II receptors phosphorylate type I receptors. The type I receptors then phosphorylate a class of proteins known as Smads, which bind to an additional protein (a “co-Smad”) and move into the nucleus.

Once inside the nucleus, the Smad complex activates the expression of genes that inhibit cell proliferation. Two key genes produce the p15 protein and the p21 protein, which both function as Cdk inhibitors.

The p15 and p21 proteins halt progression through the cell cycle by inhibiting the Cdk-cyclin complexes whose actions are required for passing through key transition points in the cycle.

Components of the TGFβ-Smad signaling pathway are frequently inactivated in human cancers. For example, loss-of-function mutations in the TGFβ receptor are common in colon and stomach cancers, and occur in some cancers of the breast, ovary, and pancreas as well.

Loss-of-function mutations in Smad proteins are likewise observed in a variety of cancers, including 50% of all pancreatic cancers and about 30% of colon cancers. Such evidence indicates that the genes coding for TGFβ receptors and Smads both qualify as tumor suppressors.

vi. One Gene Produces Two Tumor Suppressor Proteins: p16 and ARF:

Thus far, this article has described the relationship between tumor suppressor genes and several signaling pathways for inhibiting cell proliferation or promoting cell death. The next tumor suppressor to be covered, known as the CDKN2A gene, exhibits the rather unusual property of coding for two different proteins that act independently on two of these pathways, the Rb pathway and the p53 pathway.

How does the CDKN2A gene produce two entirely different tumor suppressor proteins? Because the genetic code is read three bases at a time, changing the start point by one or two nucleotides will completely change the message contained in a base sequence.

For example, the sequence AAAGGGCCC can be read in three different reading frames starting from the first, second, or third base, that is, starting as AAA-GGG…, AAG-GGC…, or AGG-GCC…, respectively. A shift in the normal reading frame usually creates a garbled message that does not code for a functional protein. In the case of the CDKN2A gene, however, a shift in the reading frame leads to the production of an alternative protein that is fully functional.
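The three reading frames of the example sequence can be listed with a few lines of code. This is a minimal sketch with a hypothetical helper name; it illustrates only the general idea of shifting the start point and is not a model of how the CDKN2A transcripts are actually produced.

```python
def reading_frames(sequence: str) -> list[list[str]]:
    """Split a base sequence into complete codons for each of the three frames."""
    frames = []
    for offset in range(3):
        # Step through the sequence three bases at a time, starting at
        # `offset`, and keep only complete three-base codons.
        codons = [
            sequence[i:i + 3]
            for i in range(offset, len(sequence) - 2, 3)
        ]
        frames.append(codons)
    return frames


for frame_number, codons in enumerate(reading_frames("AAAGGGCCC"), start=1):
    print(frame_number, "-".join(codons))
# 1 AAA-GGG-CCC
# 2 AAG-GGC
# 3 AGG-GCC
```

Shifting the start point by one or two bases changes every codon that follows, which is why a frame shift normally garbles the encoded message.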

The first of the two proteins produced by the CDKN2A gene is the p16 protein (also called INK4a), a Cdk inhibitor that suppresses the activity of the Cdk-cyclin complex that normally phosphorylates the Rb protein. Loss-of-function mutations affecting p16 lead to excessive Cdk-cyclin activity and inappropriate Rb phosphorylation. Since the phosphorylated form of Rb cannot restrain the cell cycle at the restriction point, the net result is a loss of cell cycle control.

The second protein produced by the CDKN2A gene is called the ARF (for Alternative Reading Frame) protein. Although they are produced by the same gene, p16 and ARF are completely different proteins exhibiting no sequence similarity. Whereas p16 is a Cdk inhibitor, ARF binds to and promotes the degradation of Mdm2, the ubiquitin ligase that normally targets p53 for destruction by tagging it with ubiquitin (see Figures 5 and 6).

By promoting the degradation of Mdm2, ARF facilitates the stabilization and accumulation of p53. Conversely, loss-of-function mutations affecting ARF interfere with the ability of p53 to accumulate and perform its function in triggering cell cycle arrest and cell death.

The CDKN2A gene therefore influences cell proliferation and survival through two independent proteins: the p16 protein, which is required for proper Rb signaling, and the ARF protein, which is required for proper p53 signaling (Figure 11).

Loss-of-function mutations in CDKN2A have been observed in numerous human cancers, including 15% to 30% of all cancers originating in the breast, lung, pancreas, and bladder. Deletion of both copies of the CDKN2A gene, which leads to complete absence of both the p16 and ARF proteins, is common in such cases.

2. Essay on Tumor Suppressor Genes: (Around 2500 Words)

Roles in DNA Repair and Genetic Stability:

Although they are involved in a variety of different signaling pathways, the tumor suppressor genes discussed thus far share a fundamental feature in common. They produce proteins whose normal function is to inhibit cell proliferation and survival. Loss-of-function mutations in such genes therefore have the opposite effect, namely increased cell proliferation and survival.

A second group of tumor suppressors acts through effects on DNA repair and the maintenance of chromosome integrity. Unlike genes that exert direct effects on cell proliferation and whose inactivation can lead directly to tumor formation, the inactivation of genes involved in DNA maintenance and repair acts indirectly by permitting an increased mutation rate for all genes. This increased mutation rate in turn increases the likelihood that alterations will arise in other genes that directly affect cell proliferation.

The terms gatekeepers and caretakers are used to distinguish between these two classes of tumor suppressor genes. The tumor suppressors described in the first part of this article, which exert direct effects on cell proliferation and survival, are considered to be “gatekeepers” because the loss of such genes directly opens the gates to tumor formation.

Tumor suppressors involved in DNA maintenance and repair, on the other hand, are “caretakers” that preserve the integrity of the genome and whose inactivation leads to mutations in other genes (including gatekeepers) that actually trigger the development of cancer. We will examine the functions of some of these caretaker genes.

i. Genes Involved in Excision and Mismatch Repair Help Prevent the Accumulation of Localized DNA Errors:

Cancer cells accumulate mutations at rates that can be hundreds or even thousands of times higher than normal. This condition, called genetic instability, does not by itself disrupt the normal controls on cell proliferation.

In fact, most of the mutations that arise in genetically unstable cells are likely to be harmful mutations that hinder cell survival. But elevated mutation rates also increase the probability that occasional mutations will arise that allow cells to escape from the normal constraints on cell proliferation and survival.

Cells that randomly incur such mutations will tend to outgrow their neighbors, an important first step in the development of cancer. Increased mutation rates also facilitate tumor progression, in which cells acquire additional traits that allow cancers to become increasingly aggressive (for example, faster growth rate, increased invasiveness, ability to survive in the bloodstream, resistance to immune attack, ability to grow in other organs, resistance to drugs, and evasion of death-triggering mechanisms).

Genetic instability occurs in several different forms that differ in their underlying mechanisms. The simplest type is caused by defects in the DNA repair mechanisms that cells use for correcting localized errors involving one or a few nucleotides. These localized errors typically arise either from exposure to DNA-damaging agents or from base-pairing mistakes that take place during DNA replication.

Two types of repair mechanisms are employed for correcting such errors. Excision repair is capable of repairing abnormal bases created by exposure to DNA-damaging agents, and mismatch repair is used for correcting inappropriately paired bases that arise spontaneously during DNA replication.

Individuals who inherit loss-of-function mutations involving genes required for either of these repair mechanisms exhibit an increased cancer risk. For example, inherited mutations in excision repair genes cause xeroderma pigmentosum, a hereditary cancer syndrome involving an extremely high risk for skin cancer.

In a similar fashion, inherited mutations in genes coding for proteins involved in mismatch repair are responsible for hereditary nonpolyposis colon cancer (HNPCC), a hereditary syndrome associated with a high risk for colon cancer.

Although both of these hereditary syndromes involve a striking increase in cancer risk, xeroderma pigmentosum exhibits a recessive pattern of inheritance and HNPCC exhibits a dominant pattern of inheritance. In other words, inheriting an elevated cancer risk requires two defective copies of an excision repair gene but only one defective copy of a mismatch repair gene.

The reason for this difference appears to be related to how many steps are required to create genetic instability in the two cases (Figure 12). In a person who inherits a single defective mismatch repair gene, all that is required to start accumulating DNA errors at a high rate is for the second copy of the gene to undergo mutation. This second “hit” will immediately permit uncorrected errors to accumulate during normal DNA replication because of the absence of mismatch repair.

In contrast, if a person were to inherit a single defective excision repair gene, subsequent mutation of the second copy of the gene would debilitate excision repair but would not immediately lead to the accumulation of mutations.

A third step, namely exposure to a DNA-damaging agent such as ultraviolet light, is needed to actually create the mutations. Thus more steps are needed to create genetic instability involving excision repair than is the case for mismatch repair.

Inherited mutations in genes required for excision or mismatch repair create a dramatic increase in the risk for certain hereditary cancers, but mutations in these two classes of genes are less important for most nonhereditary forms of cancer.

Nonetheless, mutations in excision or mismatch repair genes have been detected in about 15% of colon cancers and in several other kinds of cancer as well, suggesting that deficiencies in DNA repair occasionally contribute to the genetic instabilities observed in nonhereditary cancers.

Proteins Produced by the BRCA1 and BRCA2 Genes Assist in the Repair of Double-Strand DNA Breaks:

Another type of genetic instability exhibited by cancer cells involves their tendency to acquire gross abnormalities in chromosome structure and number. Such chromosomal instabilities can be caused by defects in a variety of different tumor suppressors, including the BRCA1 and BRCA2 genes.

Women who inherit a mutation in one of the BRCA genes typically exhibit a lifetime cancer risk of 40% to 80% for breast cancer and 15% to 65% for ovarian cancer. The BRCA1 and BRCA2 genes were initially thought to exert their effects directly on cell proliferation, but later studies revealed that they produce proteins involved in pathways for sensing DNA damage and performing the necessary repairs.

The two BRCA tumor suppressor genes code for large nuclear proteins that bear little resemblance to one another. An early clue regarding their cellular role came from the observation that cells deficient in either of the BRCA proteins exhibit large numbers of chromosomal abnormalities, including broken chromosomes and chromosomal translocations.

The apparent reason for these abnormalities is that the two BRCA proteins are involved in the process by which cells repair double-strand breaks in DNA. Double-strand breaks are more difficult to repair than single-strand breaks because with single-strand breaks, the remaining strand of the DNA double helix remains intact and can serve as a template for aligning and repairing the defective strand.

In contrast, double-strand breaks completely cleave the DNA double helix into two separate fragments, and the repair machinery is therefore confronted with the problem of identifying the correct two fragments and rejoining their broken ends without losing any nucleotides.

The two main ways of repairing double-strand breaks are nonhomologous end-joining and homologous recombination. Of the two mechanisms, homologous recombination is less prone to error because it uses the DNA present in the unbroken homologous chromosome to serve as a template for guiding the repair of the DNA from the broken chromosome.

Repairing double-strand breaks by homologous recombination is a complex process that requires the participation of a large number of different proteins, including BRCA1 and BRCA2. The pathway is activated by the same ATM kinase whose role in detecting and responding to DNA damage was introduced earlier in this article (see Figure 5).

We have already seen that in response to DNA damage, the ATM kinase catalyzes the phosphorylation of the p53 protein, which then halts the cell cycle to permit time for repair to occur. The ATM kinase also phosphorylates and activates more than a dozen additional proteins involved in cell cycle control and DNA repair, including BRCA1 and other molecules required for repairing double-strand breaks.

Figure 13 shows that the mechanism for repairing double-strand breaks by homologous recombination involves two main phases. First, a group of proteins called the Rad50 exonuclease complex removes nucleotides from one strand of the broken end of a DNA double helix to expose a single-stranded segment on the opposite strand.

In the second phase, a multi-protein assembly called the Rad51 repair complex carries out a “strand invasion” reaction in which the exposed single-stranded DNA segment at the end of the broken DNA molecule displaces one of the two strands of the intact DNA molecule being used as a template.

In this step, Rad51 first coats the single-stranded DNA; the coated strand then invades and moves along the target DNA double helix until it reaches a complementary sequence. Once it has been located, the complementary sequence is used as a template for guiding repair of the broken DNA.

Although their roles are not completely understood, the BRCA1 and BRCA2 proteins are both required for efficient repair of double-strand breaks. BRCA2 binds tightly to and controls the activity of Rad51, the central protein responsible for carrying out strand invasion during repair by homologous recombination.

BRCA1 is associated with both the Rad50 exonuclease complex and the Rad51 repair complex. Moreover, it is known that ATM phosphorylates BRCA1 in response to DNA damage, suggesting that BRCA1 plays an early role in activating the pathway for repairing double-strand breaks.

Cells deficient in either BRCA1 or BRCA2 are extremely sensitive to carcinogenic agents that produce double-strand DNA breaks. In such cells, double-strand breaks can only be repaired by error-prone mechanisms, such as nonhomologous end-joining, that lead to broken, rearranged, and translocated chromosomes. The resulting chromosomal instability is thought to play a large role in the cancer risks exhibited by women who inherit BRCA1 or BRCA2 mutations.

Mutations in Genes that Influence Mitotic Spindle Behavior can Lead to Chromosomal Instabilities:

We have just seen how broken and translocated chromosomes arise in cancer cells as a result of mutations that disrupt tumor suppressor genes needed for repairing double-strand DNA breaks. Another chromosomal abnormality frequently observed in cancer cells is the tendency for whole chromosomes to be lost or gained, thereby leading to aneuploid cells that possess an abnormal number of chromosomes (Figure 14).

The various mechanisms that underlie the development of aneuploidy are just beginning to be unraveled, but evidence already points to the existence of tumor suppressor genes whose loss contributes to this type of chromosomal instability.

To explain how these tumor suppressors work, we first need to review the normal mechanisms used by cells for sorting and parceling out chromosomes during cell division. In a normal cell cycle, chromosomal DNA is first replicated during S phase to create duplicate copies of each chromosome, and the duplicate copies are then separated into the two new cells formed by the subsequent mitotic cell division.

Accurate separation of the duplicated chromosomes is accomplished by attaching the chromosomes to the mitotic spindle, which separates and moves the chromosomes in a way that ensures that each new cell receives a complete set of chromosomes (Figure 15).

A critical moment occurs at the end of metaphase, when the chromosomes line up at the center of the mitotic spindle just before being parceled out to the two new cells. If chromosome movement toward opposite spindle poles were to begin before the chromosomes are all attached to the spindle, a newly forming cell might receive extra copies of some chromosomes and no copies of others.

To protect against this possible danger, cells possess a control mechanism called the spindle checkpoint that monitors chromosome attachment to the spindle and prevents chromosome movement from beginning until all chromosomes are properly attached. In the absence of such a mechanism, there would be no guarantee that each newly forming cell would receive a complete set of chromosomes (see Figure 15, bottom right).

The key to the spindle checkpoint is the anaphase-promoting complex, a multiprotein complex that triggers the onset of anaphase, the stage of mitosis when the chromosomes move toward opposite poles of the mitotic spindle.

As shown in Figure 16a, the anaphase-promoting complex initiates chromosome movement by activating separase, an enzyme that breaks down proteins called cohesins that hold the duplicated chromosomes together. As long as they are joined together by cohesins, the duplicated chromosomes cannot separate from each other and move toward opposite spindle poles.

To prevent premature separation, chromosomes that are not yet attached to the mitotic spindle send a “wait” signal that inhibits the anaphase-promoting complex, thereby blocking the activation of separase. The “wait” signal is transmitted by proteins that are members of the Mad and Bub families.

The Mad and Bub proteins bind to chromosomes that are unattached to the mitotic spindle and are converted into a Mad-Bub multiprotein complex, which inhibits the anaphase-promoting complex by blocking the action of one of its essential activators, the Cdc20 protein (see Figure 16b).

After the chromosomes have all become attached to the spindle, the Mad and Bub proteins are no longer converted into this inhibitory complex and the anaphase-promoting complex is free to initiate the onset of anaphase.

Mutations that cause the loss or inactivation of Mad or Bub proteins have been linked to certain types of cancer, which indicates that genes coding for some of the Mad and Bub proteins behave as tumor suppressor genes. A lack of Mad or Bub proteins caused by loss-of-function mutations in these tumor suppressor genes disrupts the “wait” mechanism and impedes the ability of the spindle checkpoint to operate properly.

Under such conditions, chromosome movement toward the spindle poles begins before all the chromosomes are properly attached to the mitotic spindle. The result is a state of chromosomal instability in which cell division creates aneuploid cells lacking some chromosomes and possessing extra copies of others.
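The logic of the spindle checkpoint, and its failure when Mad or Bub proteins are lost, can be summarized in a toy model. The sketch below is a deliberate simplification with hypothetical names: it reduces chromosome attachment to a list of true or false values and treats Mad/Bub function as a single switch.

```python
def anaphase_may_begin(attached: list[bool], mad_bub_functional: bool = True) -> bool:
    """Toy model of the spindle checkpoint.

    With functional Mad/Bub proteins, any unattached chromosome sends a
    "wait" signal that blocks the anaphase-promoting complex. Without
    them, no "wait" signal is produced and anaphase is never blocked.
    """
    wait_signal = mad_bub_functional and not all(attached)
    return not wait_signal


chromosomes = [True, True, False]  # one chromosome not yet attached
print(anaphase_may_begin(chromosomes))                            # False
print(anaphase_may_begin(chromosomes, mad_bub_functional=False))  # True
```

In the second call anaphase proceeds despite the unattached chromosome, which corresponds to the situation that produces aneuploid cells.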

Another route to chromosomal instability involves the mechanism responsible for assembling the mitotic spindle. Formation of a mitotic spindle requires two small structures called centrosomes, one located at each end of the spindle (see Figure 15). Centrosomes promote the assembly of the spindle microtubules, which form in the space between the two centrosomes. Cancer cells often possess extra centrosomes and therefore produce aberrant mitotic spindles.

In Figure 17, we see a cancer cell with three centrosomes that have assembled a spindle with three poles. Multipolar spindles containing three or more poles, which are rare in normal tissues but common in cancer cells, contribute to the development of aneuploidy because they cannot sort the two sets of chromosomes accurately. Cells produced by mitosis involving an abnormal spindle will often be missing certain chromosomes and thus will lack any tumor suppressor genes that the missing chromosomes would normally possess.

3. Essay on Tumor Suppressor Genes: (Around 2000 Words)

Role of Mutation and Non-Mutation in Converting Normal Cells into Cancer Cells:

Mutations in cancer-related genes, and the genetic instability that facilitates the accumulation of such mutations, are centrally involved in the mechanisms by which cancers arise.

Yet one cannot explain the behavior of a malignant tumor by pointing solely to gene mutations. The final part of this article will provide a broad overview of the role played not just by mutations but also by non-mutational changes in converting normal cells into cancer cells.

i. Cancers Vary in their Gene Expression Profiles:

Mutations that create oncogenes or disrupt the function of tumor suppressor genes are central to the development of cancer, but they do not explain all the cellular changes that accompany the conversion of normal cells into cancer cells.

Many of the properties exhibited by cancer cells are triggered not by gene mutations, but by switching on (or off) the expression of normal genes, thereby leading to increases (or decreases) in the production of hundreds of different proteins. The term epigenetic change is employed when referring to such alterations that are based on changing the expression of a gene rather than mutating it.

Measuring epigenetic changes requires techniques that can monitor the expression of thousands of genes simultaneously. One very powerful tool is the DNA microarray, a fingernail-sized, thin chip of glass or plastic that has been spotted at fixed locations with thousands of DNA fragments corresponding to various genes of interest.

A single microarray may contain 10,000 or more spots, each representing a different gene. To determine which genes are being expressed in any given cell population, one begins by extracting molecules of messenger RNA (mRNA), which represent the products of gene transcription. The mRNA is then copied with reverse transcriptase, an enzyme that makes single-stranded DNA copies that are complementary in sequence to each mRNA.

The resulting single-stranded DNA (called cDNA for complementary DNA) is then attached to a fluorescent dye. When the microarray is bathed with the fluorescent cDNA, each cDNA molecule binds or hybridizes by complementary base-pairing to the spot containing the specific gene to which it corresponds.

Figure 18 illustrates how DNA microarrays can be used to create a gene expression profile that compares the patterns of gene expression in cancer cells and a corresponding population of normal cells. In this particular example, two fluorescent dyes are used: a red dye to label cDNAs derived from cancer cells and a green dye to label cDNAs derived from the corresponding normal cells.

When the red and green cDNAs are mixed together and placed on a DNA microarray, the red cDNAs bind to genes expressed in cancer cells and the green cDNAs bind to genes expressed in normal cells.

Red spots therefore represent higher expression of a gene in cancer cells, green spots represent higher expression of a gene in normal cells, yellow spots (caused by a mixture of red and green fluorescence) represent genes whose expression is roughly the same, and black spots (absence of fluorescence) represent genes expressed in neither cell type.

Thus the relative expression of thousands of genes in cancer and normal cells can be compared by measuring the intensity and color of the fluorescence of each spot. Such analyses have revealed that the expression of hundreds of different genes is typically altered in cancer cells compared with normal cells of the same tissue. Moreover, significant variations in gene expression are often detected when the same type of cancer is examined in different patients.
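The colour rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_spot`, the background cutoff of 50 intensity units, and the two-fold threshold are hypothetical choices that do not come from the text, and real microarray analysis involves normalization steps that are not shown here.

```python
import math

def classify_spot(red, green, background=50.0, fold=2.0):
    """Return the apparent colour of one spot.

    red   -- fluorescence from cancer-cell cDNA (hypothetical units)
    green -- fluorescence from normal-cell cDNA (hypothetical units)
    background, fold -- assumed cutoffs, chosen only for illustration
    """
    # No signal in either channel: the gene is expressed in neither cell type.
    if red < background and green < background:
        return "black"
    # Log2 ratio of the two channels; the +1 avoids division by zero.
    log_ratio = math.log2((red + 1.0) / (green + 1.0))
    threshold = math.log2(fold)
    if log_ratio >= threshold:
        return "red"      # higher expression in cancer cells
    if log_ratio <= -threshold:
        return "green"    # higher expression in normal cells
    return "yellow"       # roughly equal expression

# Hypothetical intensities for four spots: (red, green)
spots = {
    "geneA": (4000, 500),
    "geneB": (300, 2400),
    "geneC": (1500, 1400),
    "geneD": (10, 20),
}
calls = {name: classify_spot(r, g) for name, (r, g) in spots.items()}
print(calls)
```

Working on the log ratio rather than the raw ratio treats a two-fold increase and a two-fold decrease symmetrically, which is why the same threshold can be used in both directions.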

The changes in gene expression commonly exhibited by cancer cells arise in several different ways. One well-documented mechanism involves epigenetic silencing by DNA methylation, a process in which methyl groups are attached to the base C in DNA at sites where it is located adjacent to the base G.

In vertebrate DNA, these -CG- sequences are preferentially located near the beginning of genes (about half of all human genes are associated with -CG- sites). When -CG- sequences undergo methylation, the transcription of adjacent genes is inhibited or “silenced.”

Most -CG- sites are un-methylated in normal cells, but extensive methylation is often seen in cancer cells, where it leads to the inappropriate silencing of a variety of different genes. Tumor suppressor genes are frequently among the genes to be silenced by this mechanism.
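To make the idea of -CG- sites concrete, the following minimal sketch locates every C that is immediately followed by a G in a DNA string and reports what fraction of those sites is methylated. The sequence, the sets of methylated positions, and the function names are all hypothetical and serve only as an illustration; they are not data from the text.

```python
def cpg_positions(seq):
    """Return the 0-based index of every C that is directly followed by a G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def methylated_fraction(seq, methylated):
    """Fraction of -CG- sites whose C index appears in the set `methylated`."""
    sites = cpg_positions(seq)
    if not sites:
        return 0.0
    return sum(1 for i in sites if i in methylated) / len(sites)

# Hypothetical promoter fragment with four -CG- sites.
promoter = "TTCGACGGCGATCCGTA"
sites = cpg_positions(promoter)

# Assumed methylation states: none in the normal cell, three of four in the tumor cell.
normal_fraction = methylated_fraction(promoter, set())
tumor_fraction = methylated_fraction(promoter, {2, 5, 8})
print(sites, normal_fraction, tumor_fraction)
```

In this toy example the normal cell has no methylated sites while the tumor cell has most of them methylated, mirroring the pattern described above in which extensive methylation silences adjacent genes.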

In fact, the tumor suppressor genes of cancer cells are inactivated by epigenetic silencing at least as often as they are inactivated by DNA mutation. Loss of gene function through inappropriate methylation may therefore be as important to cancer cells as mutation-induced loss of function.

ii. Colon Cancer Illustrates How a Stepwise Series of Mutations can Lead to Malignancy:

Cancer arises via a multistep process in which cellular properties gradually change over time as mutations confer new traits that impart selective advantages to the cells in which they arise.

Now that we have described the main classes of cancer-related genes and the molecular pathways in which they participate, it is appropriate to return to the concept of multistep carcinogenesis to see how a specific sequence of gene mutations can lead to cancer.

Current estimates indicate that there are more than 100 different oncogenes and several dozen tumor suppressor genes. For cancer to arise, it is rarely sufficient to have a defect in just one of these genes, nor is it necessary for a large number to be involved.

Instead, each type of cancer tends to be characterized by a small handful of mutations involving the inactivation of tumor suppressor genes as well as the conversion of proto-oncogenes into oncogenes. In other words, creating a cancer cell usually requires that the brakes on cell growth (tumor suppressor genes) be released and the accelerators for cell growth (oncogenes) be activated.

This principle is nicely illustrated by the stepwise progression toward malignancy observed in colon cancer. Scientists have isolated DNA from a large number of colon cancer patients and examined it for the presence of mutations. The most common pattern to be detected is the presence of a KRAS oncogene (a member of the RAS gene family) accompanied by loss-of-function mutations in the tumor suppressor genes APC, p53, and SMAD4.

Rapidly growing colon cancers tend to exhibit all four genetic alterations, whereas benign tumors have only one or two, suggesting that mutations in the four genes occur in a stepwise fashion that correlates with increasingly aggressive behavior.

As shown in Figure 19, the earliest mutation to be routinely detected is loss of function of the APC gene, which frequently occurs in small polyps before cancer has even arisen. Mutations in KRAS tend to be seen when the polyps get larger, and mutations in SMAD4 and p53 usually appear as cancer finally begins to develop.

These mutations, however, do not always occur in the same sequence or with the same exact set of genes. For example, APC mutations are found in about two-thirds of all colon cancers, which means that the APC gene is normal in one out of every three cases.

Analysis of tumors containing normal APC genes has revealed that many of them possess oncogenes that produce an abnormal, hyperactive form of β-catenin, a protein that—like the APC protein—is involved in Wnt signaling (see Figure 8).

Because APC inhibits the Wnt pathway and β-catenin stimulates it, mutations leading to the loss of APC and mutations that create hyperactive forms of β-catenin have the same basic effect: both enhance cell proliferation by increasing the activity of the Wnt pathway.

Another pathway frequently disrupted in colon cancer is the TGFβ-Smad pathway, which inhibits rather than stimulates epithelial cell proliferation. Loss-of-function mutations in genes coding for components of this pathway, such as the TGFβ receptor or Smad4, are commonly detected in colon cancers. Such mutations disrupt the growth-inhibiting activity of the TGFβ-Smad pathway and thereby contribute to enhanced cell proliferation.

Overall, the general principle illustrated by the various colon cancer mutations is that different tumor suppressor genes and oncogenes can affect the same pathway, and it is the disruption of particular signaling pathways that is important in cancer cells rather than the particular gene mutations through which the disruption is achieved (Table 2).

iii. The Various Causes of Cancer can be Brought Together into a Single Model:

Colon cancer illustrates how normal cells can be converted into cancer cells by a small number of genetic changes, each affecting a particular pathway and conferring some type of selective advantage. Of course, colon cancer is just one among dozens of different human cancers, and the few genes commonly mutated in colon cancer are only a tiny fraction of the more than 100 different oncogenes and tumor suppressor genes.

When various kinds of tumors are compared, it is found that different combinations of gene mutations can lead to cancer and that each type of cancer tends to exhibit its own characteristic mutation patterns.

Despite this variability, a number of shared principles are apparent in the various routes to cancer. An overview is provided by the model illustrated in Figure 20, which begins with the four main causes of cancer: chemicals, radiation, infectious agents, and heredity.

Each of these four factors contributes to the development of malignancy. While the details may differ, the bottom line is that one way or another, each of the four causes of cancer leads to DNA alterations.

In the case of either viruses that introduce specific oncogenes into cells or cancer syndromes that arise from inherited gene defects, the DNA alterations involve a specific gene. Most of the DNA mutations induced by carcinogens, on the other hand, are random. The higher the dose and potency of the carcinogen, the greater the DNA damage and therefore the greater the probability that a random mutation will disrupt a critical gene.

But critical genes (proto-oncogenes and tumor suppressor genes) represent only a tiny fraction of the chromosomal DNA, so the random nature of mutation means that luck plays a significant role. If two people are exposed to the same dose of a carcinogen, one may develop cancer while the other does not, simply because random mutations happen to damage a critical proto-oncogene or tumor suppressor gene in the unlucky individual.
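The role of chance can be illustrated with a deliberately simplified calculation. The sketch below assumes that each mutation lands independently and uniformly at random, so that the probability of at least one hit in a critical gene after n mutations is 1 - (1 - f)^n, where f is the fraction of DNA occupied by critical genes. Both the model and the numbers used (f = 0.001 and the mutation counts) are hypothetical assumptions for illustration, not estimates from the text.

```python
def prob_critical_hit(n_mutations, critical_fraction):
    """Probability that at least one of n independent, uniformly random
    mutations falls in a critical gene (simplified illustrative model)."""
    return 1.0 - (1.0 - critical_fraction) ** n_mutations

# Hypothetical value: critical genes occupy 0.1% of the DNA.
f = 0.001

# Hypothetical mutation counts standing in for increasing carcinogen dose.
doses = [10, 100, 1000]
risks = [prob_critical_hit(n, f) for n in doses]

for n, p in zip(doses, risks):
    print(f"{n:5d} random mutations -> P(critical hit) = {p:.3f}")
```

Under these assumptions the probability rises with the number of mutations but stays below 1, which is consistent with the observation that two people exposed to the same dose can have different outcomes.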

The random nature of mutation contributes to the long period of time that is usually required for cancer to develop. Moreover, when DNA repair mechanisms and DNA damage checkpoints are operating properly, many mutations are either repaired or the cells containing them are destroyed by apoptosis. Taken together, such considerations may help explain why cancer is largely a disease of older age.

For cancer to develop, cells need to gradually accumulate a stepwise series of appropriate mutations involving the inactivation of tumor suppressor genes as well as the conversion of proto-oncogenes into oncogenes.

Gastric Cancer: Recent Molecular Classification Advances, Racial Disparity, and Management Implications

Gastric adenocarcinoma remains an aggressive and poorly understood malignancy with a heterogeneous presentation and tumor biology. The current histologic and anatomic classification has been ineffective in guiding therapy, with only marginal improvement in outcome over time. Furthermore, the variation in presentation and disease among racial and ethnic groups amplifies the complexity of this cancer. An understanding of the clinical and molecular variability is important for effective treatment. Recent advances in molecular biology have better defined gastric cancer subtypes. We systematically review recent literature on the molecular classification of gastric adenocarcinoma and the associated management implications, with an emphasis on Hispanic and Native American populations.

Gastric adenocarcinoma is the third leading cause of cancer deaths worldwide and is responsible for 723,000 deaths annually. 1 More than 90% of gastric cancers are adenocarcinoma, and current clinical classification is based on histology according to Laurén type (intestinal or diffuse) 2 and by HER2/neu amplification. Although this classification has a general association with treatment responsiveness and outcome, it does not account for the heterogeneous nature of the disease and cannot identify patients who may benefit from novel therapies.

Large-scale sequencing efforts have characterized the genomic landscape of gastric cancer and allowed for better categorization of the clinical heterogeneity among patients with gastric cancer. The Cancer Genome Atlas (TCGA) and Asian Cancer Research Group (ACRG) studies have defined distinct molecular gastric cancer subtypes on the basis of mutation profile and expression analysis. Although these studies have provided a new schema for gastric cancer classification, the clinical application of these criteria is only starting to be delineated. Currently, the use of molecular markers is limited to screening for genetic risk and possible response to therapy in the metastatic setting. However, the clinical utility of these molecular groups may prove to be useful in drug development and individualized treatment of patients with gastric cancer.

This review details the various classification strategies that have been applied to gastric cancer and their current clinical applications, including histologic subtypes, molecular markers, and genomic classes. In addition, limitations of each are discussed, with a special emphasis on generalizability to minority groups that have not been included in landmark studies.

The historical classification of gastric cancer was based on microscopic features and expression of selected markers. Microscopic assessment remains the dominant means of differentiating gastric tumors given its ease of use, low cost, and immediate availability.

Laurén classification is the most commonly applied and accepted histologic classification of gastric cancer that pathologists and physicians use today. This classification originally was proposed in 1965 after Laurén examined 1,344 patients with gastric cancer in Finland. 2 Two categories comprise the classification: intestinal and diffuse. The designation is subjectively assigned according to the dominant histologic appearance of the tumor. Intestinal and diffuse gastric cancer exhibit numerous differences in pathology, epidemiology, and etiology. Intestinal-type tumors form gland-like structures, develop in patients who have severe atrophic gastritis, and are strongly associated with intestinal metaplasia. 3 Severe atrophic gastritis and intestinal metaplasia caused by persistent Helicobacter pylori infection are strongly associated with the development of intestinal-type gastric cancer. 4 On the other hand, diffuse-type gastric cancer is associated with cellular discohesion, poor differentiation, treatment resistance, and inferior outcome. Although this histologic classification has been widely used over the past five decades, its utility is limited because it fails to encompass the significant genetic complexity that exists. Several molecular markers, including HER2/neu, CDH1, and the mismatch repair (MMR) genes, have demonstrated specific clinical utility in guiding treatment.

HER2 protein, a member of the epidermal growth factor receptors family, is overexpressed/amplified in numerous types of human cancer, including breast, gastric, colon, bladder, lung, and uterine. In gastric cancer, the incidence of HER2 overexpression has been reported to be approximately 9% to 38%. HER2 overexpression is highest in gastroesophageal junction or stomach cardia tumors compared with tumors that arise more distally in the stomach. HER2 positivity is seen predominantly in intestinal-type tumors and has a low prevalence in diffuse-type gastric cancer (32% v 6%). 5 HER2 mutation as a therapeutic target in gastric cancer was recognized after the publication of the Trastuzumab for Gastric Cancer trial, 6 which led to Food and Drug Administration approval of trastuzumab as the first targeted therapy option in gastric cancer.

Several mutations have been associated with an early onset of diffuse-type gastric cancer. Inherited mutations in the E-cadherin gene (CDH1) were initially described in three Maori kindreds from New Zealand with familial gastric cancer 7 and have subsequently been described in other parts of the world. Patients with CDH1 mutations have a histology described as diffuse gastric cancers with signet rings and a particularly poor prognosis. Underexpression of E-cadherin is associated with epithelial-mesenchymal transition (EMT), a prognostic marker of poor clinical outcome. Hereditary diffuse gastric cancer is genetically transmitted as an autosomal dominant trait with high penetrance. In young patients with diffuse gastric cancer, testing for the E-cadherin (CDH1) gene mutation or the α-E-catenin (CTNNA1) gene mutation is important because of the role of each in hereditary diffuse gastric cancer. 7,8 Discovery of the CDH1 mutation should prompt genetic counseling and testing for family members. Prophylactic total gastrectomy should be considered for all CDH1 mutation carriers because of the high risk of invasive diffuse-type gastric cancer and lack of reliable surveillance. 9

In addition to CDH1 mutations, RHOA gene mutations have been described in Asian patients with gastric cancer that result in a similar EMT phenotype. 10,11 Beyond CDH1- and RHOA-related gastric cancer, Lynch syndrome–related gastric cancer has important clinical considerations.

Lynch syndrome is characterized by a significantly increased risk of colorectal, endometrial, and other malignancies, including gastric cancer. 12 It is an autosomal dominant condition caused by a mutation in one of several DNA MMR genes. Approximately 15% of gastric cancers seem to have microsatellite instability (MSI) associated with a mutation of the MMR genes. Gastric cancer screening recommendations for Lynch syndrome include esophagogastroduodenoscopy with random gastric biopsy starting at age 30 years, with continued surveillance every 2 to 3 years, for patients with high-risk features (defined as the presence of gastric atrophy, intestinal metaplasia, positive family history, and Asian ethnicity). 13 Because patients with Lynch syndrome have high MSI and, hence, an increased level of tumor neoantigens, checkpoint inhibitors offer a new approach to target these cancers. One such checkpoint inhibitor, pembrolizumab, has been approved for patients with advanced solid tumors and MSI. In addition to MMR status, PD-L1 expression status is an important factor for treatment selection in patients with metastatic disease.

With these and other genetic advances, the need for a new molecular classification that incorporates the emerging molecular landscape of gastric cancer has become necessary. This emerging classification proposed by TCGA and ACRG identifies dysregulated pathways and candidate gene mutations in each subtype and may facilitate drug development in specific subsets of gastric cancer.

Epstein-Barr virus (EBV; 9% of patients): Characterized by EBV positivity, these tumors had higher prevalence of DNA promoter hypermethylation (including CDKN2A promoter hypermethylation in all tumors) and frequent PIK3CA (80%), ARID1A (55%), and BCOR (23%) mutations. EBV-positive tumors had a higher prevalence of DNA hypermethylation than any cancers reported by TCGA. Amplification of the 9p24.1 locus, which contains genes that encode JAK2, PD-L1, and PD-L2, was seen in 15% of the tumors. EBV gastric cancers were mostly located in the gastric fundus or body (62%). Prolonged survival has been associated with EBV-positive gastric cancers. 15

MSI (22% of patients): MSI was associated with hypermutated genome, DNA hypermethylation, and MLH1 silencing. It was observed more often in older patients (median age, 72 years), and mutations in PIK3CA (42%) are common.

Genomically stable (GS; 20% of patients): GS cancers had a low mutation burden and low somatic copy number aberrations. Clinically, this group demonstrated more-aggressive disease, with 73% of tumors possessing diffuse histology, and were enriched in somatic CDH1 mutations (37%), RHOA mutations (30%), and CLDN18-ARHGAP rearrangements (30%). RHOA mutations and CLDN18-ARHGAP rearrangements were mutually exclusive, and both (by affecting RHOA proteins) result in the disparate growth patterns and lack of cellular cohesion that are hallmarks of diffuse tumors.

Chromosomal instability (CIN; 50% of patients): CIN tumors had high somatic copy number aberrations and were associated with intestinal histology. TP53 mutations were common (73% of CIN tumors) as were amplifications of the Ras receptor tyrosine kinase pathway. The most commonly affected genes were VEGFA, EGFR (10%), ERBB2 (24%), ERBB3 (8%), FGFR2 (8%), and c-Met (8%). In addition, amplifications of cell cycle mediators (CCNE1, CCND1, and CDK6) frequently were seen. This group might benefit the most from vascular endothelial growth factor inhibitors and anti–human epidermal growth factor receptor 2 agents, which already are in use for gastric cancers (ramucirumab and trastuzumab).

Although the TCGA group was able to delineate four molecularly distinct subtypes of gastric cancer, no associated survival difference was noted within its cohort. Subsequently, retrospective studies have reviewed the predictive and prognostic value of the TCGA classification. 16 In this small study, patients with the CIN subtype had the greatest associated survival benefit with adjuvant chemotherapy (hazard ratio [HR], 0.39; 95% CI, 0.16 to 0.94; P = .03), whereas the GS subtype (which is over-represented by diffuse gastric cancer) had the least associated benefit with adjuvant chemotherapy (HR, 0.83; 95% CI, 0.36 to 1.89; P = .65). Given the limitations of the TCGA related to predicting outcome, the ACRG developed a similar classification but with attention focused on clinical prognostication.
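The hazard ratios quoted above can be read with a simple check: a 95% CI that excludes 1 indicates a statistically significant association at the 5% level. The sketch below applies that check to the two reported results and also backs out an approximate two-sided P value from each interval. The P value calculation is a normal (Wald) approximation on the log scale, which is an assumption made here for illustration and not necessarily the method used in the cited study, so small differences from the reported P values are expected. The function names are hypothetical.

```python
import math

def ci_excludes_null(lower, upper, null_value=1.0):
    """True when the confidence interval does not contain the null hazard ratio."""
    return not (lower <= null_value <= upper)

def wald_p_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate two-sided P value from a hazard ratio and its 95% CI,
    assuming the CI is symmetric on the log scale (Wald approximation)."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Values as reported in the text: (HR, CI lower, CI upper)
reported = {
    "CIN": (0.39, 0.16, 0.94),
    "GS": (0.83, 0.36, 1.89),
}

significant = {k: ci_excludes_null(lo, hi) for k, (_, lo, hi) in reported.items()}
approx_p = {k: wald_p_from_ci(hr, lo, hi) for k, (hr, lo, hi) in reported.items()}

for subtype in reported:
    print(subtype, significant[subtype], round(approx_p[subtype], 2))
```

The interval for the CIN subtype lies entirely below 1, whereas the interval for the GS subtype spans 1, which matches the reported pattern of a clear benefit in one group and no demonstrable benefit in the other.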

MSI-high (23% of patients): This subtype had the best prognosis, and more than one half of patients were diagnosed with early-stage (I/II) cancer. This subset was similar to the TCGA MSI subset.

Microsatellite stable (MSS)/EMT (15% of patients): Tumors in this subgroup occurred more often in the gastric antrum (37%) and body (46%), had diffuse-type histology (80%), and presented at advanced stages (III/IV [80%]). The patients in this group were 10 years younger than the other groups (median age, 53 years). This subtype had the worst overall prognosis and recurrence-free survival. Although 80% of MSS/EMT tumors were diffuse, only 27% of all diffuse gastric tumors were captured in this subgroup. This classification was useful as it highlighted diffuse tumors with poor prognosis.

MSS/TP53 intact (26% of patients): EBV infection occurred predominantly in this subgroup. However, EBV tumors only accounted for a small proportion of this group (15%). This subtype had the second-best prognosis.

MSS/TP53 loss (36% of patients): This subgroup had the highest rate of TP53 mutations (60%), and TCGA CIN tumors were enriched in this subgroup. This group demonstrated a less-favorable prognosis compared with MSI and MSS/epithelial/TP53 intact but had a better prognosis than MSS/EMT.
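The four ACRG subtypes listed above can be viewed as the leaves of a small decision tree. The sketch below is an assumption-laden illustration: the order of the checks (MSI status first, then EMT, then TP53 status) is inferred from the subtype names rather than stated in the text, and the three boolean inputs stand in for molecular assays whose details are not described here. The function name `acrg_subtype` is hypothetical.

```python
def acrg_subtype(msi_high, emt_signature, tp53_intact):
    """Assign one of the four ACRG subtypes from three yes/no calls.

    The check order is an assumption inferred from the subtype names.
    """
    if msi_high:
        return "MSI"
    if emt_signature:
        return "MSS/EMT"
    if tp53_intact:
        return "MSS/TP53 intact"
    return "MSS/TP53 loss"

# Proportions of patients as reported in the text (percent).
reported_share = {
    "MSI": 23,
    "MSS/EMT": 15,
    "MSS/TP53 intact": 26,
    "MSS/TP53 loss": 36,
}

# Hypothetical tumors: (msi_high, emt_signature, tp53_intact)
examples = [
    (True, False, True),
    (False, True, False),
    (False, False, True),
    (False, False, False),
]
labels = [acrg_subtype(*tumor) for tumor in examples]
print(labels, sum(reported_share.values()))
```

Because each tumor follows exactly one path through the checks, the subtypes are mutually exclusive, which is consistent with the reported shares adding up to 100%.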

The four ACRG subtypes were highly associated with survival, a finding validated in three subsequent independent cohort studies, including the TCGA. 18 Survival analysis conducted after the merging of the four data sets demonstrated a significant survival association with the distinct ACRG subsets ( Fig 1 ). Of note, the TCGA genomic classifiers when used on the ACRG cohort did not show survival differences.

Fig 1. (A) Overall survival on the basis of Asian Cancer Research Group (ACRG) molecular subtypes in the ACRG gastric cancer cohort (P < .001). (B) Overall survival on the basis of ACRG molecular subtypes in an independent SMC-2 cohort (n = 277). Cox trend test showed an overall P < .001. (C) Overall survival on the basis of ACRG molecular subtypes in GSE15459, an independent cohort from Singapore (n = 200). Cox trend test showed an overall P < .001. (D) Overall survival on the basis of ACRG molecular subtypes in The Cancer Genome Atlas gastric cancer cohort (n = 205). Cox trend test showed an overall P < .001. (E) Merged SMC-2, Singapore, and The Cancer Genome Atlas cohorts. Cox trend and log-rank tests showed an overall P < .001. (F) Merged data for all four cohorts. Cox trend and log-rank tests showed an overall P < .001. Reprinted with permission. 18 EMT, epithelial-mesenchymal transition; MSI, microsatellite instability; MSS, microsatellite stable; TP53−, TP53 loss; TP53+, TP53 intact.

ACRG cohorts also demonstrated distinct recurrence patterns. The MSS/EMT group had significantly more peritoneal metastases than all other subtypes (64% v 23%). A higher percentage of liver-limited metastases was seen in the MSI (23%) and MSS/TP53 loss (21%) subtypes versus the MSS/EMT (5%) and MSS/TP53 intact (8%) subtypes.

Data related to gastric cancer in minorities are limited by few studies and low clinical trial enrollment in these populations. The majority of patients in the TCGA study were Asian or white. Only a small number of black patients and no Hispanic patients were included. In the majority of modern trials that guide current gastric cancer treatment, Hispanic patient inclusion was minimal (Intergroup-0116), not reported (MAGIC [Medical Research Council Adjuvant Gastric Infusional Chemotherapy], POET [Preoperative Chemotherapy Versus Chemoradiotherapy in Locally Advanced Adenocarcinomas of the Esophagogastric Junction]), or absent (ARTIST [Adjuvant Chemoradiotherapy in Stomach Tumors], CLASSIC [Capecitabine and Oxaliplatin Adjuvant Study in Stomach Cancer]). 18a The considerable disease variation among racial groups has led to a substantial knowledge gap in the treatment of non-Asian and nonwhite patients with gastric cancer.

In multiple studies, the incidence of gastric adenocarcinoma, especially noncardia cancers, has been observed to be higher in Hispanic Americans compared with non-Hispanic whites. 19-25 Hispanic Americans also have been observed to be younger and to have more advanced disease at the time of diagnosis. They also tend to have a higher incidence of distal, diffuse, and poorly differentiated tumors (Table 1). Survival data are variable with regard to Hispanic patients. In several small retrospective studies, Hispanics had worse survival and higher recurrence rates when controlled for stage. 20 Hispanic ethnicity has been associated with less-frequent liver metastases than has white race, but Hispanics have an increased risk of peritoneal disease. 23

Table 1. Hispanic Versus White Incidence of Distal, Diffuse, and Poorly Differentiated Gastric Tumors

This disparity also has been described in other ethnic groups. Compared with non-Hispanic white patients, Alaska Native patients have a higher incidence of gastric carcinoma (22.4 per 100,000 v 6.8 per 100,000), 25,28 present younger, and have a higher prevalence of signet ring cell carcinomas and distal/central tumor location (female relative risk, 14.85 [95% CI, 5.95 to 44.94]; male relative risk, 5.79 [95% CI, 2.58 to 13.53]). 29 The 5-year survival was 10% versus 22% for non-Hispanic whites in one study. 30 Hispanic and Alaska Native patients not only have a higher incidence of gastric cancer but also seem to have an inferior per stage outcome compared with non-Hispanic white patients.

Given the higher rates of distal gastric cancers in minority patients, young age at presentation, and early peritoneal metastases, one would be tempted to hypothesize that racial disparity may be secondary to the higher prevalence of more-aggressive subtypes of gastric cancer (MSS/EMT subtypes). However, the burden of aggressive subtypes of gastric cancer in under-represented populations has not been studied and currently is an active area of research.

Although why Hispanic patients present at a younger age and with more-advanced disease than other ethnic groups is unclear, the variable presentation has direct implications on the recommendations for diagnosis and treatment. Higher T and N stage disease along with diffuse-type histology have been associated with early peritoneal spread in patients with gastric cancer. Computed tomography scanning is the cornerstone for staging gastric cancer but has significant limitations in detecting low-volume peritoneal disease. Likewise, positron emission tomography scanning plays a limited role in initial staging of nongastroesophageal junction tumors because of the low sensitivity for detecting the primary tumor, lymph node metastases, and peritoneal disease. These limitations are related to limited spatial resolution, inability to detect small lesions, and the lack of contrast enhancement. This is especially true of diffuse-type cancers, which have a lower baseline SUV than intestinal-type cancers, limiting diagnostic accuracy. 31,32

Given that axial imaging has a sensitivity as low as 25% for peritoneal lesions, 33 laparoscopy has become an important adjunct to stage patients accurately and prevent unnecessary invasive operations. 34,35 Currently, staging laparoscopy is recommended for a clinical stage IB or higher gastric cancer. 36 Reported positive laparoscopy rates are highly variable (13% to 63%) secondary to nonuniform study inclusion criteria. 27,33,34,37-41 The yield of laparoscopy is related to disease stage. In studies limited to patients with locally advanced tumors, metastases were identified frequently (29% to 63%). 37,39 As expected, when patients with earlier stage tumors undergo routine laparoscopy, the yield decreases (17% to 41%). 27,34,35,41

A recent diverse cohort study of the utility of laparoscopy in a high-volume gastric cancer center found that Hispanic patients versus white patients had a significantly higher staging laparoscopy yield (44% v 21%; P = .04). 27 As previously observed, the study also demonstrated that Hispanic patients presented more often at a younger age with more-advanced disease and with more-aggressive tumor histology (poor differentiation and diffuse disease) than white patients. 22,26,27 Multivariable analysis revealed that clinical T3/T4 disease (HR, 5.4; 95% CI, 2.10 to 13.6), signet ring histology (HR, 45.4; 95% CI, 2.1 to 13.6), and poor tumor differentiation (HR, 4.4; 95% CI, 1.6 to 12.7) were associated with the identification of peritoneal disease not appreciated on routine axial imaging. 27 Although race and ethnicity were not independently associated with the detection of unappreciated peritoneal disease, the study supports that underlying tumor biology remains the greatest predictor of early peritoneal spread.

Laparoscopy and peritoneal cytology are essential in managing diffuse-type gastric cancer to ensure appropriate staging of the peritoneum. In most patients, laparoscopy is useful for detecting unappreciated peritoneal disease but also is important for ensuring the absence of disease when considering surgical therapy for locally advanced disease and linitis plastica. Linitis is extensive submucosal spread of diffuse-type gastric cancer that affects the majority of the stomach and has been associated with Hispanic patients given the early onset of diffuse cancer. 42 Given the association with advanced tumor stage (T3/T4, 96%), extensive regional disease (N2/N3, 71%), and frequent positive margin at resection (33%), outcome for all patients with linitis is similar to those with metastatic disease. 43 However, studies have demonstrated that in patients who have negative peritoneal staging, received neoadjuvant therapy, obtained negative intraoperative margins, and complete D2 lymph node dissection, outcomes are similar to patients with stage III disease without linitis. 43-45

Disparate outcomes in gastric cancer among various ethnicities may be attributed to variable responses to adjuvant chemotherapy. Variable response may be secondary to differential representation of molecular subtypes, each with variable chemotherapy sensitivities, or intrinsic genetic polymorphism that affects metabolism and efficacy of chemotherapy. Genetic polymorphism also may affect the toxicity profile of commonly used chemotherapy agents, which in turn affects tolerability and efficacy. Because Hispanic patients are under-represented in large clinical trials for chemotherapy regimens, whether they derive similar benefit as observed in Asian and white populations is unclear.

The biology of gastric cancer can be expected to play a significant role in the efficacy of a chemotherapy regimen. Although data that pertain to the chemotherapy responsiveness of each recently characterized molecular subtype of gastric cancer are limited, insight into the driving molecular biology can be obtained from data available within the older Laurén classification. Exploratory subgroup analyses from prominent randomized clinical trials have pointed toward minimal benefit for adjuvant therapies in diffuse gastric cancer. The updated analysis of the seminal INT0116 study observed reduced benefit with diffuse-type histology. 46 Lack of chemoradiation therapy benefit in diffuse gastric cancer also was noticed in a subgroup analysis of the ARTIST trial, in which 60% of patients enrolled had diffuse gastric cancer. 47 Similarly, a lack of benefit for fluoropyrimidine-based adjuvant chemotherapy in patients with diffuse-type gastric cancer was suggested by a recent study that demonstrated a > 50% rate of early relapse within 12 months of receipt of chemotherapy. 48 These studies have suggested primary treatment resistance for diffuse-type cancers and stressed the need for novel therapeutics.

An improved understanding of the biology of diffuse-type cancers is essential for better treatment of the disease. Only a subset of diffuse gastric tumors seems to show poor prognosis, and the MSS/EMT subtype best represents this group. This subset is particularly challenging to study in the metastatic setting because of its predominant peritoneal spread without liver or other organ involvement. As a result, patients are unlikely to be enrolled in clinical trials because diagnosis often occurs when significant obstructive symptoms are present. In addition, Response Evaluation Criteria in Solid Tumors (RECIST) measurements are not possible because no available radiologic study can accurately quantify disease burden in low-volume peritoneal disease.

Recognition of molecular heterogeneity is important because the one-size-fits-all approach toward gastric cancer is unlikely to result in improved outcomes. Currently, beyond HER2, PD-L1, CDH1 status, and MSI status, molecular classification or demographic factors have not been evaluated prospectively to determine utility in the work-up or treatment of gastric cancer. Future clinical trials should account for this heterogeneity to direct treatment on the basis of individual subtypes. Simpler algorithms that use in situ hybridization and immunohistochemistry to classify gastric cancers have been suggested. 49,50 Although inexpensive and relatively easy to implement into clinical practice, immunohistochemistry-based testing is unlikely to capture the complete molecular complexity of gastric cancer. Therefore, a validated, commercially available genomic analysis is necessary to classify gastric cancers appropriately.

Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions.

Deep convolutional neural networks (CNNs) show potential for general and highly variable tasks across many fine-grained object categories. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs. We train a CNN using a dataset of 129,450 clinical images (two orders of magnitude larger than previous datasets) consisting of 2,032 different diseases. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: malignant carcinomas versus benign seborrheic keratoses, and malignant melanomas versus benign nevi. The first case represents the identification of the most common cancers; the second represents the identification of the deadliest skin cancer.

The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic. It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 and can therefore potentially provide low-cost universal access to vital diagnostic care.

Our classification technique is a deep CNN. Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using a deep neural network trained on our dataset. Inception v3 CNN architecture reprinted from

Skin cancer classification performance of the CNN and dermatologists. a, The deep learning CNN outperforms the average of the dermatologists at skin cancer classification (keratinocyte carcinomas and melanomas) using photographic and dermoscopic images. For each test, previously unseen, biopsy-proven images of lesions are displayed, and dermatologists are asked if they would: biopsy/treat the lesion or reassure the patient. A dermatologist outputs a single prediction per image and is thus represented by a single red point. The green points are the average of the dermatologists for each task, with error bars denoting one standard deviation (calculated from n = 25, 22 and 21 tested dermatologists for carcinoma, melanoma and melanoma under dermoscopy, respectively). The CNN is represented by the blue curve, and the AUC is the CNN’s measure of performance, with a maximum value of 1. The CNN achieves superior performance to a dermatologist if the sensitivity–specificity point of the dermatologist lies below the blue curve, which most do. b, The deep learning CNN exhibits reliable cancer classification when tested on a larger dataset. We tested the CNN on more images to demonstrate robust and reliable cancer classification. The CNN’s curves are smoother owing to the larger test set.
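The comparison described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the scores and labels are made-up toy values, and the function names (`roc_points`, `auc`, `reader_below_curve`) are hypothetical. It builds a ROC curve from classifier scores, computes the area under it with the trapezoidal rule, and checks whether a single reader's sensitivity-specificity point falls below the curve.

```python
def roc_points(scores, labels):
    """Return ROC points (false_positive_rate, sensitivity), sweeping the
    decision threshold from the highest score to the lowest.
    labels: 1 = malignant, 0 = benign; a higher score means more suspicious."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lesions with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule (maximum value 1)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def curve_sensitivity_at(points, fpr):
    """Highest sensitivity the curve reaches at a given false positive rate."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                y = y1  # vertical step: take its upper end
            else:
                y = y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def reader_below_curve(points, sensitivity, specificity):
    """True if a reader's operating point lies strictly below the ROC curve."""
    return sensitivity < curve_sensitivity_at(points, 1.0 - specificity)


# Toy data (assumption): eight lesions, four malignant and four benign.
TOY_SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
TOY_LABELS = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(TOY_SCORES, TOY_LABELS)
print("AUC:", auc(curve))
print("Reader below curve:", reader_below_curve(curve, sensitivity=0.5, specificity=0.6))
```

A reader is reduced to a single point because each reader gives one biopsy/treat-or-reassure decision per image, whereas the classifier's continuous score traces out a whole curve as the threshold moves.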

Philosophical Explanations of Cancer, Biology, Science and Biodiversity


Anya Plutynski has written on the history and philosophy of evolutionary biology and genetics, the role of modeling in science, and scientific explanation. Here she discusses science and natural kinds and cancer, ‘line drawing’, ‘inductive risk’ and ‘underdeterminism’, normativity and naturalism, genetics, context and causality, causal information vs accuracy, values and objectivity, Sober and causal modelling, Rosenberg and Lange, Kuhn and Lakatos, pluralism and pragmatics, whether it's sensible to ask why someone gets cancer, cancer screening issues, gender and sex, and biodiversity.

3:16: What made you become a philosopher?

Anya Plutynski: On the one hand, I suppose it was a series of accidents. On the other hand, I was always interested in philosophy, though I did not recognize my interests as philosophical, initially. In high school, I was drawn to authors like Hesse, Dostoevsky, Tolstoy, and Huxley. In retrospect, what drew me to these authors was how they engaged with philosophical questions about freedom, morality, and the relationship between science and society. I developed a love for history and philosophy of science at University of Chicago, taking classes with J.Z. Smith, Dan Garber, Howard Stein, and Bob Richards. I went to Penn for graduate school initially intending to work on Kant, though I developed doubts about whether I had the motivation to continue, and I took some classes in biology, thinking I might leave philosophy and go to medical school.

However, I started finding my classes in biology philosophically interesting, especially an independent study with Neil Shubin. He and I worked through several important texts in the modern synthesis, just as I was taking a history of biology seminar with Mark Adams in the HSS program. Hearing these issues discussed from the biologist’s perspective, alongside that of a historian like Mark, brought into focus for me how the questions these scientists were debating were not only empirical, but often methodological and conceptual. I shifted from Kant to history and philosophy of biology. Gary Hatfield supervised my dissertation and supported my pursuing a master’s degree in biology alongside my Ph.D. in philosophy. Gary was a fantastic advisor; he was able to help me synthesize my interests in history, biology, and philosophy of science, and guided me toward communities like ISHPSSB (the International Society for the History, Philosophy and Social Studies of Biology). At my first ISH conference, I felt like I found my academic home.

3:16: So you’re interested in issues of the philosophy of science. Interestingly, you have expertise in cancer and use this knowledge as a source of many of your ideas regarding these philosophical issues. One of the general issues discussed by philosophers of science regards the nature and existence of natural kinds. So you ask this question regarding cancer: does cancer count as one natural kind, or many? You look at two responses: Khalidi thinks it is a natural kind, Lange thinks it is a Kludge! So to start, can you sketch what they argue and where the big disagreement lies and what it tells us about what we mean when we say something is natural and scientific?

AP: Khalidi argues that cancer seems to qualify as a homeostatic property cluster kind, because the “hallmarks of cancer” (or, hallmark features of cancer cells) suggest that there are common “homeostatic mechanisms” that cause cancer cells to cluster, as a kind. Lange argues that diseases are not natural kinds, and so, of course, cancer is not a natural kind, and not even a unified type of disease. As we are learning more about cancer, it has become clearer that each cancer is the product of a suite of distinctive functional disruptions. I agree with Khalidi that in principle the “hallmarks” are all strongly associated with the behavior of cancer cells, but I also agree with Lange that cancers are not one kind of dysfunctional state, but a motley collection. I think both make good points, and in the book I use the two stances as foils to propose my own view. Philosophical pictures of natural kinds - even the modest forms like Boyd’s homeostatic property cluster view - are not well-suited to the aims of medicine and disease classification. It turns out that we can cross-classify different cancers, for different purposes.

By way of a simple example, we can classify all “end stage” cancers as (in a sense) of a kind, in that they all are likely to lead to death in the near future, but each such cancer has a distinctive etiology, and might have arisen in different tissues, organs, etc. We can also cross-classify cancers that arise in different tissues and organs as of a ‘kind’ insofar as they similarly respond to a specific targeted drug. Disease classification in medicine, in other words, is pragmatic, and concerned largely with diagnostic, prognostic, and treatment matters. These are “natural” categories in a sense, because there are empirical (predictive, explanatory) relationships that they track, but there’s not one kind of outcome we’re interested in, in medicine, and different causal pathways are predictive and explanatory of these different outcomes (disease initiation, progression, metastasis, death, response to drugs, etc.). Prioritizing one as the “true” way to carve up disease categories is a choice we make, and not a choice that’s determined by the natural world.

3:16: What are the issues of ‘line drawing’, ‘inductive risk’ and ‘underdeterminism’ involved in trying to understand a disease as a disease?

AP: Early diagnosis saves lives, but not all cancers progress uniformly to metastasis and death. Some remain indolent (or, regress). Cancer screening thus carries a risk of not only false positives, but also, overdiagnosis and overtreatment (the diagnosis and treatment of a condition that would never have led to symptoms or mortality in the lifetime of the patient). Diagnosing cancer thus involves a judgment that risks error and so carries “inductive risk”. (There’s a good deal of uncertainty about how many patients are overdiagnosed and overtreated for cancer - estimates range from less than 1% to as high as 20% for some cancers (prostate, thyroid).)

Diagnosis of cancer also involves "drawing a line" between invasive disease and indolent or slow growing conditions that may or may not lead to invasive cancer. Assessing how and where to draw these lines involves choices, which have various risk-benefit trade-offs. Inductive risk also is at play in assessing the benefits and harms of different screening regimens – whether in choice of modality or choice of cut-off for various biomarkers of disease.

3:16: Are normative judgments inevitably involved in making scientific distinctions and do you think the focus on being naturalistic unhelpful because it encourages value judgments to be less than transparent and distorts the picture of what science is?

AP: In medicine, many distinctions do require a fine balance of risk and benefit, which requires value judgments about risk tolerance. This is especially apparent when diagnostic categories are vague or open ended, or the chance of progression of illness given some pathophysiological state is uncertain. The obvious cases are psychiatric diagnoses: whether to call someone’s psychological state a disorder or simply ordinary suffering has been a long-standing matter of controversy in some cases. I don’t think being “naturalistic” is unhelpful, if you mean simply attending to empirical evidence! Attention to the total evidence is always ideal in scientific judgment, and in medicine! As long as you are also transparent about the role of values in such judgment, when it comes to patients’ decision making about treatment, then “naturalism” per se is not a problem. However, I do think advocates of the “naturalistic” approach to disease sometimes downplay the role of values in these difficult cases of line-drawing in medicine, with respect to diagnosis, prognosis, and choice of markers of risk. If we wish to respect patient autonomy, however, we should make these risk-benefit trade-offs transparent to them.

3:16: How does your thinking about the role of genetic factors in causing cancer illustrate the role of ‘context dependency’, locality and instability in assigning causal roles to entities in science and help us understand the ‘causal selection’ problem?

AP: This won’t be news to most philosophers of biology (or, for that matter, most cancer scientists!), but one of the central upshots of what I found when looking at the role of genes in cancer is that the effect of a mutation is highly context-dependent. As you might expect (from an evolutionary perspective), there are lots of “back up” mechanisms in place to prevent changes to genes during somatic cell division from yielding disease. So, for instance, we shed skin all the time that carries many “cancer mutations,” but these mutations never yield disease. Whether a mutation acquired during cell division yields disease depends on where and when it comes about, the cell and tissue or organ of origin, factors in the tissue microenvironment, like presence of a blood supply, immune response, age and sex of the patient, and a host of other factors. So, I think that for complex diseases like cancer, the causal selection problem is more often than not a pragmatic matter of sorting out where and how we are likely to effectively intervene. In many cases of complex, multifactorial disease, there may at best be pragmatic reasons to focus on one or another specific cause, causal pathway, or mode of intervention.

3:16: In setting standards for what we should be using as evidence for a scientific theory, should we care more about what the causal information is for rather than with accuracy – and is this what actually happens?

AP: When we talk about biomedicine, what counts as a “theory” covers a broad swath of things: mere hypotheses, versus robust families of models that can be used to make precise predictions, or yield “how likely” (or “how possibly”) explanations. In other words, these “theories” are built for different purposes, and so they can be said to have different virtues, insofar as they meet or fail to meet those purposes. Consider classical population genetics: I tend to think of this “theory” as a family of models that are useful (if simplified) ways of representing the causal dynamics of evolutionary change in populations. Likewise for much of the mathematical modeling in cancer: many of these are simplified, idealized models that help us investigate very general questions about cancer’s dynamics. Though in some cases they can be used to make accurate predictions, often they provide at best “how possibly” explanations.

So, questions of “accuracy” are not so central to these theoretical parts of biology and medicine. When it comes to hypotheses like whether this or that drug works (and how well) to reduce mortality in this or that cancer, then of course, predictive accuracy is important, but so too is causal information about how the drug works or is likely to work. So, I tend to think that the answer to this question depends on the kind of “scientific theory” at issue. Modelers often have to make choices that trade off these virtues - causal information v. predictive accuracy - in different ways in different contexts.

3:16: Why don’t you think values that keep coming in to these judgments compromise objectivity?

AP: Helen Longino pointed out that there are different kinds of values in science - what she calls “social” and “epistemic” values. Whether or to what extent such values compromise objectivity depends on how and when they play a role in a scientific inquiry. For instance, trading off generality for accuracy in theoretical modeling is not (necessarily) to compromise “objectivity” – at least objective judgments about likely general patterns or processes, e.g., governing cancer’s dynamics. But, the desire for profits in developing and expanding the application of cancer drugs can certainly compromise objectivity, and lead to poor quality research. Values play a role in establishing methodological standards or setting thresholds for efficacy of drugs. Such values can compromise the quality of research and the likely benefit to cancer patients.

3:16: You’ve looked in the field of cancer studies to illustrate examples of Sober’s ideas regarding causal modeling. First, what are the views regarding ‘causal modelling’ that Sober defends?

AP: Sober has written so many articles and books on causal modeling that I feel unprepared to summarize them! But, I expect you’re thinking of his 2011 paper on “a priori causal truths”, which I discuss in the book?

3:16: That's the one yes.

AP: Ok, well then, in 2011, Sober argued that there are certain claims of science that are both “causal” and “a priori” - this sounds counterintuitive, because we tend to think of causal claims as empirical ones. But, here’s a vivid example from Sober’s paper, an example from theoretical population genetics (the part of evolutionary theory that gives mathematical representations of evolutionary dynamics, of the sort I mentioned above: i.e., “if-then” claims about the causal factors at work in evolving populations): "If A is fitter than B in a population in which no other evolutionary causes are at work, and the traits are perfectly heritable, then A will, in expectation, increase in frequency.” Sober claims that this truth is causal, because it’s about the role of natural selection in a population. However, it is also “a priori” in the following sense: it's not defeasible by empirical observation. Nonetheless, it takes work to demonstrate – he’s not claiming that we know such a thing from birth, or that it’s “obvious” or somehow “true by definition,” but that it’s “necessarily” true, as an “if-then” claim, about any population that meets these (idealized) conditions.
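The “if-then” character of that claim can be made concrete with a small sketch. The code below is not taken from Sober’s paper or from the book; it assumes the simplest textbook setting (two haploid types, constant fitnesses, perfect heritability, an effectively infinite population so that frequencies follow their expectation), and the function names are hypothetical. Under those idealized conditions the frequency of A after one generation is p·w_A divided by the mean fitness, which exceeds p whenever w_A > w_B and 0 < p < 1.

```python
def next_frequency(p, w_a, w_b):
    """Expected frequency of type A after one generation of selection.

    p   : current frequency of A (0 <= p <= 1)
    w_a : fitness of A; w_b : fitness of B (both positive constants)
    Assumes no mutation, migration, or drift, and perfect heritability.
    """
    mean_fitness = p * w_a + (1 - p) * w_b
    return p * w_a / mean_fitness


def trajectory(p0, w_a, w_b, generations):
    """Frequencies of A over successive generations, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs


# Illustrative values (assumptions): A starts rare with a 10% fitness advantage.
freqs = trajectory(p0=0.1, w_a=1.1, w_b=1.0, generations=50)
print([round(f, 3) for f in freqs[::10]])
```

Nothing in the sketch depends on observation: the increase follows from the recursion itself, which is the sense in which such a claim can be both causal and not defeasible by empirical evidence about any particular population.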

3:16: So why do you think that arguments against this sort of modeling from Rosenberg and Lange don’t work?

AP: To some extent, I think that what’s happening in this dispute is Rosenberg and Lange and Sober are talking past one another. Sober argues that there are true, general claims about causal relationships in ideal conditions, and he gives examples, such as the one above. In the book, I consider several similar claims from cancer researchers, such as this one: “If stem cell renewal were the only driver of cancer incidence, then there should be a linear relationship between stem cell renewal and rate of incidence across different tissues.” Theoretical claims such as this abound in ecology, economics, and evolutionary biology; there’s even a popular joke about this kind of approach to science: the “imagine a spherical cow” meme. Scientists propose and offer theoretical demonstrations of “causal truths” about spherical cows and other imagined states of affairs, because they’re interested in such “if-then” generalizations: generalizations about what would follow, if certain extreme or idealized conditions held. Building fictional models can enable scientists to derive informative truths about both ideal systems, and the real world.

Such truths may be informative not only “despite” their lack of fit to the world, but indeed, exactly because of their lack of fit, as folks like Sober, Wimsatt, and more recently, Sober’s student, Angela Potochnik, have argued. Rosenberg and Lange (2011) argue against Sober that it is absurd to suggest that we can meaningfully speak of such claims as both “a priori” and “causal”. My argument was that denying this would make much of scientific reasoning – modeling and mathematical arguments, yielding scientific understanding, prediction, and explanation – opaque. I suggest that several examples of theoretical explanations in cancer look much like the cases that Sober describes: e.g. “for any system that meets these conditions, it would follow that…” This derivation of general “causal a priori” truths is part of what modelers in science do.

3:16: Kuhn and Lakatos were central to debates in philosophy of science at the end of last century and I see that you are still drawing on them so presumably they still have currency in contemporary philosophy of science debates? Can you sketch for us how their approaches are currently understood, perhaps through looking at the role of ‘puzzles’ within cancer research, as opposed to 'theories', and how this distinction helps frame Lakatosean ‘research programs’?

AP: I first read Kuhn and Lakatos in classes with Stein at Chicago, and I’ve always found them fruitful to return to. Both engaged more directly with scientific practice than many of their contemporaries, in ways still relevant today. Both recognized that science is not simply a matter of theory development or hypothesis testing, but a dynamic interplay between theory and experiment: iterated puzzle solving. Both saw that theoretical commitments are one of several factors driving science; practical limitations and interests shape the questions we ask, and the answers we give. I used Kuhn to frame my last chapter, because he mentions almost in passing that there is no one “solution” to the puzzle of cancer.

I liked this way of thinking of cancer because it seemed more in keeping with how scientists themselves think. Scientists that study cancer do not by and large see cancer as one, unified problem, but as a set of very different puzzles to be solved. I argue in the book that many cancer scientists don’t see their work as advancing and testing “theories,” so much as solving puzzles. I was led to this way of thinking about cancer research also by Joan Fujimura, as well as M. Morange. Both of their work on the history of 20th Century cancer research suggested to me that what launched the focus on cancer genes were specific puzzles that scientists happened to have the right tools to solve.

3:16: Your approach defends a pluralist and pragmatic approach to biomedical research. Could you summarise the key points, what are its advantages and limitations and then say whether you think this sort of approach is relevant only to this area of scientific research or whether in fact this notion of having partial and overlapping models is something that applies in other fields of science as well?

AP: The short answer is that cancer is a very complex disease; we should not expect a science that investigates this complex disease to come up with a simple, unified theory or model that explains all there is to explain. Cancer is massively heterogeneous - both in its causes and dynamics, as well as in responses to therapy, progression, etc. This is illustrated by the fact that when I tell cancer scientists that I wrote a book on cancer, they typically ask me which kind of cancer (e.g., breast, bone, lung, etc.). No cancer scientist thinks that one should (or could) write a single book on cancer (in general). While there are simple theoretical models that help us get a partial picture of cancer, they often represent only a small part of the picture – representing one specific dynamic, causal pathway, or one temporal and spatial scale. So, having a variety of different models and modes of investigation of diseases like cancer - from the molecular on up to the epidemiological - is incredibly important, if we wish to explain the many different patterns, processes, and outcomes involved.

3:16: Given your approach, is it really sensible to ask why a person gets cancer?

AP: You and I both have had cancer so it’s a case of a philosophical question that has a directly personal interest. If you mean, are there some factors that increase the risk of cancer (and, you grant that identifying such risk factors is a satisfactory answer to the “why” question), then yes. In some cases, it is sensible to ask why a person gets cancer. Indeed, I think we can and should assign causal responsibility, whenever someone knowingly exposes people to high doses of carcinogens (e.g., radiation, polluted waterways or air). Licking paint brushes with paint containing radium was “the” cause of mouth and jaw cancers, in the case of the “Radium girls.” Inherited mutations to genes, such as BRCA 1 and 2, increase one’s lifelong risk of developing cancers of the breast and ovaries (and, some other cancers). So, it is possible in some cases to identify a strongly predisposing cause, known to be associated with specific cancers. However, in the vast majority of cases, it’s very difficult to identify a major causal factor; most cancers are due to many indirect causal factors that accumulate over a lifetime. As for the “existential” why question that many of us cancer survivors face, it’s hard to give a satisfying answer. There is a sense in which cancer is a matter of “chance.”

3:16: What are the implications for deciding who should get cancer screening from your thinking here? As you ask: In what ways does inductive risk, broadly conceived, come into play in the science behind cancer screening, and mammography screening in particular?

AP: The issue is not “who” should get screened, but “when,” and “how” or “how often” one should get screened. For instance, routine mammography screening of women starting at age 40 is likely to lead to a lot of false positives, unnecessary follow up, expense, and overdiagnosis and overtreatment. This is why the USPSTF argued in 2009 (and again in 2016) that for the vast majority of women, screening starting at 40 was unnecessary. The evidence they reviewed from mammography trials suggested that the largest benefit was to women starting screening at 50 (provided they did not have any family history or known risk factors). Likewise, PSA (prostate specific antigen) tests offered to a lot of men during the 1990s and early 2000s probably led to a lot of overdiagnosis and overtreatment for prostate cancer. Nowadays, the USPSTF recommends starting screening later, and watching to see if PSA numbers rise, rather than routinely treating patients at a certain PSA number cut-off. Of course, this is a decision one should make in consultation with a physician, in light of one’s own risk preferences.
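The arithmetic behind the false-positive worry can be sketched as follows. The numbers here are purely illustrative assumptions, not figures from the USPSTF reviews or from any mammography trial, and the function name is hypothetical. The point is only that, for a fixed test, the share of positive results that are true positives falls as the condition becomes rarer in the screened group.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive screening result is a true positive (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics and prevalences, chosen only for illustration.
SENSITIVITY = 0.85
SPECIFICITY = 0.90

for label, prevalence in [("lower-prevalence group", 0.002),
                          ("higher-prevalence group", 0.005)]:
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: {ppv:.1%} of positive results are true positives")
```

With the same test, screening a group in which the disease is rarer produces proportionally more false alarms per cancer found, which is one way the choice of starting age becomes a risk-benefit trade-off.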

3:16: There’s currently much debate around gender and sex roles: I note that you have written about how fundamental aspects of sex determination can impact the biology of brain tumors and what will need to be done to accommodate this discovery. I wonder whether you think these sorts of consideration need to be considered when we consider how to deflate the importance of gender difference which for some means erasing sex difference as well?

AP: Sex (in terms of not only the sex assigned at birth, but having predominantly XX or XY sex chromosomes, for those born with binary chromosomal complements) can (and does) influence the relative risk of some cancers. XY folks are more likely, on average, to develop some cancers, not only the obvious ones (prostate cancer), but also bone, brain, and many other cancers, and not only as a matter of higher levels of exposure to risk factors, since the risk is elevated even in children. Little boys are at higher risk of cancer than little girls so there is likely to be some greater vulnerability associated with sex. That said, sex is not just about chromosomes, and gender is not just sex assigned at birth.

I don’t think gender differences should be deflated or erased altogether. Gender identity can be incredibly important to defining how one sees themselves and their relationships to others. If you mean by “deflate the importance of gender difference,” eliminating gendered differences in salary, leadership roles, or social roles, then, I don’t think the biology has much to do with this. Equitable access to education, employment, and participation in government or leadership roles in society is a matter of justice. Leadership roles, income, or education, for instance, should not be allocated on the basis of either sex or gender. That there is some association of sex with cancer risk does not (at least not obviously) have any direct implications for access to leadership and education among the diversity of genders. There may be one exception: perhaps we should encourage men to retire earlier than women, since they die on average younger than women.

3:16: And of course biodiversity is another area of great significance currently. We’re apparently going through an extinction phase and again causality is a big issue for us as we try and decide what to do. Do your views regarding causality help us understand better this predicament – and perhaps others like climate change?

AP: These are great questions, but I’ve not really thought about how my views on cancer causation shape my thinking about extinction or climate change. In the context of biodiversity conservation, one insight I gained from reading a lot about the history and current practice of efforts at conservation is that attention to local context is incredibly important. Conservation planning cannot occur successfully when done in isolation from the people and places which one is seeking to conserve. I suppose that this echoes my thinking in cancer, about how, in cases where a multiplicity of causal factors are at play, operating at a variety of temporal and spatial scales, we need to attend to this diversity of causal pathways.

3:16: And finally for the readers here at 3:16, are there five books you could recommend other than your own that will take us further into your philosophical world?

AP: Great question!

If they’re interested in questions that come up in my book, I think I’d recommend Stegenga’s Medical Nihilism

Results and discussion

We analyzed and developed prognostic/diagnostic signatures for three different class distinctions: short term/long term survival and high stage/low stage for ovarian cancer, and metastatic/non metastatic tumor progression in breast cancer. We extracted gene expression profiles from The Cancer Genome Atlas (TCGA) for ovarian cancer phenotypes. These included 44 samples from 22 short term survivors and 22 long term survivors (SUR), and 59 TCGA samples differing in stage, with 10 early stage and 49 late stage samples. The ovarian survival and stage study subjects each individually yielded two separate sets of cancer tissue samples, whose biomarkers were extracted independently by groups at the Broad Institute (BI) and at the University of North Carolina (UNC). In addition, we separately analyzed an additional data set of gene expression biomarkers from 119 samples taken in a Duke University study [2] of advanced ovarian cancers. The phenotypes of the latter dataset were separated based on differential resistance to platinum therapy, with the data set separated into 34 incomplete response and 85 complete response samples. Finally, in order to test for biomarker stability across different types of data sets, we identified biomarkers that distinguished metastatic potential in breast cancer data from Wang et al. [3] and van de Vijver et al. [6] (details in the material and methods section).

We note that the UNC and BI ovarian cancer data sets in TCGA are based on tissue samples from the same subjects, analyzed by different laboratories, while the Wang and van de Vijver breast cancer data sets are analyzed by different laboratories and came from different patients.

Evaluation of biomarker sets based on KEGG and MSigDB pathways

We first tested biomarkers based on the 200 KEGG [30] human pathways obtained from MSigDB version 2.5 [16] and compared them to an alternative set of aggregate biomarkers consisting of 522 functional gene sets (selected on the basis of pathway membership and other biological criteria) from MSigDB version 2.5 (data set C2 [16]). We remark that both the KEGG pathways and C2 functional gene sets in fact represented very limited total numbers of unique genes. Specifically, only 4128 and 5602 genes are covered by KEGG pathways and the C2 functional gene sets respectively. To accommodate the loss of gene information compared to the traditional method of starting with expression data for 12042 (BI) and 17814 (UNC) genes, we added 5 manually curated gene sets extracted from the literature, all associated with ovarian cancer [1, 2, 27–29], to the relevant KEGG pathway set (the expanded set is called KEGG_ovary) and to the C2 functional gene set (C2_ovary). Figure 2 gives an assessment of relative accuracies of classification using different functional gene sets using a support vector machine (SVM).

Evaluation of the performances of different gene sets. These implementations include 200 KEGG pathways (KEGG), 522 functional pathways (C2), 200 KEGG pathways with 5 curated ovarian cancer associated gene sets (KEGG_Ovary), and 522 functional pathways with 5 curated ovarian cancer-associated gene sets (C2_Ovary). All test data sets are extracted from primary ovarian cancer tissue. UNC_sur and BI_sur denote ovarian survival data sets analyzed from UNC and BI respectively, and Platinum denotes Platinum response data sets.

We remark that these accuracies are based on balanced data sets and pathway biomarkers which are created from leading edge genes from the most discriminative pathways. It is difficult from the data in Figure 2 to determine that one of the above four groups of functional gene sets is better than the others at discriminating classes.

Evaluation of pathway-based classification methods based on ovarian cancer phenotypes

To evaluate pathway-based classification methods, we tested predictive performances for the different methods using pathway-based markers, as described in the Methods section. (a) The first is the GSEA-based Leading Edge Gene feature method (denoted here as GLEG). (b) The second is denoted as the GSEA Pathway Feature (GPF) Method. (c) The third is the SVM-based pathway feature (SPF) method.

In addition, in order to form a baseline measure, we created random gene sets as surrogate pathways by keeping KEGG pathway designations but doing a full permutation of all genes (i.e. replacing each gene in a pathway by a randomly selected gene). We then performed the GPF pathway-based algorithm based on this randomly permuted gene set. The classifier based on this procedure was designated as the random pathway feature (RPF) method. We also calculated an additional baseline by selecting identical numbers of genes as the GLEG method from (i) the set of all KEGG genes, designated as the SKG (single KEGG gene) method and (ii) all genes (designated as SG).
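The random pathway baseline described above can be sketched as follows. This is a minimal illustration, assuming the permutation is applied to gene labels so that pathway names, sizes, and overlaps are preserved; the function name and the toy gene and pathway names are hypothetical and not taken from the study.

```python
import random

def permute_pathway_genes(pathways, all_genes, seed=0):
    """Relabel every gene by a random permutation of all_genes.

    pathways: dict mapping pathway name -> set of member genes
              (every member is assumed to be in all_genes).
    Pathway names and sizes are kept; only gene identities change.
    """
    rng = random.Random(seed)
    genes = sorted(all_genes)      # sorted so the result depends only on the seed
    shuffled = genes[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(genes, shuffled))  # one-to-one relabeling
    return {name: {mapping[g] for g in members}
            for name, members in pathways.items()}

# Toy example with hypothetical names.
all_genes = {f"g{i}" for i in range(1, 21)}
pathways = {"pathway_A": {"g1", "g2", "g3"}, "pathway_B": {"g3", "g4"}}
random_pathways = permute_pathway_genes(pathways, all_genes)
```

Because the relabeling is one-to-one, each surrogate pathway has the same number of genes as the original, so any difference in accuracy reflects the loss of biological grouping rather than a change in feature counts.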

All methods were implemented using standard SVM leave one out cross-validation in balanced data sets. Thus in the case of ovarian cancer stage classification, we randomly undersampled by choosing 10 samples out of 49 stage IV samples to balance 10 early stage samples, and repeated this procedure 10 times.
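The balancing and cross-validation scheme above can be sketched as follows. This is a minimal sketch of the sample bookkeeping only, not of the classifier itself; the function names and sample labels are hypothetical.

```python
import random

def balanced_subsamples(minority, majority, n_repeats=10, seed=0):
    """Yield n_repeats balanced sample lists: all minority samples plus an
    equally sized random draw (without replacement) from the majority."""
    rng = random.Random(seed)
    minority, majority = list(minority), list(majority)
    for _ in range(n_repeats):
        yield minority + rng.sample(majority, len(minority))

def leave_one_out(samples):
    """Yield (training, held_out) pairs, holding out one sample at a time."""
    for i, held_out in enumerate(samples):
        yield samples[:i] + samples[i + 1:], held_out

# Stage example from the text: 10 early stage versus 49 late stage samples.
early = [f"early_{i}" for i in range(10)]
late = [f"late_{i}" for i in range(49)]
subsets = list(balanced_subsamples(early, late))
folds = list(leave_one_out(subsets[0]))
```

Each of the 10 balanced subsets would then be passed through the leave-one-out loop, with a classifier trained on `training` and evaluated on `held_out`.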

For comparison with the baseline random pathway aggregation method, the accuracy of distinguishing early stage from stage IV ovarian cancer in the UNC data set was 78% using the GLEG method and 71% using the GPF method. By contrast, it was 60% using the same number of randomly selected pathway features (RPF) as the number of pathway features in GPF. For the parallel BI stage data, the corresponding figures are 74% (GLEG), 81% (GPF), and 56.67% (RPF).

It can be seen from the results for the (random) RPF method (accuracies at 60% or less, uniformly lower than the GPF method), that the prior biological information from the pathways was a significant component of the method. In general, we have seen in other contexts that random clustering of features (which is the effect of this method of aggregating gene features into random pathways) can sometimes (surprisingly) improve performance over unclustered individual (single gene) features. In this case the random clusters (random pathways) did not outperform the individual features (genes), though even these RPF-based randomly clustered data conveyed some information.

Figure 3 shows the remaining performances of the three pathway-based classification methods and the gene-based classification method for stage and survival in two different BI and UNC (TCGA) data sets.

Comparison of the performances of the RPF, GLEG, GPF, SPF, SKG, and SG methods. These methods are tested in ovarian cancer data sets to discriminate survival time (SUR) and stage (stage). The notation represents use of the following features: RPF, random pathway features; GLEG, leading edge genes using GSEA; GPF, pathway features selected using GSEA; SPF, pathway features selected using SVM; SKG, single KEGG genes; and SG, single genes.

Comparative discriminatory accuracy of core gene and pathway markers within breast cancer data sets

To further assess the discriminatory accuracy of pathway biomarkers, we analyzed two large metastasis breast cancer data sets [3, 6], both obtained from primary breast cancer, but from non-overlapping populations. Specifically, 93 patients in the Wang data set and 79 patients in the van de Vijver set were diagnosed with metastases within 5 years of initial diagnosis (metastasis group). The remaining groups of 183 and 216 patients, respectively, were designated as non-metastatic by the authors. For the Wang data set, we implemented 10 randomly selected subsampled data sets balanced (at n = 79 each) between metastatic and non-metastatic samples. For each run on the balanced sets, leave-one-out cross-validation was performed. To compare the discriminatory accuracy of pathway biomarkers against individual gene biomarkers, for each training data set (i.e., a new training set with each sample that was left out) we selected the top 20 upregulated and top 20 downregulated pathways separating the metastatic and non-metastatic groups, using GSEA. We then used the union of the leading edge genes from these 40 pathways as individual features. The classifier built on a training set was then used on the left-out test sample. The results are summarized in Figure 4. The best performing method was the SKG method, which uses the same number of individual gene features (selected only from KEGG) as the number of leading edge genes in the GLEG method. The accuracies obtained using this method are 66.94% (Wang) and 65.74% (van de Vijver). In contrast, using the same numbers of genes not restricted to KEGG genes (SG) gives accuracies of 64.41% (Wang) and 65.44% (van de Vijver). The GLEG method gives values of 62.17% (Wang) and 63.34% (van de Vijver). The accuracies of the GSEA pathway feature (GPF) method for the van de Vijver and Wang data sets are 61.26% and 64.71%, respectively.
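The feature construction step described above (top upregulated and downregulated pathways, then the union of their leading edge genes) can be sketched as follows. The sketch assumes that signed enrichment scores and leading edge gene sets have already been computed by GSEA on the training set; the function name, pathway names, and scores are hypothetical.

```python
def leading_edge_features(enrichment, leading_edge, k=20):
    """Select the k most upregulated and k most downregulated pathways and
    return them together with the union of their leading edge genes.

    enrichment:   dict pathway -> signed enrichment score (positive = up)
    leading_edge: dict pathway -> set of leading edge genes
    """
    ranked = sorted(enrichment, key=enrichment.get)  # most negative first
    down = [p for p in ranked if enrichment[p] < 0][:k]
    up = [p for p in reversed(ranked) if enrichment[p] > 0][:k]
    selected = up + down
    genes = set().union(*(leading_edge[p] for p in selected))
    return selected, genes

# Toy example with k=1 instead of the 20/20 used in the text.
enrichment = {"P1": 2.1, "P2": -1.7, "P3": 0.4, "P4": -0.2}
leading_edge = {"P1": {"a", "b"}, "P2": {"b", "c"}, "P3": {"d"}, "P4": {"e"}}
selected, genes = leading_edge_features(enrichment, leading_edge, k=1)
```

In a leave-one-out setting this selection would be repeated inside every fold, so that the held-out sample never influences which pathways or genes are chosen.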

Comparison of metastasis prediction performances based on the Wang and van de Vijver data sets. Each data set tested 10 combinations of data subsamplings (for the purpose of balancing the data) with leave one out cross validation in each of the 10. The vertical axis shows the average accuracy. Here the RPF, GLEG, GPF, SKG, and SG methods were used.

To understand these results better, we will also briefly mention a two-fold cross validation we performed using the same methods on these two data sets. In this test we additionally performed a pathway-level feature aggregation using an averaging of pathway-level features, as opposed to the GPF method of combining gene features in pathways using their SVM weights. More specifically, we note that the pathway features obtained using the GPF method involve not only the leading edge gene expressions for a given pathway $p$, but that these expressions are weighted by the gene weights $w_{pi}$ of these leading edge genes, inherited from the same training set when trained on the set of all genes. Depending on the noisiness of the data set, these weights may be unreliable individually. In particular, we noted that the weights in the van de Vijver data set were more unreliable than those in the Wang set. In fact, in this two-fold cross-validation test, accuracy in the van de Vijver data set was increased to 61.52% when we used mean pathway features generated without weights (still using leading edge genes only).

Briefly, we describe here an analysis of how such noise might have affected these additional two-fold results; further research needs to be done on this topic. Assume that we separate measured gene expression levels into signal and noise components, i.e.,

$\tilde{x}_{ij} = x_{ij} + z_{ij}$,

where $x_{ij}$ is the gene expression of gene $i$ in sample $j$ (signal), and $z_{ij}$ is the corresponding noise.

When we average the gene expressions $\tilde{x}_{ij}$ over a coherent subset of genes (e.g. when all genes are in the same pathway), the averaged noise $z_{ij}$ is quenched, which can help improve the signal-to-noise ratio. In the case where we use weights to obtain such pathway features, e.g., as in the weighted sum $\sum_{i \in \text{leading edge genes}} w_i \tilde{x}_{ij}$, the weights will have additional error attached to them if they are obtained in a noisy training set. In this case, replacing the above pathway feature with the averaged pathway feature $\sum_{i \in \text{leading edge genes}} \tilde{x}_{ij}$ (with appropriate final normalization) avoids the noise inherent in overly noisy weights $w_i$, and also denoises by equal averaging of the test data noise terms $z_{ij}$. However, if the noise in the weights $w_i$ is small enough, the performance of the weighted feature method can improve on that of the above mean feature method. This phenomenon was observed when the same pair of methods (weighted and unweighted pathway averaging) was used on the Wang data set, where the weighted method performed better than the unweighted one. This observation correlated with the fact that the weights in the van de Vijver data set had a significantly larger standard deviation than those in the Wang data set when different sub-samplings of the data were taken, leading to the conclusion that the van de Vijver data set was noisier. In this regard, combining these two methods (weighted and unweighted gene combinations) could be the most effective way to get better performance in this type of machine learning.
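The two pathway features compared above, the weighted sum and the unweighted mean over leading edge genes, can be written as one small function. This is a sketch under stated assumptions: the function name, gene names, and values are hypothetical, and the mean is used as the "appropriate final normalization" for the unweighted case.

```python
def pathway_feature(expression, genes, weights=None):
    """Aggregate one sample's expression over a pathway's leading edge genes.

    expression: dict gene -> measured expression in this sample
    genes:      leading edge genes of the pathway
    weights:    dict gene -> weight (e.g. from a classifier trained on all
                genes), or None for the unweighted mean feature
    """
    genes = sorted(genes)
    if weights is None:
        return sum(expression[g] for g in genes) / len(genes)
    return sum(weights[g] * expression[g] for g in genes)

# Toy values for a two-gene pathway.
expression = {"g1": 1.0, "g2": 3.0}
mean_feature = pathway_feature(expression, {"g1", "g2"})
weighted_feature = pathway_feature(expression, {"g1", "g2"},
                                   weights={"g1": 0.5, "g2": -0.5})
```

The unweighted version depends only on the test sample's expression values, which is why it is insensitive to errors in weights learned from a noisy training set.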

In the full leave one out experiment above, we also observed that all-gene features restricted to be obtained from genes in KEGG pathways (4128 of them) were more informative than features drawn from the full set of genes (over 12,000 in both data sets) available in the expression profiles.

Accuracy of pathway biomarkers across data sets: Metastasis and ovarian survival data sets

To validate the stability of pathway-based biomarkers as well as their classification accuracy, we studied the expression profiles of the two cohorts of breast cancer patients [3, 6]. In this study we used pathway features selected in one data set to predict metastasis in the other, thus effectively using one set as a training set and the other as a test set.

To determine pathway-based biomarkers, we determined in one data set the distinguishing KEGG pathways between the two phenotypes (metastatic and non-metastatic) using GSEA, and used these as biomarkers for classification of the other - we will call this reciprocal classification.

We tested each of the above pathway-based classification methods in this procedure, including the leading edge gene (GLEG) method and the pathway based biomarker (GPF) method. We compared these to standard (Fisher selection) SVM methods using matched numbers of genes as used in the GLEG method (SG method, see above). Out of 810 leading edge genes determined from the Wang data set, 636 of these were available in the van de Vijver data set. Correspondingly, there were 375 out of 391 unique genes chosen from the van de Vijver data set for reciprocal inclusion in the Wang data. The reason for the large difference in sizes is that numbers of leading edge genes were significantly different between the two sets. Thus the respective numbers of features using the GLEG method in the two data sets were 636 and 375.

Since both data sets are strongly unbalanced between two (metastatic and non-metastatic) phenotypes, we balanced the two classes for classification purposes (in training and test sets) by bootstrapping from the larger collection of non-metastatic samples using 5-fold undersampling, with each sample matched in size to that of the metastatic group. The performance figures (see Figure 5) form an average of 5 individual performances for each method on each data set. Figure 5 compares these reciprocal feature selection accuracies for pathway-based markers versus single gene markers.

Cross-validation between two different cohorts. The accuracies of cross-validation between the Wang and van de Vijver data sets and between the UNC and BI survival time data sets. The arrow denotes that genes were selected from the training data sets of one cohort and tested in the other cohort. For example, Wang --> Vijver means that genes were selected from the Wang data sets and tested in the van de Vijver data set.

For the reciprocal test, the leading edge gene (GLEG) method trained on the Wang data set achieved 68.97% accuracy in classifying metastasis in van de Vijver et al. [6], while the reciprocal accuracy (training using van de Vijver and testing on Wang data) yielded 65.83% accuracy (as before these accuracies are reported on data sets balanced between the two phenotypes). However, in this reciprocal testing the GPF method (using pathway features) performed better than the single gene (SG) method in testing on the van de Vijver data set and worse than the SG method on the Wang data set (Figure 5).

In the case of the ovarian survival time data, since the BI and UNC samples were obtained from the same patients, we divided each of these data sets into two groups (BI_group1, BI_group2, UNC_group1 and UNC_group2). The group 1 patients in the BI and UNC datasets were the same, and similarly for the group 2 patients. To maintain full independence (in both subjects and assay facilities) of training and test sets, we performed gene feature selection using BI_group1 to test UNC_group 2, and vice-versa.

Figure 6 shows the average performance of each method. Overall, the best-performing method was the GLEG method, using leading edge genes based on GSEA pathway selection. The GLEG and GPF methods achieved average accuracies of 63.24% and 61.83%, respectively, among the four above-mentioned test data sets. Meanwhile, using single gene classifiers achieved average 57.26% accuracy. This is evidence that pathway-based biomarkers are more reliable for classifying cancer subtypes than single gene markers, in addition to their being more stable.

Average of accuracies with respect to different methods. The average of accuracies using the various methods for cross-validation between different data sets, such as the Wang and van de Vijver sets and the UNC and BI sets. The vertical axis represents averaged accuracies over all heights in the previous graph (Figure 5).

Reproducibility of pathway-based biomarkers and single-gene markers between data sets

In order to test the robustness (i.e., stability across data sets) of the pathway-based biomarkers, we consider the ovarian cancer survival and stage data sets, and the metastatic breast cancer data sets mentioned earlier. We recall that the full UNC and BI data sets (used for the stage and survival analysis) are based on different ovarian cancer tissue samples from the same subjects (analyzed by different laboratories) and that the Wang and van de Vijver metastatic breast cancer data sets involved independent sets of patients and were analyzed by different laboratories. The primary purpose of our analysis here is to compare the stability of pathway/gene biomarkers between the following approaches: (1) use of Fisher-selected individual gene biomarkers as basic features (SG classifier); (2) use of pathway-selected leading edge gene markers (GLEG classifier); (3) use of enriched pathway biomarkers as obtained from GSEA (GPF classifier). Reproducibility was computed by dividing the number of significant biomarkers in the intersection of two different experiments by the number appearing in the union of those in the same experiments.

Specifically, if $B_1$ and $B_2$ represent the respective sets of significant biomarkers in the two experiments, the computed ratio is $S = |B_1 \cap B_2| / |B_1 \cup B_2|$, where $|A|$ denotes the size of a set $A$.
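As a small illustration, the ratio of intersection size to union size can be computed directly from two marker sets. The function name and marker names are hypothetical, and returning 0.0 for two empty sets is a convention chosen here, not one stated in the text.

```python
def reproducibility(b1, b2):
    """Size of the intersection of two marker sets divided by the size of
    their union (0.0 when both sets are empty)."""
    b1, b2 = set(b1), set(b2)
    union = b1 | b2
    if not union:
        return 0.0
    return len(b1 & b2) / len(union)

# Two of four distinct markers are shared.
s = reproducibility({"a", "b", "c"}, {"b", "c", "d"})
```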

In order to provide a valid comparison of the methods, we note that a comparison between an intersection and a union of two sets $B_1$ and $B_2$ as a measure of their generic mutual enrichment depends on the background (with total cardinality $|B|$) and the proportion of the background included in each of the sets, i.e., $|B_1|/|B|$ and $|B_2|/|B|$. In order to keep the above proportions of the background constant, we maintained all of them at 40/200, i.e., 0.2, which was the proportion of KEGG pathways we selected for the GPF method. Thus in comparing the stability of the GPF method with that of the SG (all single gene) and SKG (single KEGG gene) methods, we also selected from the above classes of genes the top 20% of all genes based on Fisher score, and formed a ratio parallel to the above ratio S for these single gene methods.
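The top 20% selection for the single gene methods can be sketched as follows. The Fisher score is not defined in this section, so the sketch assumes one common form, the squared difference of class means divided by the sum of class variances; the function names and toy data are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(class_a, class_b):
    """Assumed form: (mean_a - mean_b)^2 / (var_a + var_b)."""
    diff = (mean(class_a) - mean(class_b)) ** 2
    denom = pvariance(class_a) + pvariance(class_b)
    if denom == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / denom

def top_fraction(scores, fraction=0.2):
    """Return the genes whose scores fall in the top `fraction`."""
    n_keep = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])

# Toy data: gene -> (expression values in class A, values in class B).
data = {
    "g1": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
    "g2": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    "g3": ([5.0, 5.0, 6.0], [5.0, 6.0, 5.0]),
    "g4": ([0.0, 4.0, 8.0], [2.0, 6.0, 10.0]),
    "g5": ([1.0, 1.0, 2.0], [3.0, 3.0, 4.0]),
}
scores = {g: fisher_score(a, b) for g, (a, b) in data.items()}
selected = top_fraction(scores)  # 20% of 5 genes is 1 gene
```

Fixing the selected fraction at 0.2 for every method keeps the proportion of the background constant, so that the overlap ratios of pathway and single gene markers are comparable.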

The reproducibilities S of pathway markers (based on the GPF method, i.e., proportions of top pathways in common among different data sets) are 0.40, 0.33, and 0.18 in the stage, survival, and metastasis data sets, respectively. The comparable figures for leading edge gene (GLEG) markers are 0.27, 0.25, and 0.15 (see discussion below). In contrast, the reproducibility of all single gene (SG) markers using Fisher selection are 0.22 in stage, 0.21 in survival and 0.07 in metastasis.

These data are graphed in Figure 7. As shown there, the pathway/gene markers corresponding to a pathway-based analysis are more consistent than individual Fisher-selected gene markers selected directly from expression profiles.

Agreements of different types of significant markers. The agreements of three different types of significance markers between two data sets: ‘GPF classifier’ denotes pathway features obtained by GSEA, and ‘GLEG classifier’ denotes leading edge gene markers determined by GSEA. ‘SG classifier’ and ‘SKG classifier’ denote the genes determined by Fisher selection from full gene expression profiles and restricted to the set of KEGG pathway genes, respectively, with feature numbers controlled to 20% of each population. ‘SG_random’ denotes gene sets selected randomly (again to 20% of the full gene set) from full gene expression profiles. ‘Ovarian_Stage’ denotes the ovarian stage data sets (marker stability compared between BI and UNC data), ‘Ovarian_SUR’ denotes ovarian survival data sets (BI vs. UNC), and ‘Metastasis’ denotes metastatic breast cancer data sets (based on the Wang and van de Vijver data sets). For each pair of datasets, overlapping biomarkers were all extracted from matching based on the top 40 pathways in each. Vertical axis represents biomarker consistency as the quotient formed by the size of the intersection of the two biomarker sets, divided by the size of their union.

We mention here that the above stability figures for the GLEG method (0.27, 0.25, and 0.15) are slight underestimates of its performance. The leading edge genes generated by the top 20/20 (up- and down-regulated) pathways amounted on average to 17% of all KEGG genes, a smaller proportion of the background than the 20% used for the two sets B1 and B2 mentioned above. Since a larger percentage of the background can only improve the consistency ratio S defined above, the reported figures understate the performance of the GLEG method.

The above results indicate the overlap based on top pathway markers consisting of 20 up-regulated and 20 down-regulated pathways in each data set. Since we considered the BI/UNC pathway overlap to be less noisy, we also attempted a more parsimonious test of overlap between the two datasets using only 10/10 (upregulated/downregulated) pathways in each dataset. A significant pathway overlap signal was obtained also in this case. The result for different pathway numbers among these data sets is given in Figure 8. In particular, the highest signal S in pathway markers is .38 at a level of 10/10 (up and down-regulated) for survival data.

Consistency of different classes of biomarkers with respect to numbers of candidate pathways. The consistency (overlap level) of different types of top gene/pathway markers based on varying numbers of selected candidate pathways (40, 30, or 20) between UNC and BI ovarian survival data sets. For example, the 30 pathway_sur column heights represent overlap percentages of biomarkers in the survival data sets from BI and UNC (using pathway, leading edge gene, and single gene biomarkers, respectively, all extracted from matching based on the top 30 pathways in the BI and UNC datasets). Vertical axis is defined as in Figure 7.

The proportion of common pathways between the UNC and BI data sets increased from counts with 20/20 pathway selections to 10/10 selections, and the same holds for the selected leading edge gene ratios. In contrast, the ratio for Fisher-selected genes decreased at the same time. This implies that top pathways and pathway-based genes (leading edge genes) contained more core gene sets/pathways as stable biomarkers. The identical pattern of overlap for pathway markers, leading edge gene markers, and Fisher selected markers was observed for stage data.

Lower pathway overlap numbers for the 10/10 case in the Wang/van de Vijver data sets gave a clearly less significant signal (an overlap of only 3 pathways), presumably because of the higher variability involving both subjects and measurement protocols.

We remark that the performances indicated in Figure 8 may have the following interpretation. The decline in performance of the SKG (single KEGG gene) method as the number of pathways decreases from 40 to 20 may arise because, for relatively small sample sizes (there were 22 short and 22 long survival cases), the Fisher method of differential expression measurement is not robust. As the number of potentially matching genes in the two sets decreases, the SKG curve of Figure 8 therefore shows a corresponding decrease in the overlap of these genes.

We note that the overlap in enriched pathways vs. overlap in individual genes between the UNC and BI tissue samples is a measure of stability against variance in different measurements from the same individuals, as opposed to bias introduced by comparison of samples from completely different individuals. In contrast, the pathway stability vs. gene stability studied in the analysis of breast cancer metastasis (among the Wang and van de Vijver data sets) is a measure of stability against both the bias of an entirely different population as well as the variance of different sets of measurements.

Informative genes based on leading edge and Fisher-selection markers

In determining biologically significant biomarkers for differentiating two phenotypes (e.g. metastatic vs. non-metastatic cancer), it is generally more powerful to find significant biomarkers overlapping in biomarker selections from several different methods. Here we have selected significant markers based on the pathway method (based on leading edge genes, Table 1), discussed above, in addition to then refining these by also using the standard Fisher single gene selection method (Tables 2 and 3 below).

We first present the 10 most significant common leading edge genes from the two metastasis data sets, and then those from the two ovarian survival data sets, in Table 1. These genes were obtained by intersecting the leading edge genes in the top 20/20 pathways (20 upregulated and 20 downregulated) between the Wang and van de Vijver data sets. This resulted in 161 genes in common. The 10 genes with the highest averaged p-values (between the two datasets) were then selected (Table 1).

In addition to this method for identifying stable discriminative genes, we combined the pathway-based marker selection method with the standard Fisher p-value method (Tables 2 and 3) as follows. Among the Wang and van de Vijver metastasis data, we first identified a total of 118 genes and 70 genes, respectively, representing the intersection of the leading edge genes and a matched number of top Fisher selected genes, obtained separately in the two studies. These genes represented significantly up- or down-regulated genes between metastatic and non-metastatic patients. Between these two sets (118 genes from the Wang and 70 from the van de Vijver data), a total of 13 genes overlapped (see Additional file 1: Table S1). This is significant in that the original overlap between the two sets of top genes differentiating metastasis (also using the Wang and van de Vijver data sets), numbering 76 and 70, respectively, consisted of only three genes [5]. We note that 9 out of 10 genes in Table 1 are also found among the 13 genes in Additional file 1: Table S1.

The above 13 genes from the metastasis data were CCNB2, PSMA7, CCNE2, PTTG1, TPI1, RRM2, MAD2L1, BUB1B, SQLE, E2F1, NP, PSMB5, and TSTA3. Among these genes, 8 (PSMA7, TPI1, SQLE, E2F1, PTTG1, TSTA3, BUB1B, and MAD2L1) have been confirmed in the literature [31, 33–36, 42] to be involved in several different cancers. Additional file 1: Table S1 presents the full name, designation and annotation of each gene. In addition, among the above 8 genes, SQLE, E2F1, PTTG1, TSTA3, BUB1B and MAD2L1 have been linked to breast cancer [31, 33–36]. In particular, SQLE has been confirmed to be a predictor in early stage breast cancer of freedom from distant metastasis. Recently the pituitary tumor transforming gene PTTG1 was reported as an oncogene associated with breast cancer [33] as well as with regulation of the immune system [43]. Vuaroqueaux et al. [35] reported E2F1, a well-known key transcription factor in proliferation and apoptosis, as a surrogate marker of breast cancer outcome. TSTA3 has been found to be one of the conserved genes for several breast cancer subtypes such as luminal, ERBB2+, and basal [36]. Yuan et al. [31] showed that MAD2L1 and BUB1B, known as spindle damage checkpoint genes, were overexpressed in breast cancer tissues.

Among the UNC and BI survival data, 11 genes, consisting of POLR1D, ID4, EDAR, BMPR2, HLA-DOA, DPYSL3, ANXA4, CXCL9, MYLK (MLCK), FBXL7 and TBL1X, overlapped among both the leading edge and Fisher genes in both datasets (forming four collections of genes among them). Among these, 9 genes (POLR1D, ID4, HLA-DOA, DPYSL3, ANXA4, CXCL9, MYLK (MLCK), FBXL7, and TBL1X) are directly or indirectly related to cancers such as breast, endometrial, brain, colon, ovarian, and B-cell cancers [37, 39, 40, 44–48]. Additional file 1: Table S1 shows the full name of each gene and the related cancer. In particular, ID4 was confirmed as an inhibitor of BRCA1 in ovarian and breast cancer by Welcsh and King [37]. ANXA4 has been proposed to be related to chemotherapy-resistant clear cell ovarian tumors by Kim et al. [38], and was also found in clear cells of ovarian and endometrial cancer by Zorn et al. [44]. Table 2 shows the information on top genes which are strongly related to breast cancer and Table 3 shows the same for ovarian cancer.

In addition, 6 genes (CCNB2, CCNE2, PTTG1, MAD2L1, BUB1B, and E2F1) are all found in the cell cycle pathway, which was found to be enriched in metastatic tissues (see next section). In the case of ovarian cancer, EDAR and CXCL9 are found in the cytokine-cytokine receptor interaction pathway, while ID4 and BMPR2 are found in the TGF beta signaling pathway, and TBL1X is found in the Wnt signaling pathway. In addition, four of the ovarian genes (DPYSL3, ANXA4, MYLK, and FBXL7) were in the ovarian cancer module [28], one of our 5 curated sets of ovarian cancer-related genes. All pathway information for each gene is provided in Additional file 1: Table S1.

Enriched pathways in ovarian survival and breast cancer metastasis data

We began with the top 20 discovered common ovarian cancer pathways between the BI and UNC data sets, forming the intersection of the top 40 in each (originally selected as half up-regulated and half down-regulated). These pathways were selected from the collection of all KEGG pathways, together with the above-mentioned 5 ovarian-related gene sets which we had curated independently of these data (see Methods). We also obtained the 12 common pathways (again out of 40 each) from the two breast cancer metastasis cohorts, this time selected strictly from KEGG pathways. A number of these common pathways (both from the ovarian and breast cancer datasets) have had independent verification as being cancer-related (Additional file 2: Table S2), primarily in the context of differentiating cancer and normal tissue. Based on their validation, the significance in our differentiating cancer phenotypes (survival and metastasis) is also of interest.

We now mention some previously studied cancer-related common enriched pathways, whose functions are described in Tables 4, 5 and 6. Three pathways, consisting of type 1 diabetes mellitus, cytokine-cytokine receptor interaction and hedgehog signaling, are in common between the ovarian long survival and breast cancer non-metastasis groups. In particular, 8 out of the 9 leading edge genes common to both the ovarian and breast cancer data sets in the type 1 diabetes pathway are in the HLA family of immune system activators.

The set of common (breast and ovarian) leading edge genes in the cytokine-cytokine receptor interaction pathway (upregulated in survival/non-metastasis) consists of four genes, BMPR2, KIT, TNFRSF11B, and IL1B, the last three of which are known immune system-related genes. The leading edge genes in this pathway differentiating only the ovarian survival datasets include five members of the chemokine ligand (CXCL) family, including one chemokine receptor, as well as four interleukin (IL) members coding proteins embedded in the cell membrane of immune system cells, including T and natural killer (NK) cells. In the breast cancer metastasis data, the cytokine pathway leading edge genes included eight IL members and 5 tumor necrosis factor receptor superfamily (TNFRSF) members which activate immune system cells. The hedgehog signaling pathway is associated with ovarian cancer in that its deregulation is frequently observed in epithelial ovarian tumors [50, 58], though this upregulation is not observed in all cases [59]. It has also been observed to be upregulated in breast cancer [60]. Nevertheless, this pathway’s upregulation in both the breast metastasis/ovarian survival data strongly indicates that the proper normal functioning of the pathway as a growth and development regulator may also be important in prevention of metastasis and growth and thus in patient survival.

The cell adhesion molecules, Wnt signaling, antigen processing and presentation, and TGF beta signaling pathways were enriched in long survival for ovarian cancer. These pathways are consistent with interpretations as tumor suppressor and immune system pathways (see Tables 4 and 5). The Wnt signaling pathway has arms that promote both cell proliferation and apoptosis, and correspondingly it is associated with both tumor promotion and tumor suppression [58, 61, 62], though in ovarian cancer its enrichment in long survival indicates the latter role. In particular, its upregulation in long-surviving ovarian cancer patients, where it correlates with higher survival time, contrasts with its reported upregulation in tumors vs. normal tissue. Its key role in ovarian cancer (and in the present cohort as an apparent growth regulator when it is functional) adds to known information on its dysregulation in a number of ovarian cancers [58, 61, 63]. Though the latter information is based primarily on comparisons of Wnt activation in cancer vs. normal tissue, the analysis here differentiates cancer tissues from each other with regard to metastasis.

Among pathways enriched in the metastatic breast cancers, the cell cycle and biosynthesis of steroids pathways have been observed to be overexpressed in tumorigenesis in prior research (see Tables 4 and 6). In contrast, the complement and coagulation cascades pathway, enriched in non-metastatic cancers, is known to protect against tumors by activating the immune system [55].

Additional file 2: Table S2 shows the common enriched KEGG/ovarian pathways for ovarian survival (between BI and UNC) and those for breast cancer metastasis (between Wang and van de Vijver). In the case of ovarian cancer, 7 out of the 20 pathways have been found significant in a previous study (Dressman et al., note * in Additional file 2: Table S2).

We note that women who carry certain high-risk factors for breast cancer (e.g. family history) are at least 15 times more likely to develop ovarian cancer than non-carriers [64]. Thus, the three enriched pathways common to ovarian cancer survival and breast cancer non-metastasis present themselves as prospective candidates for further investigation of the relationship between the two diseases.


After the messenger RNA (mRNA) is produced through the transcription process just described, the mRNA is processed in the nucleus and then released into the cytosol.

The mRNA is then recognized by the ribosomal subunits present in the cytosol and the message is 'read' by the ribosome to produce a protein. The information for the direction of protein formation is encoded in the sequence of nucleotides that make up the mRNA. Groups of three nucleotides (called codons) are 'read' by the ribosome and lead to the addition of a particular amino acid into the growing polypeptide (protein). The process is depicted schematically in the animation below.
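The codon-by-codon reading described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of the ribosome: it assumes the standard genetic code, starts at the first AUG it finds, reads one codon (three nucleotides) at a time, and stops at the first stop codon. The function name `translate` and the example sequence are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string into one-letter amino acid codes.

    Reading starts at the first AUG (start codon) and ends at the first
    stop codon or when fewer than three nucleotides remain.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: release the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


# Example: AUG (Met), GCC (Ala), UUU (Phe), then the stop codon UAA.
print(translate("GGAUGGCCUUUUAAGG"))   # prints "MAF"
```

Each group of three nucleotides maps to exactly one amino acid, which is why a lookup table is enough for this sketch; real translation also involves tRNAs, initiation and release factors, and later folding of the protein.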

After the protein is formed it acquires its active folded state and is able to perform its functions in the cell. The proper folding, transportation, activity and eventual destruction of proteins are all highly regulated processes.

The genes that control these processes are often damaged and not functioning properly in cancer cells.

More information on this topic may be found in Chapter 1 of The Biology of Cancer by Robert A. Weinberg.

