Review
Cancer diagnosis using
proteomic patterns
Thomas P Conrads, Ming Zhou, Emmanuel F Petricoin III, Lance Liotta and Timothy D Veenstra†
The advent of proteomics has brought with it the hope of discovering novel biomarkers that can be used to diagnose diseases, predict susceptibility and monitor progression. Much of this effort has focused upon the mass spectral identification of the thousands of proteins that populate complex biosystems such as serum and tissues. A revolutionary approach in proteomic pattern analysis has emerged as an effective method for the early diagnosis of diseases such as ovarian cancer. Proteomic pattern analysis relies on the pattern of proteins observed and does not rely on the identification of a traceable biomarker. Hundreds of clinical samples per day can be analyzed utilizing this technology, which has the potential to be a novel, highly sensitive diagnostic tool for the early detection of cancer.
CONTENTS
Proteomics as a diagnostic tool
Proteomic pattern technology
Summary & conclusions
Expert opinion
Five-year view
Key issues
References
Affiliations

Expert Rev. Mol. Diagn. 3(4), 411–420 (2003)

†Author for correspondence
Biomedical Proteomics Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, PO Box B, Frederick, MD 21702, USA
Tel.: +1 301 846 7286
Fax: +1 301 846 6037
veenstra@ncifcrf.gov

KEYWORDS: bioinformatics, biomarkers, cancer, early diagnosis, mass spectrometry, proteomic patterns

The term proteomics means different things to different people. Originally, the term was coined to describe the large-scale, high-throughput separation and subsequent identification of proteins resolved by 2D polyacrylamide gel electrophoresis (2DE) [1]. The field of proteomics has since evolved to include almost any type of technology that focuses upon the wide-scale analysis of proteins. These technologies range from those designed to study a single protein (e.g., mapping of sites of post-translational modifications [PTM]) to those for the analysis of hundreds to thousands of proteins in a single experiment (e.g., protein arrays or isotope-coded affinity tags) [2–4]. In essence, the term proteomics has replaced the phrase protein science. Regardless of the terminology or the scope of the analysis, one of the common requirements of a vast majority of proteomic studies has been the identification of the protein(s) of interest.

Protein identification is central to most proteomic studies [5]. For example, probably the best known and most widely used proteomic technology is the characterization of changes in protein expression between two different samples through comparative 2DE [6]. In such studies, two proteomic samples are resolved and visualized on two separate 2DE gels. Protein spots that are more or less intense on one gel compared with the other are excised and the differentially expressed proteins are identified using mass spectrometry (MS). In addition, studies such as phosphorylation mapping would be incomplete without identifying the modified protein or, even better, the specific amino acid that has been modified.

Proteomics as a diagnostic tool
While proteomics has had an enormous impact in almost every discipline of biomedical science, one of its major focuses is the use of its high-throughput capabilities in the discovery of novel disease biomarkers [7]. While a biomarker can be defined as any laboratory measurement or physical sign used as a substitute for a clinically meaningful end point that measures directly how a patient feels, functions or survives [8], as applied to proteomics, a biomarker is an identified protein(s) that is unique to a particular disease state. Simply put, the experimental design of a diagnostic proteomic investigation aims to scrutinize clinical samples from healthy and afflicted individuals in a high-throughput manner, allowing for the relative abundance of thousands of proteins from the two histologically distinct samples to be visualized. Proteins that are found to be differentially abundant between the samples are then selected
for identification with the hope that knowledge of their identity will provide the basis for defining a diagnostic biomarker. Unfortunately, this strategy suffers from issues that are in many ways technically overwhelming. Firstly, the proteins being observed in these analyses are generally of high abundance. Therefore, valuable biomarkers expressed at low abundance will remain undetected until analytical technologies become more sensitive. Secondly, discovery of effective biomarkers requires the analysis of hundreds of histologically well-defined samples retrieved from healthy and disease-afflicted individuals. In addition, to be clinically relevant, the biomarker should be present in easily obtainable samples such as serum, plasma or urine. Even ignoring the difficulties in analyzing serum and plasma via MS-based proteomic methods, the natural variability in biofluids obtained from different patients makes the recognition of a single, consistent biomarker in the background of a dynamic proteome extremely challenging. Thirdly, it may be that the presence of a single, definitive biomarker for a particular histological condition, such as human chorionic gonadotropin for pregnancy, is the exception rather than the rule [9,10]. Indeed, many clinical tests that rely on single diagnostic biomarkers, such as cancer antigen (CA)-125 for ovarian cancer [11] and prostate-specific antigen (PSA) for prostate cancer [12], possess positive predictive values (PPV) that are generally quite low.

Biomarker discovery without protein identification
The sobering reality is that the discovery of novel diagnostic biomarkers has met with little success despite the considerable intellectual and financial resources currently invested in the use of conventional proteomic technologies. It is likely that a vast majority of disease states are not the result of a single recognizable change in the abundance or function of a protein. Considering the complexity of an individual cell and the aberrations caused by such disease states as cancer, a vast number of differences between the protein character of healthy and diseased tissues should be observable. Why then has the discovery of disease-specific biomarkers been so elusive? One of the main reasons is that for a diagnostic marker to be clinically relevant, it should be assayed from a sample that can be obtained relatively noninvasively and in sufficient quantities from patients. For this reason, the search for biomarkers using proteomic methods has largely focused upon plasma and serum. While serum constantly perfuses tissues, hence potentially endowing an archive of histological information, this information comprises not only the expected circulatory proteins in serum, such as immunoglobulins, but also peptides and proteins that are secreted into the blood and species shed from diseased, dying or dead cells present throughout the body [13]. Therefore, the background matrix of biofluids such as serum represents a complex milieu in which to find unique disease-specific biomarkers that are most likely of extremely low abundance.

The intrinsic person-to-person variability of the content of biofluids also hampers the identification of a disease-specific biomarker [14]. The identification of a biomarker relies on the comparison of, for example, serum samples from healthy and disease-stricken individuals. The comparison of two distinct serum samples is incredibly laborious using conventional proteomic technologies, and the comparison of the hundreds if not thousands needed to validate a biomarker is not routinely possible. More to the point, in the comparison of just two serum samples, a multitude of changes in protein abundance are observed due simply to differences such as age, gender or lifestyle, making the assumption that a particular difference is a result of a specific disease state tenuous at best.

Proteomic pattern technology
A revolutionary proteomic technology has recently been developed that uses the pattern of proteins observed within a clinical sample as a diagnostic fingerprint and does not rely on the identification of the proteins detected. The technology to acquire these so-called proteomic patterns is quite simple, as illustrated in FIGURE 1. In its current state, surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is the technology used to acquire the proteomic patterns to be used in the diagnostic setting [15,16]. The principle of SELDI-TOF is very simple: proteins of interest are captured by adsorption, partition, electrostatic interaction or affinity chromatography on a stationary phase and immobilized in an array format on a chip surface. One of the benefits of this process is that raw biofluids, such as urine, serum and plasma, can be directly applied to the array surface. After a series of binding and washing steps, a matrix is applied to the array surfaces. The species bound to these surfaces can be ionized by matrix-assisted laser desorption/ionization (MALDI) and their mass-to-charge (m/z) ratios measured by TOF MS. The result is simply a mass spectrum of the species that bound to and subsequently desorbed from the array surface. While the inherent simplicity of the technology has contributed to the enthusiasm generated for this approach, it is the implementation of sophisticated bioinformatic tools that has enabled the use of SELDI-TOF MS as a potentially revolutionary diagnostic tool.

Application of proteomic patterns for disease diagnosis
The seminal study describing the use of proteomic patterns to diagnose ovarian cancer was published in The Lancet by Petricoin and coworkers of the US Food and Drug Administration (FDA) and Liotta and coworkers of the National Cancer Institute (NCI) [17]. The aggressive nature of ovarian cancer, the fifth most common cause of cancer-related death in women, makes it a prime example of a disease whose 5-year survival rate would dramatically increase if a more effective means of early (or Stage I) detection could be discovered [17,18]. Unfortunately, almost 80% of women with common epithelial ovarian cancer are not diagnosed until the disease has spread to the upper abdomen (Stage III) or beyond (Stage IV) [19,20]. The 5-year survival rate for these women is only 15–20%, whereas the 5-year survival rate for ovarian cancer at Stage I approaches 95% with surgical intervention. In this study, the proteomic patterns of serum samples from several patients with ovarian cancer were compared with those from control patients. Visual
[Figure 1 schematic: serum proteins → protein chip → SELDI-TOF MS → proteomic pattern (intensity vs m/z) → pattern recognition → diagnosis]
Figure 1. Disease diagnostics using proteomic patterns. The sample drawn from the patient is applied to a protein chip which is made up of a specific chromatographic surface. After several washing steps and the application of an energy-absorbing molecule, the species that are retained on the surface of the chip are analyzed via mass spectrometry. The pattern of peaks within the spectrum is analyzed using sophisticated bioinformatic software to diagnose the source of the biological sample.
m/z: Mass to charge ratio; SELDI-TOF MS: Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry.
inspection of the mass spectra did not reveal mass spectral features unique to the ovarian cancer serum samples. The ability to discriminate patterns generated from serum acquired from healthy individuals and those patients affected with ovarian cancer was only accomplished through application of an artificial intelligence program able to decipher diagnostic patterns within the profiles.

The artificial intelligence program used in this study combined elements of a genetic algorithm with cluster analysis [21–23]. The input files for the analysis comprised the m/z values on the x-axis along with their corresponding amplitudes on the y-axis. The analysis was divided into two phases, a pattern discovery phase and a pattern matching phase, as illustrated in FIGURE 2. In the pattern discovery phase, a set of mass spectra of serum from both healthy and ovarian cancer-affected individuals (referred to as the training set) was analyzed to identify a subset of m/z values and their related amplitudes that are able to completely segregate the data acquired using serum samples from patients with ovarian cancer and unaffected individuals. In the pattern discovery phase, the source of the serum (from healthy or ovarian cancer-affected individuals) was known and was included as part of the data provided to the algorithm. The bioinformatic searching process began with hundreds of arbitrary choices of small sets (5–20) of the exact m/z values selected along the x-axis of the mass spectra. The diagnostic pattern was formed by plotting the combined y-axis amplitudes of the candidate set of key m/z values in N-dimensional space, where N is equal to the number of m/z values found within the training set of spectra. The pattern formed by the relative amplitudes of the spectrum data for this set of chosen m/z values was rated for its ability to distinguish the serum mass spectra acquired from the healthy and cancer-affected individuals. Since the aim was to identify the pattern that provides the optimal segregation, the frequency values within the highest rated sets were reshuffled to form new subset candidates and the resultant defined amplitude values were rated iteratively until the set that fully discriminated the preliminary sample sets was revealed.

Once the algorithm had recognized the key m/z values, the model was tested using a set of masked test spectra to assess the diagnostic capability of the optimal pattern recognized in the first phase. As opposed to the pattern discovery phase, which uses all of the m/z values within the entire spectral data set, in the pattern matching phase only the key subset of the m/z values identified in the pattern discovery phase was used to classify the unknown samples as being from healthy or cancer-affected individuals. The pattern formed by the relative amplitudes of the key m/z values in each unknown was then matched to the optimum pattern defined in the pattern discovery phase. Each unknown sample was classified, based upon the cluster(s) that its feature set populates, as an unaffected or cancer-affected individual, or generated a new cluster if it was found not to
[Figure 2 schematic. Phase I: training set input (a. unaffected samples; b. cancer samples; spectra plotted against m/z, 1000–6000) → genetic algorithm + self-organizing cluster analysis → 'survival of the fittest' discriminatory patterns that discriminate unaffected from cancer-affected samples → lead diagnostic fingerprint (from training set). Phase II: test/validation sample for diagnosis → classified as normal, cancer or new.]
Figure 2. Bioinformatic analysis of proteomic pattern spectra for the determination of the discriminatory patterns in the training and diagnostic (testing and blind validation) phase.
m/z: Mass to charge ratio.
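The pattern discovery phase summarized in Figure 2 can be illustrated with a small, hedged sketch. It is not the software used in the study: the nearest-centroid score is an assumed stand-in for the self-organizing cluster analysis, the spectra are synthetic, and all names (`make_spectra`, `fitness`, `evolve`) and parameter values are hypothetical. The sketch only shows the general idea of rating small subsets of m/z bins by how well they separate the two training classes and reshuffling the best-rated subsets into new candidates.

```python
import random


def make_spectra(rng, n_per_class=10, n_features=20, informative=(3, 7)):
    """Synthetic 'spectra': lists of amplitudes at n_features m/z bins (toy data)."""
    spectra, labels = [], []
    for label in (0, 1):  # 0 = unaffected, 1 = cancer (toy labels)
        for _ in range(n_per_class):
            amps = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for f in informative:
                    amps[f] += 10.0  # exaggerated shift so the toy problem is separable
            spectra.append(amps)
            labels.append(label)
    return spectra, labels


def fitness(subset, spectra, labels):
    """Fraction of training spectra lying closest to their own class centroid."""
    cents = {}
    for c in (0, 1):
        rows = [s for s, l in zip(spectra, labels) if l == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in subset]
    correct = 0
    for s, l in zip(spectra, labels):
        dist = {c: sum((s[f] - m) ** 2 for f, m in zip(subset, cents[c]))
                for c in (0, 1)}
        if min(dist, key=dist.get) == l:
            correct += 1
    return correct / len(spectra)


def evolve(spectra, labels, rng, subset_size=3, pop_size=30, generations=20):
    """Keep the best-rated subsets and recombine them into new candidates."""
    n_features = len(spectra[0])
    pop = [tuple(rng.sample(range(n_features), subset_size))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: fitness(s, spectra, labels),
                        reverse=True)
        parents = ranked[: pop_size // 2]  # survivors are carried over unchanged
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))   # reshuffle bins from two good subsets
            child = rng.sample(pool, subset_size)
            if rng.random() < 0.2:         # occasional random mutation
                child[rng.randrange(subset_size)] = rng.randrange(n_features)
            if len(set(child)) == subset_size:
                children.append(tuple(child))
        pop = parents + children
    best = max(pop, key=lambda s: fitness(s, spectra, labels))
    return best, fitness(best, spectra, labels)


if __name__ == "__main__":
    rng = random.Random(0)
    spectra, labels = make_spectra(rng)
    best, score = evolve(spectra, labels, rng)
    print("best subset of m/z bins:", best, "training accuracy:", score)
```

In this toy setting any subset containing one of the shifted bins separates the classes perfectly; real serum spectra are far noisier, which is why the article stresses testing on masked and blinded samples.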
match any of the patterns defined in the pattern discovery phase. If the sample generates a new cluster, then the point in N-space of the unknown sample is outside the defined likeness boundaries of the ovarian cancer and unaffected clusters.

After generating the diagnostic model, the diagnostic feature sets defined in training were applied to a series of test samples in which the source of the serum was blinded. The diagnostic feature set defined in training was able to correctly diagnose the samples as being acquired from either control patients or those suffering from ovarian cancer with a sensitivity of 100% and a specificity of 95%, with an overall PPV of 94% [17]. The success in correctly diagnosing Stage I ovarian cancer suggested that proteomic patterns generated from biofluids may provide a useful indicator of the early onset of a particular disease state.

Improvements in instrumental analysis
Since this original study, several laboratories have applied serum proteomic pattern analysis to the diagnosis of breast [24] and prostate [25–27] cancer. All of these studies used the same analytical platform combined with different sample preparation methods and bioinformatic algorithms. The analytical platform used is the ProteinChip® Biomarker System-II (PBS-II; Ciphergen, CA, USA), a low-resolution TOF MS. Reflective of the success of the ovarian cancer study [17], many of these studies have been able to correctly diagnose serum samples with sensitivities and specificities greater than 90%. The diagnostic accuracy combined with the attributes of the technology (e.g., ease of sample preparation and high throughput) make proteomic patterns a potentially invaluable screening tool for high-risk populations. Even with this high overall PPV, however, the technology in its present form is not useful as a clinical screening tool for a disease with a low prevalence such as ovarian cancer. While a PPV of 94%, as was reported in the ovarian cancer study by Petricoin and colleagues, is extremely high, the specificity (95%) of the assay when extrapolated over a large population in which very few patients would actually have ovarian cancer would result in six out of every 100 patients being sent for unnecessary biopsies. This percentage of false-positives would have a tremendous negative impact on the available medical resources. To serve as an effective screening tool, a diagnostic assay screening for ovarian cancer requires a specificity of at least 99.6% [29]. Therefore, while proteomic pattern analysis in its present state represents a useful tool to diagnose cancer, its use as a screening tool for high-risk populations is still limited.

One of the limiting factors in increasing the PPV of proteomic pattern analysis is the PBS-II, which is a simple TOF MS designed to provide a broad m/z range of detection, necessarily at the expense of resolution. To assess the increase in diagnostic sensitivity and specificity that would be afforded by higher resolution mass spectra, our laboratory performed a side-by-side comparison of the results obtained analyzing serum samples on a PBS-II and a hybrid
quadrupole (Qq) TOF MS fitted with a SELDI ion source [30]. A schematic comparison of the two instruments is provided in FIGURE 3. The QqTOF can be regarded as a triple quadrupole MS in which the third quadrupole has been replaced with a reflecting TOF analyzer. This instrument combines the benefits of ion selectivity and tandem MS capabilities of a triple quadrupole MS with the high mass accuracy and resolution afforded by a reflecting TOF analyzer. The PBS-II, on the other hand, is a relatively simple linear TOF MS. The mass analyzer, though relatively sensitive, provides only low resolution and hence low mass accuracy data. While time lag focusing is used to increase data resolution and mass accuracy, the achievable mass accuracy is much less than that afforded using more conventional, high-resolution TOF MS instrumentation, such as the QqTOF. An example of a serum sample analyzed using both the PBS-II TOF MS and the QqTOF MS is illustrated in FIGURE 4. While the spectra are qualitatively similar, the resolution obtainable with the QqTOF MS is on the order of 60-fold higher than that obtainable with the PBS-II TOF MS.

To compare the PPV of the results obtained from the two MS platforms, 248 serum samples from either healthy women or those clinically diagnosed with various stages of ovarian cancer were provided by the National Ovarian Cancer Early Detection Program at Northwestern University Hospital (Chicago, Illinois, USA), and processed and analyzed by both instruments. The key to this study is that the identical set of samples was analyzed on the exact same protein chip surface, so that all experimental variability other than the use of two different instruments was eliminated.

The mass spectra acquired on both the PBS-II and QqTOF MS instruments were analyzed using the ProteomeQuest™ (Correlogic Systems, Inc., MD, USA) bioinformatics tool as illustrated in FIGURE 2. A total of 28 serum samples from unaffected women and 49 from women with ovarian cancer were used for the training set. A series of diagnostic models was generated using a variety of different combinations of bioinformatic heuristic parameters. None of these parameters had any effect on the raw MS data; they were simply related to the bioinformatic
[Figure 3 schematic. Panel A (QqTOF MS): sample ions, Q0, Q1 (ultrastable quadrupole mass filter), Q2 (high-efficiency collision cell), accelerator, field-free drift region, ion mirror, detector and vacuum pumps; effective flight path = 2.5 m. Panel B (PBS-II TOF MS): laser, proteins on the chip surface, accelerating potential, vacuum flight region.]
Figure 3. Schematic diagrams comparing the configuration of (A) QqTOF MS and (B) PBS-II TOF MS. The QqTOF combines elements of a triple quadrupole (Qq) MS with a reflecting TOF analyzer, thereby affording higher mass accuracy and resolution than the simple linear TOF of the PBS-II.
PBS-II: ProteinChip® Biomarker System-II (Ciphergen, CA, USA); TOF MS: Time-of-flight mass spectrometry.
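The dependence of PPV on disease prevalence, which underlies the article's contrast between a 95% specificity and the 99.6% specificity required for screening, can be sketched with Bayes' rule. This is a minimal illustration only: the function name `ppv` is hypothetical, and the prevalence of 1 in 2500 is an assumed value chosen for illustration, not a figure taken from this article.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: P(disease | positive test), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prevalence = 1 / 2500  # assumed, illustrative screening prevalence
    for specificity in (0.95, 0.996):
        value = ppv(1.0, specificity, prevalence)
        print(f"specificity {specificity:.3f}: PPV = {value:.3f}")
```

The sketch shows why a PPV measured in a study cohort with many cancer cases does not carry over to a general screening population: with the same sensitivity and specificity, PPV falls sharply as prevalence falls.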
www.future-drugs.com
[Figure 4 spectra. Panels A and B: mass spectra of the same serum sample, intensity vs m/z (2000–12000), with labeled peaks clustered near m/z 3900–4500 and 7200–8950.]
Figure 4. Comparison of mass spectra acquired using a SELDI-TOF MS and a QqTOF MS equipped with a SELDI source.
m/z: Mass to charge ratio; SELDI: Surface-enhanced laser desorption/ionization; TOF MS: Time-of-flight mass spectrometry; Qq: Quadrupole.
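What a 60-fold gain in resolution means in practice can be illustrated with the standard definition of resolving power, R = m/Δm, where Δm is the smallest mass difference that can be separated at mass m. In the sketch below the baseline resolving power of 150 is an assumed, illustrative value and is not taken from this article; only the 60-fold ratio comes from the text. The function name is hypothetical.

```python
def resolvable_delta(mz, resolving_power):
    """Smallest separable mass difference at a given m/z, from R = m / delta_m."""
    return mz / resolving_power


if __name__ == "__main__":
    low_r = 150.0          # assumed baseline resolving power (illustrative only)
    high_r = 60.0 * low_r  # the 60-fold improvement described in the text
    for mz in (4000.0, 8000.0):
        print(mz, resolvable_delta(mz, low_r), resolvable_delta(mz, high_r))
```

Under these assumptions, two species a few daltons apart near m/z 8000 would merge into one peak at the lower resolving power but appear as separate peaks at the higher one.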
process of generating diagnostic models from the raw data and included such factors as the similarity space of likeness for cluster classification, the feature set size of random m/z values whose combined intensities comprise each pattern and the learning rate of pattern generation by the genetic algorithm. A total of 108 models were derived and queried with the same set of proteomic pattern spectra in the testing and blind validation phase to assess their sensitivity, specificity and overall PPV.

The models derived from the training sets acquired on both the PBS-II and QqTOF MS were tested using blind serum sample mass spectra obtained from 31 unaffected women and 63 women with ovarian cancer, and further validated using blind serum sample spectra obtained from 37 unaffected women and 40 women with ovarian cancer. The diagnostic models generated from mass spectra acquired using the higher resolution QqTOF MS were statistically superior not only in testing (sensitivity, p < 0.0001; specificity, p < 3 × 10⁻¹⁹) but also in validation (sensitivity, p < 9 × 10⁻⁹; specificity, p < 6 × 10⁻⁶) as evaluated using a two-tailed Cochran–Armitage test for trend [31].

Four models were found that were both 100% sensitive and 100% specific in their ability to correctly discriminate serum samples taken from unaffected women or those suffering from ovarian cancer. All of these models were generated from data acquired using the QqTOF MS; no model generated from the PBS-II data was both 100% sensitive and 100% specific. Just as importantly, and key if this technology is to become a viable screening tool, no false-positive or false-negative classifications occurred using these models, giving each a PPV of 100% for the patient cohort employed in this study.

Another key aspect of this study is that the key m/z features that comprise the four diagnostic models with 100% PPV for ovarian cancer revealed certain consistent features. Although the proteomic patterns generated from both healthy and cancer-affected patients using the QqTOF MS are quite similar (FIGURE 5), careful inspection of the raw mass spectra reveals that peaks at m/z values 7060.121 and 8605.678 are indeed differentially abundant in a selection of the serum samples obtained from ovarian cancer patients as compared with unaffected individuals (FIGURE 5, INSETS). The results indicated that these MS peaks originate from species that may be consistent indicators of
the presence of ovarian cancer and represent good candidates for ongoing efforts to identify low molecular weight components in serum that may be key disease progression indicators.

While a number of studies have reported impressive diagnostic success using the lower resolution PBS-II TOF MS, screening for diseases of relatively low prevalence such as ovarian cancer requires a minimum level of 99.6% sensitivity and specificity [29]. In blinded testing and validation studies, any one of the four best models generated using QqTOF MS data was able to correctly classify 22 of 22 women with Stage I ovarian cancer, 81 of 81 women with Stage II, III and IV ovarian cancer and 68 of 68 benign disease controls. It can be envisioned that, in the near future, a clinical test would simultaneously employ several combinations of highly accurate diagnostic proteomic patterns which, if taken together, could achieve an even higher degree of accuracy in a screening setting where a diagnostic test will face large population heterogeneity and potential variability in sample quality and handling. Hence, a high-resolution system, such as the QqTOF MS employed in this study, is preferred based upon the present results, which serve as a platform for clinical trials of serum proteomic patterns.

Summary & conclusions
One of the overlooked powers of investigating proteomic patterns is the ability to screen hundreds of serum samples in a high-throughput manner and therefore quickly determine targets (key m/z values) for further investigation. The inherent variability of serum between individuals makes it impossible to compare and recognize valid disease indicators using the conventional proteomic techniques of protein separation (2DE or multidimensional liquid chromatographic fractionation) and MS identification. The technology used to generate proteomic patterns is highly automated and even an academic laboratory can analyze in excess of 300 samples per day. This throughput allows key discriminatory features to be distinguished rapidly within hundreds of serum or plasma samples drawn from a statistically relevant population. It must
[Figure 5 spectra. Two panels, "Unaffected" and "Ovarian cancer": relative intensity (0–100) vs m/z (2000–12000), each with insets expanding the regions around m/z 7060 and m/z 8606.]
Figure 5. Comparison of SELDI QqTOF mass spectra of serum from an unaffected individual and an ovarian cancer patient. Insets show expanded m/z regions highlighting significant intensity differences of the peaks at m/z 7060.121 and 8605.678 identified by the algorithm as belonging to the optimum discriminatory pattern.
m/z: Mass to charge ratio; SELDI QqTOF: Surface-enhanced laser desorption/ionization hybrid quadrupole time-of-flight.
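The classification counts reported for the four best QqTOF models (22 of 22 and 81 of 81 women with ovarian cancer, 68 of 68 benign disease controls) map onto sensitivity, specificity and PPV as sketched below. The function name `diagnostic_metrics` is hypothetical; the definitions are the standard ones, and the values apply only to the cohort described in the article.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, PPV) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # affected samples called positive
    specificity = tn / (tn + fp)   # unaffected samples called negative
    ppv = tp / (tp + fp)           # positive calls that are truly affected
    return sensitivity, specificity, ppv


if __name__ == "__main__":
    # Counts from the blinded testing and validation sets described in the text:
    # 22 + 81 cancer samples and 68 controls, all classified correctly.
    print(diagnostic_metrics(tp=22 + 81, fn=0, tn=68, fp=0))
```

Because these metrics are computed on a cohort in which most samples come from women with cancer, they do not by themselves indicate how the models would perform in a low-prevalence screening population.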
be reiterated that the ability to distinguish sera from an unaffected individual or an individual with ovarian cancer based upon a single serum proteomic m/z feature alone is not possible across the entire serum study set. Accurate histological distinction is only possible when the key m/z features and their intensities are considered en masse.

A limitation of individual cancer biomarkers is the lack of sensitivity and specificity when applied to large heterogeneous populations [29,32]. Biomarker pattern analysis is an emerging technology aimed at overcoming this limitation. While serum proteomic pattern analysis has the potential to provide new tools for early diagnosis, therapeutic monitoring and outcome analysis, the success of this method will depend upon the ability of a selected set of features to transcend the biologic heterogeneity and methodological background noise. Using clinical study sets, progress has been made toward this diagnostic goal by employing a genetic algorithm coupled with a self-organizing cluster analysis to discover diagnostic subsets of m/z features and their relative intensities contained within high resolution mass spectral data. One of the consistencies within many of the diagnostic proteomic patterns is that a majority of the key m/z values are of low molecular weight, typically less than 10 kDa. The low molecular weight serum proteome is an unexplored archive, even though this is the mass region where MS is best suited for analysis. It is likely that disease-associated species comprise low molecular weight peptide/protein species that vary in mass by as little as a few daltons. Thus, bioinformatic analysis of higher resolution MS data would be expected to discover patterns not discernible within lower resolution MS data.

One major criticism of the use of proteomic patterns for diagnostic purposes is that the identity of the proteins or peptides giving rise to the key m/z features is not known [33,34]. At this point, it is debatable whether it is worth the effort to identify these features, as they may provide little aid in developing an alternative diagnostic platform. For example, many of the key features within the proteomic patterns that account for the diagnostic predictability are of low m/z (<10,000 Da) and therefore it is likely that these could be fragment species generated from larger proteins that are proteolyzed either within the circulatory system or in the tumor/host microenvironment. It would be extremely challenging to generate an affinity reagent with specificity to a peptide fragment without considerable cross-reactivity to its parent protein. In addition, there are many tools in medicine today, such as the electrocardiogram, with which the physician relies solely on a pattern as the basis for a diagnosis. Even the identification of a specific biomarker may not provide any direct insight into how a disease may arise or progress. For instance, while PSA is used to indicate the possible presence of a

Expert opinion
Disease diagnostics using proteomic patterns has rapidly emerged as a potentially revolutionary tool to detect and monitor disease progression or therapeutic response. Its emergence is somewhat analogous to molecular fingerprinting, in which the DNA patterns obtained from different tumors have been demonstrated to be unique for each cancer [35]. In molecular fingerprinting, the hope is that, in the future, physicians may be able to use this information to design treatment programs specific for each type of tumor. While the development of molecular fingerprinting has followed the progression of genomic analysis, proteomic pattern analysis represents a complete about-face in proteomic analysis. While the trend in proteomic technology has been to identify and characterize an increasing number of proteins from a particular clinical sample in order to find a disease-specific biomarker, proteomic patterns rely simply on a crude proteomic survey that provides all of the necessary diagnostic information. While the potential is great, much still needs to be learned. The concept of using a proteomic pattern as a diagnostic tool is in its infancy; therefore, every step in this analytical process requires optimization. This optimization process will include such aspects as sample acquisition and processing, in addition to pattern acquisition and data analysis. Since the diagnostic power of proteomic patterns relies heavily upon the use of bioinformatics, it is important to discover the biological basis behind the mathematical solution. While the identification of key peaks that are distinguished by the bioinformatic analysis may not provide any clues as to the manifestation or progression of the disease, the hope is that they can at least validate the results being provided. While many critics still abound, one simple fact cannot be ignored: the diagnostic models generated from proteomic patterns continue to provide highly sensitive and specific results in testing and blind validation studies, even as the number of samples being analyzed continues to increase.

Five-year view
The next 5 years will be critical in the validation of the use of proteomic patterns in disease detection. Currently, the information present in proteomic patterns may provide an extremely powerful complementary tool to assist physicians in disease diagnosis. The impact of proteomic patterns in disease diagnosis, however, has the potential to be even greater than that of a complementary tool. While at this point it is not clear whether proteomic patterns reveal interindividual differences within the same type of cancer, there is an interest in using proteomic patterns to recognize the best treatment for each afflicted individual. While not fully developed, there is an active interest in determining if proteomic patterns can be used to predict a
prostatic tumor, its role in cancer development remains unclear. patient ’s response to a specific therapy. By combining informa-
Conversely, there is also the likelihood that these key features tion from proteomic patterns with that obtained from molecu-
may represent proteins that provide exciting insights into the lar fingerprints or a histopathological assessment, the optimal
manifestation and progression of cancer. Therefore, identifying treatment for the individual may be more obvious than a sim-
these features is most likely a worthwhile effort although the ple trial and error regimen. The NCI has invested in a program
advancement of disease diagnostics using proteomic patterns to garner FDA approval for the use of proteomic patterns in the
should not be hindered by this exercise.
diagnosis of ovarian cancer in high-risk populations. In addition,
the two largest clinical diagnostic laboratories in the USA,
Quest Diagnostics (NJ, USA) and Laboratory Corporation of
America (NC, USA), have signed licensing agreements to
develop and market the ovarian cancer protein pattern blood test.
As with any emerging technology, the niche that proteomics will
fill within the field of diagnostic medicine remains to be seen.
The most obvious benefit that defined proteomic pattern diagnostic
features can provide is in population screening to detect diseases
such as cancer at earlier stages, enabling more effective medical
intervention. The simplicity of the test makes it feasible to
screen high-risk populations for a variety of different cancers.
The utility of proteomic patterns will be highly dependent upon
the level of their inherent sensitivity and specificity. If the
sensitivity and specificity can approach 100%, disease diagnosis using
proteomic patterns will revolutionize diagnostic medicine, as it
could then be used reliably for the early detection of low prevalence
cancers. The detection of cancers at the earliest possible stage will
save countless lives and help to meet the goal of the NCI to
alleviate the pain and suffering of cancer by the year 2015. Even if
this level of sensitivity and specificity is not achieved, proteomic
patterns will still provide an invaluable complement to determine
the need for a patient biopsy or response to therapy.
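
The dependence of screening utility on sensitivity, specificity and disease prevalence can be made concrete with a short calculation of positive predictive value (PPV), the fraction of positive test results that are true positives. The following is a minimal sketch using Bayes' rule. The function name and the prevalence of 1 in 2500 are illustrative assumptions introduced here, not values taken from the studies discussed in this review.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Fraction of positive test results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("test never returns a positive result")
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalence, chosen only for illustration:
# 1 case per 2500 people screened.
ASSUMED_PREVALENCE = 1 / 2500

if __name__ == "__main__":
    # Each pair is (sensitivity, specificity); the values are illustrative.
    for sens, spec in [(0.90, 0.90), (0.99, 0.99), (1.00, 0.999), (1.00, 1.00)]:
        ppv = positive_predictive_value(sens, spec, ASSUMED_PREVALENCE)
        print(f"sensitivity={sens:.3f} specificity={spec:.3f} PPV={ppv:.4f}")
```

Under this assumed prevalence, a test with 90% sensitivity and 90% specificity yields a PPV below 1%, and even 100% sensitivity with 99.9% specificity leaves fewer than one in three positive results as true positives. This is why specificity approaching 100% is the deciding requirement for screening low prevalence cancers.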
Key issues
• There is an urgent need for cancer biomarkers with
  more accurate diagnostic capability, particularly for
  early-stage disease.
• Conventional proteomic technologies focus upon identifying
  disease-specific biomarkers, while proteomic pattern
  analysis uses the overall pattern to diagnose disease states
  without the need to identify any of the components within
  the pattern.
• Disease diagnostics using proteomic patterns is a
  revolutionary method to detect early-stage cancer.
• Raw biofluids, such as serum and plasma, can be used to
  acquire proteomic patterns with a simple time-of-flight
  mass spectrometer.
• Bioinformatic software is required to decipher the patterns
  within the mass spectra that discriminate serum acquired
  from healthy and cancer-affected individuals.
• Information contained within proteomic patterns has
  been demonstrated to detect ovarian, breast and
  prostate cancers with sensitivities and specificities greater
  than 90%.
• The use of a higher resolution mass spectrometer has
  demonstrated the potential to provide high enough
  sensitivity and specificity to enable the use of proteomic
  patterns as a screening tool for low prevalence cancers.
• Until further blinded validation studies are performed to
  verify the apparent extraordinary sensitivity and specificity
  of this approach, the method should be considered
  investigational and not yet ready for clinical use.

Acknowledgments
The authors would like to acknowledge Correlogics Systems for
their collaboration on using high resolution mass spectrometry to
diagnose ovarian cancer. This project has been funded in whole
or in part with Federal funds from the National Cancer Institute,
National Institutes of Health, Contract No. NO1-CO-12400.

Disclaimer
The content of this publication does not necessarily reflect the
views or policies of the Department of Health and Human
Services, nor does mention of trade names, commercial products,
or organizations imply endorsement by the US Government.

References
Papers of special note have been highlighted as:
• of interest
•• of considerable interest

1 Wilkins MR, Sanchez JC, Gooley AA et al. Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol. Genet. Eng. Rev. 13, 19–50 (1996).
2 Phizicky E, Bastiaens PI, Zhu H, Snyder M, Fields S. Protein analysis on a proteomic scale. Nature 422(6928), 208–215 (2003).
3 Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 422(6928), 198–207 (2003).
  • Describes the current state-of-the-art technology in the characterization of proteomes using methods to identify thousands of proteins in clinical samples.
4 Patterson SD, Aebersold RH. Proteomics: the first decade and beyond. Nature Genet. 33(Suppl.), 311–323 (2003).
5 Schieltz DM, Yates JR 3rd. Direct identification of proteins in ultracomplex mixtures. Applications to proteome analysis. Methods Mol. Biol. 211, 235–245 (2003).
6 Resing KA. Analysis of signaling pathways using functional proteomics. Ann. NY Acad. Sci. 971, 608–614 (2002).
7 Hanash S. Disease proteomics. Nature 422(6928), 226–232 (2003).
8 Temple RJ. A regulatory authority's opinion about surrogate end points. In: Clinical Measurement in Drug Evaluation. Nimmo WS, Tucker GT (Eds). John Wiley and Sons, Inc., New York, NY, USA (1995).
9 Rippey JH. Pregnancy tests: evaluation of current status. Crit. Rev. Clin. Lab. Sci. 19(4), 353–359 (1984).
10 Srivastava S, Gopal-Srivastava R. Biomarkers in cancer screening: a public health perspective. J. Nutr. 132(8 Suppl.), 2471S–2475S (2002).
11 Guppy AE, Rustin GJ. CA125 response: can it replace the traditional response criteria in ovarian cancer? Oncologist 7(5), 437–443 (2002).
12 Grossklaus DJ, Smith JA, Shappell SB, Coffey CS, Chang SS, Cookson MS. The free/total prostate-specific antigen ratio is the best predictor of tumor involvement in the radical prostatectomy specimen among men with an elevated PSA. Urol. Oncol. 7(5), 195–198 (2002).
13 Adkins JN, Varnum SM, Auberry KJ et al. Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry. Mol. Cell. Proteomics 1(12), 947–955 (2002).
14 Pieper R, Su Q, Gatlin CL, Huang ST, Anderson NL, Steiner S. Multi-component immunoaffinity subtraction chromatography: an innovative step towards a comprehensive survey of the human plasma proteome. Proteomics 3(4), 422–432 (2003).
15 Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric analysis of macromolecules. Rapid. Commun. Mass Spectrom. 7, 576–580 (1993).
16 Issaq HJ, Conrads TP, Prieto DA, Tirumalai R, Veenstra TD. SELDI-TOF MS for diagnostic proteomics. Anal. Chem. 75(7), 148A–155A (2003).
  • Describes details of the procedure behind the acquisition of proteomic patterns ranging from the preparation of protein chips to their analysis by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS).
17 Petricoin EF, Ardekani AM, Hitt BA et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359(9306), 572–577 (2002).
  •• Describes the use of proteomic patterns for the diagnosis of ovarian cancer and provides an in-depth description of the bioinformatic tools used to classify benign from malignant conditions.
18 Jacobs IJ, Skates SJ, MacDonald N et al. Screening for ovarian cancer: a pilot randomised controlled trial. Lancet 353, 1207–1210 (1999).
  •• Describes the need for high specificity in the screening of women for ovarian cancer.
19 Cohen LS, Escobar PF, Scharm C, Glimco B, Fishman DA. Three-dimensional power Doppler ultrasound improves the diagnostic accuracy for ovarian cancer prediction. Gynecol. Oncol. 82, 40–48 (2001).
20 Ozols RF, Rubin SC, Thomas GM, Robboy SJ. Epithelial ovarian cancer. In: Principles and Practice of Gynecologic Oncology. Hoskins WJ, Perez CA, Young RC (Eds). Lippincott Williams and Wilkins, Philadelphia, PA, USA, 981–1058 (2000).
21 Menon U, Jacobs I. Recent developments in ovarian cancer screening. Curr. Opin. Obstet. Gynaecol. 12, 39–42 (2000).
22 Holland JH. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control and artificial intelligence (3rd edition). MIT Press, Cambridge, MA, USA (1994).
23 Kohonen T. Self-organizing formation of topologically correct feature maps. Biol. Cybern. 43, 59–69 (1982).
24 Kohonen T. The self-organizing map. Proc. Inst. Electrical Electronics Eng. 78, 1464–1480 (1990).
25 Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin. Chem. 48(8), 1296–1304 (2002).
26 Qu Y, Adam BL, Yasui Y et al. Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin. Chem. 48(10), 1835–1843 (2002).
27 Adam BL, Qu Y, Davis JW et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res. 62(13), 3609–3614 (2002).
  •• Describes the use of proteomic patterns to provide a highly accurate method for the early detection and diagnosis of prostate cancer.
28 Petricoin EF 3rd, Ornstein DK, Paweletz CP et al. Serum proteomic patterns for detection of prostate cancer. J. Natl Cancer Inst. 94(20), 1576–1578 (2002).
29 Kainz C. Early detection and preoperative diagnosis of ovarian carcinoma. Wien. Med. Wochenschr. 146(1–2), 2–7 (1996).
30 Chernushevich IV, Loboda AV, Thomson BA. An introduction to quadrupole-time-of-flight mass spectrometry. J. Mass Spectrom. 36(8), 849–865 (2001).
31 Agresti A. Categorical Data Analysis. John Wiley and Sons, New York, NY, USA (1990).
32 Leung GM, Lam T-H, Thach TQ, Hedley AJ. Will screening mammography in the east do more harm than good? Am. J. Public Health 92(11), 1841–1846 (2002).
33 Diamandis EP. Proteomic patterns in serum and identification of ovarian cancer. Lancet 360(9327), 170 (2002).
  • Points out some of the criticisms of using proteomic patterns as a diagnosis for ovarian cancer.
34 Diamandis EP. Serum proteomic patterns for detection of prostate cancer. J. Natl Cancer Inst. 95(6), 489–490 (2003).
35 Perou CM, Sorlie T, Eisen MB et al. Molecular portraits of human breast tumours. Nature 406(6797), 747–752 (2000).

Affiliations
• Thomas P Conrads, PhD
  Associate Director
  Mass Spectrometry Center,
  Biomedical Proteomics Program,
  SAIC-Frederick, Inc.,
  National Cancer Institute at Frederick,
  PO Box B,
  Frederick, MD 21702, USA
  Tel.: +1 301 846 7353
  Fax: +1 301 846 6037
  conrads@ncifcrf.gov

• Ming Zhou, PhD
  Senior Scientist
  Mass Spectrometry Center,
  Biomedical Proteomics Program,
  SAIC-Frederick, Inc.,
  National Cancer Institute at Frederick,
  PO Box B,
  Frederick, MD 21702, USA
  Tel.: +1 301 846 7199
  Fax: +1 301 846 6037
  mzhou@ncifcrf.gov

• Emmanuel F Petricoin III
  Co-Director
  NCI Clinical Proteomics Program,
  Food and Drug Administration-National Cancer Institute
  Clinical Proteomics Program,
  Center for Biologics Evaluation and Research,
  Food and Drug Administration,
  Bethesda, MD 20892, USA
  Tel.: +1 301 827 1753
  Fax: +1 301 480 3256
  petricoin@cber.FDA.gov

• Lance Liotta
  Co-Director
  NCI Clinical Proteomics Program,
  Laboratory of Pathology,
  National Cancer Institute,
  Center for Cancer Research,
  Bethesda, MD 20892, USA
  Tel.: +1 301 496 2035
  Fax: +1 301 480 0853
  liottal@nih.gov

• Timothy D Veenstra, PhD
  Director
  Analytical Chemistry Laboratory,
  Biomedical Proteomics Program,
  SAIC-Frederick, Inc.,
  National Cancer Institute at Frederick,
  PO Box B,
  Frederick, MD 21702, USA
  Tel.: +1 301 846 7286
  Fax: +1 301 846 6037
  veenstra@ncifcrf.gov