

Review

For reprint orders, please contact reprints@future-drugs.com

Cancer diagnosis using proteomic patterns

Thomas P Conrads, Ming Zhou, Emmanuel F Petricoin III, Lance Liotta and Timothy D Veenstra†

Expert Rev. Mol. Diagn. 3(4), 411–420 (2003)

The advent of proteomics has brought with it the hope of discovering novel biomarkers that can be used to diagnose diseases, predict susceptibility and monitor progression. Much of this effort has focused upon the mass spectral identification of the thousands of proteins that populate complex biosystems such as serum and tissues. A revolutionary approach in proteomic pattern analysis has emerged as an effective method for the early diagnosis of diseases such as ovarian cancer. Proteomic pattern analysis relies on the pattern of proteins observed and does not rely on the identification of a traceable biomarker. Hundreds of clinical samples per day can be analyzed utilizing this technology, which has the potential to be a novel, highly sensitive diagnostic tool for the early detection of cancer.

CONTENTS
Proteomics as a diagnostic tool
Proteomic pattern technology
Summary & conclusions
Expert opinion
Five-year view
Key issues
References
Affiliations

†Author for correspondence
Biomedical Proteomics Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, PO Box B, Frederick, MD 21702, USA
Tel.: +1 301 846 7286
Fax: +1 301 846 6037
veenstra@ncifcrf.gov

KEYWORDS: bioinformatics, biomarkers, cancer, early diagnosis, mass spectrometry, proteomic patterns

www.future-drugs.com

The term proteomics means different things to different people. Originally, proteomics was coined to describe the large-scale, high-throughput separation and subsequent identification of proteins resolved by 2D polyacrylamide gel electrophoresis (2DE) [1]. The field of proteomics has since evolved to include almost any type of technology that focuses upon the wide-scale analysis of proteins. These technologies range from those designed to study a single protein (i.e., mapping of sites of post-translational modifications [PTM]) to those for the analysis of hundreds to thousands of proteins in a single experiment (e.g., protein arrays or isotope-coded affinity tags) [2–4]. In essence, the term proteomics has replaced the use of the phrase protein science. Regardless of the terminology or the scope of the analysis, one of the common requirements of a vast majority of proteomic studies has been the identification of the protein(s) of interest.

Protein identification is central to most proteomic studies [5]. For example, probably the most well known and widely used proteomic technology is the characterization of changes in protein expression between two different samples through comparative 2DE [6]. In such studies, two proteomic samples are resolved and visualized on two separate 2DE gels. Protein spots that are more or less intense on one gel compared with the other are excised and the differentially expressed proteins identified using mass spectrometry (MS). In addition, studies such as phosphorylation mapping would be incomplete without identifying the modified protein, or even better, the specific amino acid that has been modified.

Proteomics as a diagnostic tool

While having an enormous impact in almost every discipline of biomedical science, one of the major focuses is to use the high-throughput capabilities of proteomics in the discovery of novel disease biomarkers [7]. While a biomarker can be defined as any laboratory measurement or physical sign used as a substitute for a clinically meaningful end point that measures directly how a patient feels, functions or survives [8], as applied to proteomics, a biomarker is an identified protein(s) that is unique to a particular disease state. Simply put, the experimental design of a diagnostic proteomic investigation aims to scrutinize clinical samples from healthy and afflicted individuals in a high-throughput manner, allowing for the relative abundance of thousands of proteins from the two histologically distinct samples to be visualized. Proteins that are found to be differentially abundant between the samples are then selected

for identification with the hope that knowledge of their identity will provide the basis for defining a diagnostic biomarker. Unfortunately, this strategy suffers from issues that are in many ways technically overwhelming. Firstly, the proteins being observed in these analyses are generally of high abundance. Therefore, valuable biomarkers expressed at low abundance remain undetected until current analytical technologies become more sensitive. Secondly, discovery of effective biomarkers requires the analysis of hundreds of histologically well-defined samples retrieved from healthy and disease-afflicted individuals. In addition, to be clinically relevant, the biomarker should be present in easily obtainable samples such as serum, plasma or urine. Even ignoring the difficulties in analyzing serum and plasma via MS-based proteomic methods, the natural variability in biofluids obtained from different patients makes the recognition of a single, consistent biomarker in the background of a dynamic proteome extremely challenging. Thirdly, it may be that the presence of a single, definitive biomarker for a particular histological condition, such as human chorionic gonadotropin for pregnancy, is the exception rather than the rule [9,10]. Indeed, many clinical tests that rely on single diagnostic biomarkers, such as cancer-antigen (CA)-125 for ovarian cancer [11] and prostate specific antigen (PSA) for prostate cancer [12], possess positive predictive values (PPV) that are generally quite low.

Biomarker discovery without protein identification

There exists a sobering reality of the lack of success in the discovery of novel diagnostic biomarkers despite the considerable intellectual and financial resources currently invested in the use of conventional proteomic technologies. It is likely that a vast majority of disease states are not the result of a single recognizable change in the abundance or function of a protein. Considering the complexity of an individual cell and the aberrations caused by such disease states as cancer, a vast number of differences between the protein character of healthy and diseased tissues should be observable. Why then has the discovery of disease-specific biomarkers been so elusive? Obviously one of the main reasons is that for a diagnostic marker to be clinically relevant it should be assayed from a sample that can be relatively noninvasively obtained in sufficient quantities from patients. For this reason, the search for biomarkers using proteomic methods largely focuses upon plasma and serum. While serum constantly perfuses tissues, hence potentially endowing an archive of histological information, this information is comprised not only of the expected circulatory proteins in serum, such as immunoglobulins, but also of peptides and proteins that are secreted into the blood and species shed from diseased, dying or dead cells present throughout the body [13]. Therefore, the background matrix of biofluids such as serum represents a complex milieu in which to find unique disease-specific biomarkers that are most likely of extremely low abundance.

The intrinsic person-to-person variability of the content of biofluids also hampers the identification of a disease-specific biomarker [14]. The identification of a biomarker relies on the comparison of, for example, serum samples from healthy and disease-stricken individuals. The comparison of two distinct serum samples is incredibly laborious using conventional proteomic technologies and the comparison of the hundreds if not thousands needed to validate a biomarker is not routinely possible. More to the point, in the comparison of just two serum samples, a multitude of changes in protein abundance are observed due simply to differences such as age, gender or lifestyle, making the assumption that a particular difference is a result of a specific disease state tenuous at best.

Proteomic pattern technology

A revolutionary proteomic technology has recently been developed that uses the pattern of proteins observed within a clinical sample as a diagnostic fingerprint and does not rely on the identification of the proteins detected. The technology to acquire these so-called proteomic patterns is quite simple, as illustrated in FIGURE 1. In its current state, surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is the technology used to acquire the proteomic patterns to be used in the diagnostic setting [15,16]. The principle of SELDI-TOF is very simple: proteins of interest are captured by adsorption, partition, electrostatic interaction or affinity chromatography on a stationary phase and immobilized in an array format on a chip surface. One of the benefits of this process is that raw biofluids, such as urine, serum and plasma, can be directly applied to the array surface. After a series of binding and washing steps, a matrix is applied to the array surfaces. The species bound to these surfaces can be ionized by matrix-assisted laser desorption/ionization (MALDI) and their mass-to-charge (m/z) ratios measured by TOF MS. The result is simply a mass spectrum of the species that bound to and subsequently desorbed from the array surface. While the inherent simplicity of the technology has contributed to the enthusiasm generated for this approach, the implementation of sophisticated bioinformatic tools has enabled the use of SELDI-TOF MS as a potentially revolutionary diagnostic tool.

Application of proteomic patterns for disease diagnosis

The seminal study describing the use of proteomic patterns to diagnose ovarian cancer was published in The Lancet by Petricoin and coworkers of the US Food and Drug Administration (FDA) and Liotta and coworkers of the National Cancer Institute (NCI) [17]. The aggressive nature of ovarian cancer, the fifth most common cause of cancer-related death in women, makes it a prime example of a disease whose 5-year survival rate would dramatically increase if a more effective means of early (or Stage I) detection could be discovered [17,18]. Unfortunately, almost 80% of women with common epithelial ovarian cancer are not diagnosed until the disease has spread to the upper abdomen (Stage III) or beyond (Stage IV) [19,20]. The 5-year survival rate for these women is only 15–20%, whereas the 5-year survival rate for ovarian cancer at Stage I approaches 95% with surgical intervention. In this study, the proteomic patterns of serum samples from several patients with ovarian cancer were compared with those from control patients. Visual

[Figure 1 panels: serum proteins; protein chip; SELDI-TOF MS; proteomic pattern (m/z); pattern recognition; diagnosis.]

Figure 1. Disease diagnostics using proteomic patterns. The sample drawn from the patient is applied to a protein chip which is made up of a specific chromatographic surface. After several washing steps and the application of an energy-absorbing molecule, the species that are retained on the surface of the chip are analyzed via mass spectrometry. The pattern of peaks within the spectrum is analyzed using sophisticated bioinformatic software to diagnose the source of the biological sample. m/z: Mass to charge ratio; SELDI-TOF MS: Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry.

inspection of the mass spectra did not reveal mass spectral features unique to the ovarian cancer serum samples. The ability to discriminate patterns generated from serum acquired from healthy individuals and those patients affected with ovarian cancer was only accomplished through application of an artificial intelligence program able to decipher diagnostic patterns within the profiles.

The artificial intelligence program used in this study combined elements of a genetic algorithm with cluster analysis [21–23]. The input files for the analysis were comprised of the m/z values on the x-axis along with their corresponding amplitudes on the y-axis. The analysis was divided into two phases, a pattern discovery phase and a pattern matching phase, as illustrated in FIGURE 2. In the pattern discovery phase, a set of mass spectra of serum from both healthy and ovarian cancer-affected individuals (referred to as the training set) was analyzed to identify a subset of m/z values and their related amplitudes that are able to completely segregate the data acquired using serum samples from patients with ovarian cancer and unaffected individuals. In the pattern discovery phase, the source of the serum (from healthy or ovarian cancer-affected individuals) was known and was included as part of the data provided to the algorithm. The bioinformatic searching process began with hundreds of arbitrary choices of small sets (5–20) of the exact m/z values selected along the x-axis of the mass spectra. The diagnostic pattern was formed by plotting the combined y-axis amplitudes of the candidate set of the key m/z values in N-dimensional space, where N is equal to the number of m/z values found within the training set of spectra. The pattern formed by the relative amplitudes of the spectrum data for this set of chosen m/z values was rated for its ability to distinguish the serum mass spectra acquired from the healthy and cancer-affected individuals. Since the aim was to identify the pattern that provides the optimal segregation, the frequency values within the highest rated sets were reshuffled to form new subset candidates and the resultant defined amplitude values were rated iteratively until the set that fully discriminates the preliminary sample sets was revealed.

Once the algorithm had recognized the key m/z values, the model was tested using a set of masked test spectra in which the optimal pattern recognized in the first phase was tested for its diagnostic capabilities. As opposed to the pattern discovery phase, which uses all of the m/z values within the entire spectral data set, in the pattern matching phase only the key subset of the m/z values identified in the pattern discovery phase was used to classify the unknown samples as being from healthy or cancer-affected individuals. The pattern formed by the relative amplitudes of the key m/z values in each unknown was then matched to the optimum pattern defined in the pattern discovery phase. Each unknown sample was classified based upon the cluster(s) that its feature set populates as an unaffected or cancer-affected individual, or generated a new cluster if it is found not to

match any of the patterns defined in the pattern discovery phase. If the sample generates a new cluster then the point in N-space of the unknown sample is outside the defined likeness boundaries of the ovarian cancer and unaffected clusters.

After generating the diagnostic model, the diagnostic feature sets defined in training were utilized in a series of test samples in which the source of the serum was blinded. The diagnostic feature set defined in training was able to correctly diagnose the samples as being acquired from either control patients or those suffering from ovarian cancer with a sensitivity of 100% and a specificity of 95%, with an overall PPV of 94% [17]. The success in correctly diagnosing Stage I ovarian cancer suggested that proteomic patterns generated from biofluids may provide a useful indicator of the early onset of a particular disease state.

Improvements in instrumental analysis

Since this original study, several laboratories have acquired and analyzed serum proteomic patterns for the diagnosis of breast [24] and prostate [25–27] cancer. All of these studies used the same analytical platform combined with different sample preparation methods and bioinformatic algorithms. The analytical platform used is the ProteinChip® Biomarker System-II (PBS-II; Ciphergen, CA, USA), a low-resolution TOF MS. Reflective of the success of the ovarian cancer study [17], many of these studies have been able to correctly diagnose serum samples with sensitivities and specificities greater than 90%. The diagnostic accuracy combined with the attributes of the technology (e.g., ease of sample preparation and high throughput) make proteomic patterns a potentially invaluable screening tool for high-risk populations. Even with this high overall PPV, however, the technology in its present form is not useful as a clinical screening tool for a disease with a low prevalence such as ovarian cancer. While a PPV of 94% as was reported in the ovarian cancer study by Petricoin and colleagues is extremely high, the specificity (95%) of the assay when extrapolated over a large population in which very few patients would actually have ovarian cancer would result in six out of every 100 patients being sent for unnecessary biopsies. This percentage of false-positives would have a tremendous negative impact on the available medical resources. To serve as an effective screening tool, a diagnostic assay screening for ovarian cancer requires a specificity of at least 99.6% [29]. Therefore, while proteomic pattern analysis in its present state represents a useful tool to diagnose cancer, its use as a screening tool for high-risk populations is still limited.

One of the limiting factors in increasing the PPV attributes of proteomic pattern analysis is the PBS-II, which is a simple TOF MS that is designed to provide a broad m/z range of detection, necessarily at the expense of resolution. To assess the increase in diagnostic sensitivity and specificity that would be afforded from higher resolution mass spectra, our laboratory performed a side-by-side comparison of the results obtained analyzing serum samples on a PBS-II and a hybrid

[Figure 2 panels: Phase I, training set input (a. unaffected samples, b. cancer samples; spectra over m/z 1000–6000); Phase II, test/validation sample for diagnosis; genetic algorithm + self-organizing cluster analysis; 'survival of the fittest' discriminatory patterns that discriminate unaffected from cancer-affected samples; lead diagnostic fingerprint (from training set); classification as normal, cancer or new.]

Figure 2. Bioinformatic analysis of proteomic pattern spectra for the determination of the discriminatory patterns in the training and diagnostic (testing and blind validation) phase. m/z: Mass to charge ratio.
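The two-phase analysis summarized in Figure 2 (pattern discovery on a labeled training set, then pattern matching on masked spectra) can be illustrated with a minimal sketch. This is not the program used in the cited studies: the nearest-centroid rule below is a simple stand-in for the self-organizing cluster analysis, the spectra are synthetic, and every name and parameter (`discover`, `classify`, `radius`, the population size, the mutation rate, the informative bins 7 and 31) is a hypothetical choice made only for illustration.

```python
import random

def project(spectrum, features):
    """Amplitudes of one spectrum at the chosen m/z bin indices."""
    return [spectrum[i] for i in features]

def dist2(a, b):
    """Squared Euclidean distance in the N-dimensional feature space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroids(features, spectra, labels):
    """One centroid per class; a stand-in for the self-organizing clusters."""
    cents = {}
    for c in sorted(set(labels)):
        pts = [project(s, features) for s, l in zip(spectra, labels) if l == c]
        cents[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return cents

def fitness(features, spectra, labels):
    """Fraction of training spectra nearest to their own class centroid."""
    cents = centroids(features, spectra, labels)
    hits = 0
    for s, l in zip(spectra, labels):
        p = project(s, features)
        hits += min(cents, key=lambda c: dist2(p, cents[c])) == l
    return hits / len(spectra)

def discover(spectra, labels, n_features=5, pop_size=30, generations=40, seed=0):
    """Pattern discovery: evolve small m/z subsets toward full segregation."""
    rng = random.Random(seed)
    n_bins = len(spectra[0])
    pop = [rng.sample(range(n_bins), n_features) for _ in range(pop_size)]
    score = lambda f: fitness(f, spectra, labels)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        if score(pop[0]) == 1.0:          # training set fully segregated
            break
        parents = pop[: pop_size // 2]    # keep the highest-rated subsets
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Reshuffle the m/z bins of two highly rated subsets into a new one.
            child = rng.sample(sorted(set(a) | set(b)), n_features)
            if rng.random() < 0.3:        # occasional mutation to a random bin
                new_bin = rng.randrange(n_bins)
                if new_bin not in child:
                    child[rng.randrange(n_features)] = new_bin
            children.append(child)
        pop = parents + children
    return sorted(max(pop, key=score))

def classify(spectrum, features, cents, radius):
    """Pattern matching: nearest cluster, or 'new' if outside every boundary."""
    p = project(spectrum, features)
    label = min(cents, key=lambda c: dist2(p, cents[c]))
    return label if dist2(p, cents[label]) <= radius ** 2 else "new"

def synthetic(n, shift, rng, n_bins=50):
    """Hypothetical spectra: noise everywhere, class signal in bins 7 and 31."""
    spectra = []
    for _ in range(n):
        s = [rng.gauss(1.0, 0.1) for _ in range(n_bins)]
        s[7] += shift
        s[31] -= shift
        spectra.append(s)
    return spectra

rng = random.Random(1)
train = synthetic(20, 0.0, rng) + synthetic(20, 0.6, rng)
labels = ["unaffected"] * 20 + ["cancer"] * 20
features = discover(train, labels)
cents = centroids(features, train, labels)
print(features, fitness(features, train, labels))
print(classify([100.0] * 50, features, cents, radius=1.0))  # far from both clusters
```

The `radius` argument plays the role of the likeness boundary described in the text: a spectrum whose feature vector lies farther than `radius` from every cluster is reported as `"new"` rather than being forced into one of the two classes.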

quadrupole (Qq) TOF MS fitted with a SELDI ion source [30]. A schematic comparison of the two instruments is provided in FIGURE 3. The QqTOF can be regarded as a triple Qq MS in which the third quadrupole has been replaced with a reflecting TOF analyzer. This instrument combines the benefits of ion selectivity and tandem MS capabilities of a triple Qq MS with the high mass accuracy and resolution afforded by a reflecting TOF analyzer. The PBS-II, on the other hand, is a relatively simple linear TOF MS. The mass analyzer, though relatively sensitive, provides only low resolution and hence low mass accuracy data. While time lag focusing is used to increase data resolution and mass accuracy, the achievable mass accuracy is much less than that afforded using more conventional, high-resolution TOF MS instrumentation, such as the QqTOF. An example of a serum sample analyzed using both the PBS-II TOF MS and the QqTOF MS is illustrated in FIGURE 4. While the spectra are qualitatively similar, the resolution obtainable with the QqTOF MS is on the order of 60-fold higher than that obtainable with the PBS-II TOF MS.

To compare the PPV of the results obtained from the two MS platforms, 248 serum samples from either healthy women or those clinically diagnosed with various stages of ovarian cancer were provided from the National Ovarian Cancer Early Detection Program at Northwestern University Hospital (Chicago, Illinois, USA), and processed and analyzed by both instruments. The key to this study is that the identical set of samples was analyzed on the exact same protein chip surface and all experimental variability outside the use of two different instruments was thereby eliminated.

The mass spectra acquired on both the PBS-II and QqTOF MS instruments were analyzed using the ProteomeQuest™ (Correlogic Systems, Inc., MD, USA) bioinformatics tool as illustrated in FIGURE 2. A total of 28 serum samples from unaffected women and 49 from women with ovarian cancer were used for the training set. A series of diagnostic models were generated using a variety of different combinations of bioinformatic heuristic parameters. None of these parameters had any effect on the raw MS data; they were simply related to the bioinformatic

[Figure 3 schematic. (A) QqTOF MS: sample ions, quadrupoles Q0, Q1 (ultrastable quadrupole mass filter) and Q2 (high efficiency collision cell), accelerator, field-free drift region, ion mirror, detector, vacuum pumps; effective flight path = 2.5 m. (B) PBS-II TOF MS: laser, proteins, accelerating potential, vacuum.]

Figure 3. Schematic diagrams comparing the configuration of (A) QqTOF MS and (B) PBS-II TOF MS. The QqTOF combines elements of a triple quadrupole (Qq) MS with a reflecting TOF analyzer, thereby affording higher mass accuracy and resolution than the simple linear TOF of the PBS-II. PBS-II: ProteinChip® Biomarker System-II (Ciphergen, CA, USA); TOF MS: Time-of-flight mass spectrometry.

[Figure 4: two mass spectra, panels A and B, each plotted over m/z 2000–12,000.]

Figure 4. Comparison of mass spectra acquired using a SELDI-TOF MS and a QqTOF MS equipped with a SELDI source. m/z: Mass to charge ratio; SELDI: Surface-enhanced laser desorption/ionization; TOF MS: Time-of-flight mass spectrometry; Qq: Quadrupole.

process of generating diagnostic models from the raw data and included such factors as the similarity space of likeness for cluster classification, the feature set size of random m/z values whose combined intensities comprise each pattern and the learning rate of pattern generation by the genetic algorithm. A total of 108 models were derived and queried with the same set of proteomic pattern spectra in the testing and blind validation phase to assess their sensitivity, specificity and overall PPV.

The models derived from the training sets acquired on both the PBS-II and QqTOF MS were tested using blind serum sample mass spectra obtained from 31 unaffected women and 63 women with ovarian cancer, and further validated using blind serum sample spectra obtained from 37 unaffected women and 40 women with ovarian cancer. The diagnostic models generated from mass spectra acquired using the higher resolution QqTOF MS were statistically superior not only in testing (sensitivity, p < 0.0001; specificity, p < 3 × 10⁻¹⁹) but also in validation (sensitivity, p < 9 × 10⁻⁹; specificity, p < 6 × 10⁻⁶²) as evaluated using a two-tailed Cochran–Armitage test for trend [31].

Four models were found that were both 100% sensitive and specific in their ability to correctly discriminate serum samples taken from unaffected women or those suffering from ovarian cancer. All of these models were generated from data acquired using the QqTOF MS, as no models generated from the PBS-II were both 100% sensitive and specific. Just as importantly, and key if this technology is to become a viable screening tool, no false-positive or false-negative classifications occurred using these models, giving each a PPV of 100% using the patient cohort employed in this study.

Another key aspect of this study is that the m/z features that comprise the four diagnostic models that had 100% PPV for ovarian cancer revealed certain consistent features. Although the proteomic patterns generated from both healthy and cancer-affected patients using the QqTOF MS are quite similar (FIGURE 5), careful inspection of the raw mass spectra reveals that peaks at m/z values 7060.121 and 8605.678 are indeed differentially abundant in a selection of the serum samples obtained from ovarian cancer patients as compared with unaffected individuals (FIGURE 5, INSETS). The results indicated that these MS peaks originate from species that may be consistent indicators of

the presence of ovarian cancer and represent good candidates for ongoing efforts to identify low molecular weight components in serum that may be key disease progression indicators.

While a number of studies have reported impressive diagnostic success using the lower resolution PBS-II TOF MS, screening for diseases of relatively low prevalence such as ovarian cancer requires a minimum level of 99.6% sensitivity and specificity [29]. In blinded testing and validation studies, any one of the four best models generated using QqTOF MS data was able to correctly classify 22 of 22 women with Stage I ovarian cancer, 81 of 81 women with Stage II, III and IV ovarian cancer and 68 of 68 benign disease controls. It can be envisioned in the near future that a clinical test would simultaneously employ several combinations of highly accurate diagnostic proteomic patterns which, if taken together, could achieve an even higher degree of accuracy in a screening setting where a diagnostic test will face large population heterogeneity and potential variability in sample quality and handling. Hence, a high-resolution system, such as the QqTOF MS employed in this study, is preferred based upon the present results that serve as a platform for clinical trials of serum proteomic patterns.

Summary & conclusions

One of the overlooked powers of investigating proteomic patterns is the ability to screen hundreds of serum samples in a high-throughput manner and therefore quickly determine targets (key m/z values) for further investigation. The inherent variability of serum between individuals makes it impossible to compare and recognize valid disease indicators using the conventional proteomic techniques of protein separation (2DE or multidimensional liquid chromatographic fractionation) and MS identification. The technology used to generate proteomic patterns is highly automated and even an academic laboratory can analyze in excess of 300 samples per day. This throughput allows for key discriminatory features to be distinguished within hundreds of serum or plasma samples over a statistically relevant population in a rapid fashion. It must

[Figure 5 panels: SELDI QqTOF mass spectra labeled Unaffected and Ovarian cancer, each plotted over m/z 2000–12,000, with insets expanding the regions around m/z 7060 and m/z 8606.]

Figure 5. Comparison of SELDI QqTOF mass spectra of serum from an unaffected individual and an ovarian cancer patient. Insets show expanded m/z regions highlighting significant intensity differences of the peaks at m/z 7060.121 and 8605.678 identified by the algorithm as belonging to the optimum discriminatory pattern. m/z: Mass to charge ratio; SELDI QqTOF: Surface-enhanced laser desorption/ionization hybrid quadrupole time-of-flight.

be reiterated that the ability to distinguish sera from an unaffected individual or an individual with ovarian cancer based upon a single serum proteomic m/z feature alone is not possible across the entire serum study set. Accurate histological distinction is only possible when the key m/z features and their intensities are considered en masse.

A limitation of individual cancer biomarkers is the lack of sensitivity and specificity when applied to large heterogeneous populations [29,32]. Biomarker pattern analysis is an emerging technology aimed at overcoming this limitation. While serum proteomic pattern analysis has the potential to provide new tools for early diagnosis, therapeutic monitoring and outcome analysis, the success of this method will depend upon the ability of a selected set of features to transcend the biologic heterogeneity and methodological background noise. Using clinical study sets, progress has been made toward this diagnostic goal by employing a genetic algorithm coupled with a self-organizing cluster analysis to discover diagnostic subsets of m/z features and their relative intensities contained within high resolution mass spectral data. One of the consistencies within many of the diagnostic proteomic patterns is that a majority of the key m/z values are of low molecular weight, typically less than 10 kDa. The low molecular weight serum proteome is an unexplored archive, even though this is the mass region where MS is best suited for analysis. It is likely that disease-associated species are comprised of low molecular weight peptide/protein species that vary in mass by as little as a few Daltons. Thus, bioinformatic analysis of higher resolution MS data would be expected to discover patterns not discernible within lower resolution MS data.

One major criticism of the use of proteomic patterns for diagnostic purposes is that the identity of the proteins or peptides giving rise to the key m/z features is not known [33,34]. At this point, it is debatable whether it is worth the effort to identify these features as they may provide little aid in developing an alternative diagnostic platform. For example, many of the key features within the proteomic patterns that account for the diagnostic predictability are of low m/z (<10,000 Da) and therefore it is likely that these could be from fragment species generated from larger proteins that are proteolyzed either within the circulatory system or in the tumor/host microenvironment. It would be extremely challenging to generate an affinity reagent with specificity to a peptide fragment without considerable cross-reactivity to its parent protein. In addition, there are many tools in medicine today, such as the electrocardiogram, with which the physician relies solely on a pattern to base his/her diagnosis. Even the identification of a specific biomarker may not provide any direct insight into how a disease may arise or progress. For instance, while PSA is used to indicate the possible presence of a prostatic tumor, its role in cancer development remains unclear. Conversely, there is also the likelihood that these key features may represent proteins that provide exciting insights into the manifestation and progression of cancer. Therefore, identifying these features is most likely a worthwhile effort, although the advancement of disease diagnostics using proteomic patterns should not be hindered by this exercise.

Expert opinion

Disease diagnostics using proteomic patterns has rapidly emerged as a potentially revolutionary tool to detect and monitor disease progression or therapeutic response. Its emergence is somewhat analogous to molecular fingerprinting in which the DNA patterns obtained from different tumors have been demonstrated to be unique for each cancer [35]. In molecular fingerprinting, the hope is that, in the future, physicians may be able to use this information to design treatment programs specific for each type of tumor. While the development of molecular fingerprinting has followed the progression of genomic analysis, proteomic pattern analysis represents a complete about-face in proteomic analysis. While the trend in proteomic technology has been to identify and characterize an increasing number of proteins from a particular clinical sample in order to find a disease-specific biomarker, proteomic patterns rely simply on a crude proteomic survey that provides all of the necessary diagnostic information. While the potential is great, much still needs to be learned. The concept of using a proteomic pattern as a diagnostic tool is in its infancy; therefore every step in this analytical process requires optimization. This optimization process will include such aspects as sample acquisition and processing, in addition to pattern acquisition and data analysis. Since the diagnostic power of proteomic patterns relies heavily upon the use of bioinformatics, it is important to discover the biological basis behind the mathematical solution. While the identification of key peaks that are distinguished by the bioinformatic analysis may not provide any clues as to the manifestation or progression of the disease, the hope is they can at least validate the results being provided. While many critics still abound, one simple fact cannot be ignored: the diagnostic models generated from proteomic patterns continue to provide highly sensitive and specific results in testing and blind validation studies, even as the number of samples being analyzed continues to increase.

Five-year view

The next 5 years will be critical in the validation of the use of proteomic patterns in disease detection. Currently, the information present in proteomic patterns may provide an extremely powerful complementary tool to assist physicians in disease diagnosis. The impact of proteomic patterns in disease diagnosis, however, has the potential to be even greater than a complementary tool. While at this point it is not clear whether proteomic patterns reveal interindividual differences within the same type of cancer, there is an interest in using proteomic patterns to recognize the best treatment for each afflicted individual. While not fully developed, there is an active interest in determining if proteomic patterns can be used to predict a patient's response to a specific therapy. By combining information from proteomic patterns with that obtained from molecular fingerprints or a histopathological assessment, the optimal treatment for the individual may be more obvious than a simple trial and error regimen. The NCI has invested in a program to garner FDA approval for the use of proteomic patterns in the diagnosis of ovarian cancer in high-risk populations. In addition,

Ca nc e r d ia g nosis using p rote om ic p a tte rns the two largest clinical diagnostic laboratories in the USA, Quest Diagnostics (NJ, USA) and Laboratory Corporation of America (NC, USA), have signed licensing agreements to develop and market the ovarian cancer protein pattern blood test. As with any emerging technology, the niche that proteomics will fill within the field of diagnostic medicine remains to be seen. The most obvious benefit of defined proteomic pattern diagnos- tic features can provide is in population screening to detect dis- eases such as cancer at earlier stages to enable more effective med- ical intervention. The simplicity of the test makes it feasible to screen high-risk populations for a variety of different cancers. The utility of proteomic patterns will be highly dependent upon the level of their inherent sensitivity and specificity. If the sensi- tivity and specificity can approach 100%, disease diagnosis using proteomic patterns will revolutionize diagnostic medicine as it can be used reliably for the early detection of low prevalence can- cers. The detection of cancers at the earliest possible stage will save countless lives and help to meet the goal of the NCI to alle- viate the pain and suffering of cancer by the year 2015. Even if this level of sensitivity and specificity is not achieved, proteomic patterns will still provide an invaluable complement to determine the need for a patient biopsy or response to therapy. Key issues ••There is an urgent need for cancer biomarkers with more accurate diagnostic capability, particularly for early-stage disease. Conventional proteomic technologies focus upon identifying disease-specific biomarkers, while proteomic pattern analysis uses the overall pattern to diagnose disease states without the need to identify any of the components within the pattern. • Disease diagnostics using proteomic patterns is a revolutionary method to detect early-stage cancer. 
• Raw biofluids, such as serum and plasma, can be used to acquire proteomic patterns with a simple time-of-flight mass spectrometer.
• Bioinformatic software is required to decipher the patterns within the mass spectra that discriminate serum acquired from healthy and cancer-affected individuals.
• Information contained within proteomic patterns has been demonstrated to detect ovarian, breast and prostate cancers with sensitivities and specificities greater than 90%.
• The use of a higher resolution mass spectrometer has demonstrated the potential to provide high enough sensitivity and specificity to enable the use of proteomic patterns as a screening tool for low prevalence cancers.
• Until further blinded validation studies are performed to verify the apparent extraordinary sensitivity and specificity of this approach, the method should be considered investigational and not yet ready for clinical use.

Acknowledgments

The authors would like to acknowledge Correlogics Systems for their collaboration on using high resolution mass spectrometry to diagnose ovarian cancer. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Contract No. NO1-CO-12400.

Disclaimer

The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.

References

Papers of special note have been highlighted as:
• of interest
•• of considerable interest

1 Wilkins MR, Sanchez JC, Gooley AA et al. Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol. Genet. Eng. Rev. 13, 19–50 (1996).
2 Phizicky E, Bastiaens PI, Zhu H, Snyder M, Fields S. Protein analysis on a proteomic scale. Nature 422(6928), 208–215 (2003).
3 Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 422(6928), 198–207 (2003).
• Describes the current state-of-the-art technology in the characterization of proteomes using methods to identify thousands of proteins in clinical samples.
4 Patterson SD, Aebersold RH. Proteomics: the first decade and beyond. Nature Genet. 33(Suppl.), 311–323 (2003).
5 Schieltz DM, Yates JR 3rd. Direct identification of proteins in ultracomplex mixtures. Applications to proteome analysis. Methods Mol. Biol. 211, 235–245 (2003).
6 Resing KA. Analysis of signaling pathways using functional proteomics. Ann. NY Acad. Sci. 971, 608–614 (2002).
7 Hanash S. Disease proteomics. Nature 422(6928), 226–232 (2003).
8 Temple RJ. A regulatory authority's opinion about surrogate end points. In: Clinical Measurement in Drug Evaluation. Nimmo WS, Tucker GT (Eds). John Wiley and Sons, Inc., New York, NY, USA (1995).
9 Rippey JH. Pregnancy tests: evaluation of current status. Crit. Rev. Clin. Lab. Sci. 19(4), 353–359 (1984).
10 Srivastava S, Gopal-Srivastava R. Biomarkers in cancer screening: a public health perspective. J. Nutr. 132(8 Suppl.), 2471S–2475S (2002).
11 Guppy AE, Rustin GJ. CA125 response: can it replace the traditional response criteria in ovarian cancer? Oncologist 7(5), 437–443 (2002).
12 Grossklaus DJ, Smith JA, Shappell SB, Coffey CS, Chang SS, Cookson MS. The free/total prostate-specific antigen ratio is the best predictor of tumor involvement in the radical prostatectomy specimen among men with an elevated PSA. Urol. Oncol. 7(5), 195–198 (2002).
13 Adkins JN, Varnum SM, Auberry KJ et al. Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry. Mol. Cell. Proteomics 1(12), 947–955 (2002).
14 Pieper R, Su Q, Gatlin CL, Huang ST, Anderson NL, Steiner S. Multi-component immunoaffinity subtraction chromatography: an innovative step towards a comprehensive survey of the human plasma proteome. Proteomics 3(4), 422–432 (2003).
15 Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric analysis of macromolecules. Rapid. Commun. Mass Spectrom. 7, 576–580 (1993).
16 Issaq HJ, Conrads TP, Prieto DA, Tirumalai R, Veenstra TD. SELDI-TOF MS for diagnostic proteomics. Anal. Chem. 75(7), 148A–155A (2003).
• Describes details of the procedure behind the acquisition of proteomic patterns ranging from the preparation of protein chips to their analysis by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS).
17 Petricoin EF, Ardekani AM, Hitt BA et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359(9306), 572–577 (2002).
•• Describes the use of proteomic patterns for the diagnosis of ovarian cancer and provides an in-depth description of the bioinformatic tools used to classify benign from malignant conditions.
18 Jacobs IJ, Skates SJ, MacDonald N et al. Screening for ovarian cancer: a pilot randomised controlled trial. Lancet 353, 1207–1210 (1999).
•• Describes the need for high specificity in the screening of women for ovarian cancer.
19 Cohen LS, Escobar PF, Scharm C, Glimco B, Fishman DA. Three-dimensional power Doppler ultrasound improves the diagnostic accuracy for ovarian cancer prediction. Gynecol. Oncol. 82, 40–48 (2001).
20 Ozols RF, Rubin SC, Thomas GM, Robboy SJ. Epithelial ovarian cancer. In: Principles and Practice of Gynecologic Oncology. Hoskins WJ, Perez CA, Young RC (Eds). Lippincott Williams and Wilkins, Philadelphia, PA, USA, 981–1058 (2000).
21 Menon U, Jacobs I. Recent developments in ovarian cancer screening. Curr. Opin. Obstet. Gynaecol. 12, 39–42 (2000).
22 Holland JH. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control and artificial intelligence (3rd edition). MIT Press, Cambridge, MA, USA (1994).
23 Kohonen T. Self-organizing formation of topologically correct feature maps. Biol. Cybern. 43, 59–69 (1982).
24 Kohonen T. The self-organizing map. Proc. Inst. Electrical Electronics Eng. 78, 1464–1480 (1990).
25 Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin. Chem. 48(8), 1296–1304 (2002).
26 Qu Y, Adam BL, Yasui Y et al. Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin. Chem. 48(10), 1835–1843 (2002).
27 Adam BL, Qu Y, Davis JW et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res. 62(13), 3609–3614 (2002).
• Describes the use of proteomic patterns to provide a highly accurate method for the early detection and diagnosis of prostate cancer.
28 Petricoin EF 3rd, Ornstein DK, Paweletz CP et al. Serum proteomic patterns for detection of prostate cancer. J. Natl Cancer Inst. 94(20), 1576–1578 (2002).
29 Kainz C. Early detection and preoperative diagnosis of ovarian carcinoma. Wien. Med. Wochenschr. 146(1–2), 2–7 (1996).
30 Chernushevich IV, Loboda AV, Thomson BA. An introduction to quadrupole-time-of-flight mass spectrometry. J. Mass Spectrom. 36(8), 849–865 (2001).
31 Agresti A. Categorical Data Analysis. John Wiley and Sons, New York, NY, USA (1990).
32 Leung GM, Lam T-H, Thach TQ, Hedley AJ. Will screening mammography in the east do more harm than good? Am. J. Public Health 92(11), 1841–1846 (2002).
33 Diamandis EP. Proteomic patterns in serum and identification of ovarian cancer. Lancet 360(9327), 170 (2002).
• Points out some of the criticisms of using proteomic patterns as a diagnosis for ovarian cancer.
34 Diamandis EP. Serum proteomic patterns for detection of prostate cancer. J. Natl Cancer Inst. 95(6), 489–490 (2003).
35 Perou CM, Sorlie T, Eisen MB et al. Molecular portraits of human breast tumours. Nature 406(6797), 747–752 (2000).

Affiliations

Thomas P Conrads, PhD
Associate Director, Mass Spectrometry Center, Biomedical Proteomics Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, PO Box B, Frederick, MD 21702, USA
Tel.: +1 301 846 7353
Fax: +1 301 846 6037
conrads@ncifcrf.gov

Ming Zhou, PhD
Senior Scientist, Mass Spectrometry Center, Biomedical Proteomics Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, PO Box B, Frederick, MD 21702, USA
Tel.: +1 301 846 7199
Fax: +1 301 846 6037
mzhou@ncifcrf.gov

Emmanuel F Petricoin III
Co-Director, NCI Clinical Proteomics Program, Food and Drug Administration-National Cancer Institute Clinical Proteomics Program, Center for Biologics Evaluation and Research, Food and Drug Administration, Bethesda, MD 20892, USA
Tel.: +1 301 827 1753
Fax: +1 301 480 3256
petricoin@cber.FDA.gov

Lance Liotta
Co-Director, NCI Clinical Proteomics Program, Laboratory of Pathology, National Cancer Institute, Center for Cancer Research, Bethesda, MD 20892, USA
Tel.: +1 301 496 2035
Fax: +1 301 480 0853
liottal@nih.gov

Timothy D Veenstra, PhD
Director, Analytical Chemistry Laboratory, Biomedical Proteomics Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, PO Box B, Frederick, MD 21702, USA
Tel.: +1 301 846 7286
Fax: +1 301 846 6037
veenstra@ncifcrf.gov