Daphnet Freezing of Gait: This dataset contains the annotated readings of 3 acceleration sensors at the hip and leg of Parkinson's disease patients that experience freezing of gait (FoG) during walking tasks. This state …
Abalone: Predict the age of abalone from physical measurements. From UCI Machine Learning Repository.
UCI Health offers an innovative lung cancer screening program that can detect lung cancer at its earliest stage, when it is most treatable.
Post-Operative Patient: Dataset of patient features.
Here, I have to give a comparison between various algorithms or techniques such as SVM, ANN, and K-NN.
There may be multiple rows per patientId.
Cervical Cancer Behavior Risk: The dataset contains 19 attributes regarding cervical cancer (ca cervix) behavior risk. The class label is ca_cervix, with values 1 and 0 denoting respondents with and without cervical cancer, respectively.
I am working on a seminar in the Soft Computing domain; the topic is Early Diagnosis of Lung Cancer.
Each patient is classified into one of two categories: normal and abnormal. …
Immunotherapy Dataset: This dataset contains information about wart treatment results of 90 patients using immunotherapy.
Number of Attributes: 56.
Lung Cancer Data 1.
LSVT Voice Rehabilitation: 126 samples from 14 participants, 309 features.
Bar Crawl: Detecting Heavy Drinking: Accelerometer and transdermal alcohol content data from a college bar crawl.
Lymphography: This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia.
Drug Review Dataset (Drugs.com): The dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating reflecting overall patient satisfaction.
Optimal Discriminant Plane for a Small Number of Samples and Design Method of Classifier on the Plane, Pattern Recognition, Vol.
In addition, there are 198 patients used for the test set.
Molecular Biology (Protein Secondary Structure): From CMU connectionist bench repository; classifies secondary structure of certain globular proteins.
Bone marrow transplant: children: The data set describes pediatric patients with several hematologic diseases who were subject to unmanipulated allogeneic unrelated donor hematopoietic stem cell transplantation.
>>> from sklearn.datasets import load_breast_cancer
>>> data = load_breast_cancer()
>>> data.
I am working on a project to classify lung CT images (cancer/non-cancer) using a CNN model; for that I need a free dataset with an annotation file.
Mushroom: From Audubon Society Field Guide; mushrooms described in terms of physical characteristics; classification: poisonous or edible.
For datasets having a large N value and a substantially big M value, such as the Splice dataset, FocusM takes many hours to terminate.
Nasarian CAD Dataset: This dataset comprises records of 150 subjects (all male employees in Iran who visited the Abadan Occupational (Industrial) Medicine Clinic) and 52 features.
Aim: assess whether voice rehabilitation treatment leads to phonations considered 'acceptable' or 'unacceptable' (binary class classification problem).
Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits.
The ability to convert SVMs and other "black-box" classifiers into a set of human-understandable rules is critical not only for physician …
Using Rules to Analyse Bio-medical Data: A Comparison between C4.5 and PCL.
Rule extraction from Linear Support Vector Machines.
Amphibians: The dataset is a multilabel classification problem.
Codon usage: DNA codon usage frequencies of a large sample of diverse biological organisms from different taxa.
Molecular Biology (Promoter Gene Sequences), Molecular Biology (Protein Secondary Structure), Molecular Biology (Splice-junction Gene Sequences), KEGG Metabolic Relation Network (Directed), KEGG Metabolic Reaction Network (Undirected), One-hundred plant species leaves data set, Reuters RCV1 RCV2 Multilingual, Multiview Text Categorization Test collection, Tamilnadu Electricity Board Hourly Readings, Diabetes 130-US hospitals for years 1999-2008, Parkinson Speech Dataset with Multiple Types of Sound Recordings, Smartphone-Based Recognition of Human Activities and Postural Transitions, Quality Assessment of Digital Colposcopies, Early biomarkers of Parkinson's disease based on natural connected speech, Autistic Spectrum Disorder Screening Data for Children, Autistic Spectrum Disorder Screening Data for Adolescent, Activity recognition with healthy older people using a batteryless wearable sensor, Simulated Falls and Daily Living Activities Data Set, EEG Steady-State Visual Evoked Potential Signals, Parkinson Dataset with replicated acoustic features, Hepatitis C Virus (HCV) for Egyptian patients, Shoulder Implant X-Ray Manufacturer Classification, Estimation of obesity levels based on eating habits and physical condition, Activity recognition using wearable physiological measurements.
Soybean (Small): Michalski's famous soybean disease database.
… been collected retrospectively during the years 2007-2011.
Refractive errors: Effect of lifestyle and genetics on eye refractive errors.
Autistic Spectrum Disorder Screening Data for Children: Children screening data for autism suitable for classification and predictive tasks. (Restricted access)
IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
EEG Eye State: The data set consists of 14 EEG values and a value indicating the eye state.
Statlog (Heart): This dataset is a heart disease database similar to a database already present in the repository (Heart Disease databases) but in a slightly different form.
It focuses on characteristics of the cancer, including information not available in the Participant dataset.
Breast Tissue: Dataset with electrical impedance measurements of freshly excised tissue samples from the breast.
SPECTF Heart: Data on cardiac Single Photon Emission Computed Tomography (SPECT) images.
1998. … having a large N and a small M value, such as the Lung Cancer, Promoters, Soybean, and Splice datasets, ABB takes a very long time (a number of hours) to terminate.
Actually, several reasons.
Below are papers that cite this data set, with context shown.
Bounding boxes are defined as follows: x-min y-min width height.
Hepatitis: From G. Gong: CMU; mostly Boolean or numeric-valued attribute types; includes cost data (donated by Peter Turney).
The copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset is downloaded from: https://goo.gl/U2Uwz2.
Soybean (Large): Michalski's famous soybean disease database.
Thoracic Surgery Data: The data is dedicated to a classification problem related to post-operative life expectancy in lung cancer patients: class 1 - death within one year after surgery, class 2 - survival.
Divorce Predictors data set: Participants completed the "Personal Information Form" and "Divorce Predictors Scale".
Audiology (Original): Nominal audiology dataset from Baylor.
This is one of 5 datasets of the NIPS 2003 feature selection challenge.
Ecoli: This data contains protein localization sites.
Breast Cancer: Breast Cancer Data (Restricted Access).
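The bounding-box convention quoted above (x-min y-min width height) is easy to misread as two corner points. The following is a minimal Python sketch of the conversion, assuming the four values arrive as one whitespace-separated string. The `Box` type and the helper names `parse_box` and `to_corners` are hypothetical and introduced only for illustration; they are not part of any dataset tooling.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """A box in the 'x-min y-min width height' convention."""
    x_min: float
    y_min: float
    width: float
    height: float


def parse_box(text: str) -> Box:
    """Parse a whitespace-separated 'x-min y-min width height' string."""
    parts = text.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 values, got {len(parts)}")
    x_min, y_min, width, height = (float(p) for p in parts)
    return Box(x_min, y_min, width, height)


def to_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert to (x_min, y_min, x_max, y_max) corner coordinates."""
    return (box.x_min, box.y_min,
            box.x_min + box.width, box.y_min + box.height)


if __name__ == "__main__":
    # Illustrative values only, not taken from any real annotation file.
    print(to_corners(parse_box("10 20 30 40")))
```

Keeping the width/height form as the stored type and converting on demand avoids mixing the two conventions in one code path.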
International Collaboration on Cancer Reporting (ICCR) Datasets have been developed to provide a consistent, evidence-based approach for the reporting of cancer.
Exasens: This repository introduces a novel dataset for the classification of 4 groups of respiratory diseases: Chronic Obstructive Pulmonary Disease (COPD), asthma, infected, and Healthy Controls (HC).
sid0407 / LungCT_Diagnosis: This repository processes CT scan images of human lungs available in DICOM image format.
HCC Survival: Hepatocellular Carcinoma dataset (HCC dataset) was collected at a University Hospital in Portugal.
I have used different algorithms …
Epileptic Seizure Recognition: This dataset is a pre-processed and re-structured/reshaped version of a very commonly used dataset featuring epileptic seizure detection.
Street, W.H.
Activity recognition with healthy older people using a batteryless wearable sensor: Sequential motion data from 14 healthy older people aged 66 to 86 years old using a batteryless, wearable sensor on top of their clothing for the recognition of activities in clinical environments.
Early biomarkers of Parkinson's disease based on natural connected speech: Predict a pattern of neurodegeneration in the dataset of speech features obtained from patients with early untreated Parkinson's disease and patients at high risk of developing Parkinson's disease.
Lung Cancer: Lung cancer data; no attribute definitions.
Mammographic Mass: Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient's age.
Testing data set from a stratified random sample of images.
There is also a binary target column, Target, indicating pneumonia or non-pneumonia.
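Because there may be multiple rows per patientId and every row carries the binary Target column, row-level labels usually have to be collapsed into one record per patient before training. The sketch below shows one way to do that with the standard library. It assumes a CSV layout with the columns `patientId, x, y, width, height, Target`; only `patientId` and `Target` are named in the text, so the other column names, the function name `group_rows`, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the IDs and coordinates are placeholders, not real data.
SAMPLE_CSV = """patientId,x,y,width,height,Target
p-001,,,,,0
p-002,100,150,200,300,1
p-002,400,150,250,350,1
"""


def group_rows(csv_text):
    """Collapse per-row annotations into one entry per patientId.

    Each entry holds the patient-level target (1 if any row is positive)
    and the list of boxes as (x, y, width, height) tuples.
    """
    patients = defaultdict(lambda: {"target": 0, "boxes": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = patients[row["patientId"]]
        entry["target"] = max(entry["target"], int(row["Target"]))
        if row["Target"] == "1":
            # Negative rows have empty box fields, so only parse positives.
            box = tuple(float(row[k]) for k in ("x", "y", "width", "height"))
            entry["boxes"].append(box)
    return dict(patients)


if __name__ == "__main__":
    for patient_id, entry in group_rows(SAMPLE_CSV).items():
        print(patient_id, entry["target"], len(entry["boxes"]))
```

Grouping first also makes it straightforward to split train and test sets by patient rather than by row, which avoids the same patient appearing on both sides.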
Activity recognition using wearable physiological measurements: This dataset contains features from Electrocardiogram (ECG), Thoracic Electrical Bioimpedance (TEB) and the Electrodermal Activity (EDA) for activity recognition.
Arrhythmia: Distinguish between the presence and absence of cardiac arrhythmia and classify it in one of the 16 groups.
Nuclear feature extraction for breast tumor diagnosis.
Discretization should be applied based on expert recommendations; an attached file shows how.
Thoracic Surgery for Lung Cancer Data Set.
The Lung Cancer dataset (~2,100, one record per lung cancer) contains information about each lung cancer diagnosed during the trial, including multiple primary tumors in the same individual.
Demospongiae: Marine sponges of the Demospongiae class classification domain.
About 11,000 new cases of invasive cervical cancer are diagnosed each year in the U.S.
Extension of Z-Alizadeh Sani dataset: It was collected for CAD diagnosis.
The data I am going to use to explore feature selection methods is the Breast Cancer Wisconsin (Diagnostic) Dataset: W.N.
We are working to ensure that future patients have better diagnostic and treatment options, as well as access to therapies that improve quality of life and offer relief from pain and discomfort.
Breast Cancer Coimbra: Clinical features were observed or measured for 64 patients with breast cancer and 52 healthy controls.
This file contains a list of risk factors for cervical cancer leading to a biopsy examination.
Parkinson Dataset with replicated acoustic features: Contains acoustic features extracted from 3 voice recording replications of the sustained /a/ phonation for each one of the 80 subjects (40 of them with Parkinson's Disease).
Abstract: Lung cancer data; no attribute definitions. Attribute Characteristics: Integer.
The cancer center's success with investigative drugs to block cancer-related genetic mutations is well known.
Cardiotocography: The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians.
Let's say you are interested in the samples 10, 50, and 85, and want to know their class name.
3.1 Dataset description
UCI provided the main dataset for the competition, which contained scan images and data from 1397 patients for the training set, labeled with cancer or no cancer.
PRICAI.
Shoulder Implant X-Ray Manufacturer Classification: 597 de-identified raw X-ray scans of implanted shoulder prostheses from four manufacturers.
Hepatitis C Virus (HCV) for Egyptian patients: Egyptian patients who underwent treatment dosages for HCV for about 18 months.
DICOM Images
HIV-1 protease cleavage: The data contains lists of octamers (8 amino acids) and a flag (-1 or 1) depending on whether HIV-1 protease will cleave in the central position (between amino acids 4 and 5).
21 datasets were created from 12 bioassays.
Mice Protein Expression: Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning.
Hybrid Search of Feature Subsets.
… which will perform the presumptive diagnosis of two diseases of the urinary system.
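The HIV-1 protease cleavage description above (an octamer plus a -1 or 1 flag, with the candidate cleavage site between amino acids 4 and 5) can be made concrete with a short parsing sketch. It assumes each record is a single line of the form `OCTAMER,flag`; that comma-separated layout, the helper names `parse_record` and `split_at_cleavage_site`, and the placeholder octamer are assumptions for illustration, not a description of the actual files.

```python
def parse_record(line):
    """Parse one 'OCTAMER,flag' line into (octamer, label).

    The label is 1 if the protease cleaves at the central position
    and -1 otherwise, as described in the dataset summary.
    """
    octamer, flag = line.strip().split(",")
    if len(octamer) != 8:
        raise ValueError(f"expected 8 amino acids, got {len(octamer)}")
    label = int(flag)
    if label not in (-1, 1):
        raise ValueError(f"flag must be -1 or 1, got {label}")
    return octamer, label


def split_at_cleavage_site(octamer):
    """Split an octamer at the central position (between residues 4 and 5)."""
    return octamer[:4], octamer[4:]


if __name__ == "__main__":
    # Placeholder sequence built from valid one-letter codes; not real data.
    octamer, label = parse_record("ACDEFGHI,1")
    left, right = split_at_cleavage_site(octamer)
    print(left, "|", right, "cleaved" if label == 1 else "not cleaved")
```

Validating the length and the flag at parse time surfaces malformed rows early instead of letting them reach a classifier.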
Diabetic Retinopathy Debrecen Data Set: This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy or not.
Thyroid Disease: 10 separate databases from Garavan Institute.
Algerian Forest Fires Dataset: The dataset includes 244 instances that regroup data from two regions of Algeria.
Iris: Famous database; from Fisher, 1936.
Area: Life.
Wilt: High-resolution Remote Sensing data set (Quickbird). Small number of training samples of diseased trees, large number for other land cover.
This is a two-class classification problem with continuous input variables.
At UCI Health, our lung cancer specialists are actively involved in multiple research projects.
Breast Cancer Wisconsin (Diagnostic): Diagnostic Wisconsin Breast Cancer Database.
Expression data from human lung cancer cell line NCI-H292 (Submitter supplied): CT45 family is abnormally overexpressed in various types of cancer.
Anuran Calls (MFCCs): Acoustic features extracted from syllables of anuran (frogs) calls, including the family, the genus, and the species labels (multilabel).
Dorothea: DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive.
Dermatology: The aim for this dataset is to determine the type of Erythemato-Squamous Disease.
QSAR Bioconcentration classes dataset: Dataset of manually-curated Bioconcentration factor (BCF, fish) and mechanistic classes for QSAR modeling.
We use Table 4 to summarize the background information of 6 data sets for the subtype classification of the childhood leukemia disease.
Simulated Falls and Daily Living Activities Data Set: 20 falls and 16 daily living activities were performed by 17 volunteers with 5 repetitions (3,060 instances) while wearing 6 sensors attached to their head, chest, waist, wrist, thigh and ankle.
UCI Health Chao Family Comprehensive Cancer Center now offers special lung cancer screening for those at high risk for developing the disease, including smokers and people exposed to asbestos and other cancer-causing substances.
Caesarian Section Classification Dataset: This dataset contains information about caesarian section results of 80 pregnant women with the most important characteristics of delivery problems in the medical field.