Biology Forums - Study Force

Biology-Related Homework Help Genetics and Developmental Biology Topic started by: mgvaldes on Jan 20, 2017



Title: Analysis over single chromosome data
Post by: mgvaldes on Jan 20, 2017
I'm a data scientist with no expertise in biology, and I would like to know whether it makes sense, from a biological point of view, to analyze SNP data from a single chromosome, rather than from all 22 chromosomes, to predict the risk of a certain disease.

Do I have to use data from all chromosomes? Why?

Thank you very much. And sorry if it is a very basic question, but I really would like to understand this.


Title: Re: Analysis over single chromosome data
Post by: bio_man on Jan 21, 2017
Hi,

If the disease in question is caused by a defective allele found on a single chromosome (chromosome 1, say), then you're doing the right thing.

Could you explain a little bit about your research?


Title: Re: Analysis over single chromosome data
Post by: mgvaldes on Jan 23, 2017
I have two datasets:
  • one from lung cancer, with very few patients (approx. 170) and data from each of the 22 chromosomes, stored in separate files.
  • the other from type 2 diabetes, with a lot more patients (approx. 4000) and data from each of the 22 chromosomes, also stored in separate files.
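
Since the data comes as one file per chromosome, using all chromosomes together mostly means joining the files on the patient ID. Below is a minimal sketch of that step. The file layout is an assumption, not something stated in this thread: a CSV whose first column is a patient ID and whose remaining columns are SNPs coded 0/1/2. The function names `read_genotypes` and `merge_chromosomes` are hypothetical helpers.

```python
import csv
import io


def read_genotypes(handle):
    """Read one per-chromosome table (assumed layout: patient ID, then SNP columns)."""
    reader = csv.reader(handle)
    header = next(reader)
    snps = header[1:]
    rows = {row[0]: [int(v) for v in row[1:]] for row in reader}
    return snps, rows


def merge_chromosomes(tables):
    """Join per-chromosome tables column-wise on the patient ID.

    tables maps a chromosome label to the (snps, rows) pair returned by
    read_genotypes. Only patients present in every table are kept.
    """
    common = None
    for _, rows in tables.values():
        ids = set(rows)
        common = ids if common is None else common & ids
    common = common or set()

    names = []
    merged = {pid: [] for pid in sorted(common)}
    for chrom, (snps, rows) in tables.items():
        # Prefix SNP names with the chromosome so columns stay traceable.
        names.extend(f"chr{chrom}:{snp}" for snp in snps)
        for pid in merged:
            merged[pid].extend(rows[pid])
    return names, merged


# Toy data standing in for two per-chromosome files.
chr1_text = "patient,rs1,rs2\np1,0,1\np2,2,0\n"
chr2_text = "patient,rs3\np2,1\np1,2\np3,0\n"

tables = {
    "1": read_genotypes(io.StringIO(chr1_text)),
    "2": read_genotypes(io.StringIO(chr2_text)),
}
names, merged = merge_chromosomes(tables)
```

Here patient `p3` is dropped because it appears in only one of the two toy tables; whether to drop or impute such patients is a design choice that depends on the real data.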

The main idea behind the project is to apply different machine learning techniques (combining feature selection and classification algorithms) to achieve two main objectives:
  • Create a predictive model for the disease
  • Identify relevant/important SNPs in the data, so that the genes related to these SNPs can be investigated in more depth (that further research is done by biologists)
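
To make the second objective concrete, here is a minimal sketch of one possible feature-selection step: scoring each SNP by a Pearson chi-square statistic on its genotype-by-disease-status table and ranking the SNPs by that score. This is only an illustration, not the method actually used in the project. It assumes genotypes coded 0/1/2 and labels coded 1 (case) / 0 (control); the names `chi_square_score` and `rank_snps` are hypothetical.

```python
from collections import Counter


def chi_square_score(genotypes, labels):
    """Pearson chi-square statistic for one SNP's genotypes versus case/control labels."""
    n = len(labels)
    observed = Counter(zip(genotypes, labels))
    genotype_totals = Counter(genotypes)
    label_totals = Counter(labels)
    stat = 0.0
    for g, g_count in genotype_totals.items():
        for y, y_count in label_totals.items():
            # Expected count if genotype and disease status were independent.
            expected = g_count * y_count / n
            stat += (observed[(g, y)] - expected) ** 2 / expected
    return stat


def rank_snps(matrix, labels, snp_names):
    """Score every SNP column and return (name, score) pairs, highest score first."""
    scores = []
    for j, name in enumerate(snp_names):
        column = [row[j] for row in matrix]
        scores.append((name, chi_square_score(column, labels)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: 6 patients (rows), 2 SNPs (columns).
labels = [1, 1, 1, 0, 0, 0]
matrix = [
    [0, 2],
    [1, 2],
    [2, 2],
    [0, 0],
    [1, 0],
    [2, 0],
]
ranked = rank_snps(matrix, labels, ["snp_b", "snp_a"])
```

In this toy data `snp_a` separates cases from controls perfectly and ranks first, while `snp_b` is spread evenly across both groups and scores zero. With real data the ranking should be computed on the training folds only, so that the later classification step is not evaluated on patients that already influenced the feature selection.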