1000 Genomes Project: Difference between revisions

From Wikipedia, the free encyclopedia
P.eldar (talk | contribs)
No edit summary
P.eldar (talk | contribs)
←Background: ابرابزار
The '''1000 Genomes Project''' is an international research project to produce a comprehensive and highly accurate catalogue of human [[تنوع ژنتیکی|genetic variation]]. Among its main goals are discovering relationships between [[فنوتیپ|phenotype]] and [[ژنوتیپ|genotype]], studying [[فرگشت|evolution]], and identifying the factors that contribute to human diseases with a genetic cause. The project was launched in 2008 as a collaboration between the [[مؤسسه ملی تحقیقات ژنوم انسان|National Human Genome Research Institute]] (United States), the [[مؤسسه سَنگِر|Sanger Institute]] (England) and the [[مؤسسه ژنومیک بجینگ|Beijing Genomics Institute]] (China). The researchers set out to sequence the [[ژنوم|genomes]] of at least 1,000 anonymous individuals from different ethnic groups, using new [[توالی‌یابی|sequencing]] methods that were faster and cheaper than older ones.


The project was carried out in several parts and across 26 different [[جامعه ژنتیکی|genetic populations]]. It comprised three initial pilot projects and a main project divided into three phases. The pilot projects were finished by June 2009, and the final phase was completed in 2015. The product of the final phase is an analysed dataset containing the genomes of 2,504 individuals.
[[Image:Genetic Variation.jpg|thumb|400px|Changes in the number and order of genes (A-D) create genetic diversity within and between populations.]]

<!-- Here comes the TOC -->
__TOC__
<!-- End of TOC -->

== Background ==
Since the completion of the [[پروژه ژنوم انسان|Human Genome Project]], advances in human [[ژنتیک جمعیت|population genetics]] and [[comparative genomics]] have made it possible to gain increasing insight into the nature of genetic diversity.<ref name=neilsen2012>{{Cite journal | last1 = Nielsen | first1 = R. | title = Genomics: In search of rare human variants | doi = 10.1038/4671050a | journal = Nature | volume = 467 | issue = 7319 | pages = 1050–1051 | year = 2010 | pmid = 20981085| pmc =}}</ref> However, we are just beginning to understand how processes like the random sampling of [[گامت|gametes]], [[جهش|structural variations]] (insertions/deletions ([[indel]]s), [[Gene copy number|copy number variations]] (CNVs), [[رتروترانسپوزون|retrotransposons]]), [[چندریختی تک-نوکلئوتید|single-nucleotide polymorphisms]] (SNPs), and [[انتخاب طبیعی|natural selection]] have shaped the level and pattern of variation within [[گونه (زیست‌شناسی)|species]] and also between species.<ref name="ref2">JC Long, Human Genetic Variation: The mechanisms and results of microevolution, American Anthropological Association (2004)</ref><ref name="ref3">{{Cite journal
| last1 = Anzai | first1 = T.
| last2 = Shiina | first2 = T.
| last3 = Kimura | first3 = N.
| last4 = Yanagiya | first4 = K.
| last5 = Kohara | first5 = S.
| last6 = Shigenari | first6 = A.
| last7 = Yamagata | first7 = T.
| last8 = Kulski | first8 = J. K.
| last9 = Naruse | first9 = T. K.
| last10 = Fujimori | first10 = Y.
| last11 = Fukuzumi | first11 = Y.
| last12 = Yamazaki | first12 = M.
| last13 = Tashiro | first13 = H.
| last14 = Iwamoto | first14 = C.
| last15 = Umehara | first15 = Y.
| last16 = Imanishi | first16 = T.
| last17 = Meyer | first17 = A.
| last18 = Ikeo | first18 = K.
| last19 = Gojobori | first19 = T.
| last20 = Bahram | first20 = S.
| last21 = Inoko | first21 = H.
| title = Comparative sequencing of human and chimpanzee MHC class I regions unveils insertions/deletions as the major path to genomic divergence
| doi = 10.1073/pnas.1230533100
| journal = Proceedings of the National Academy of Sciences
| volume = 100
| issue = 13
| pages = 7708–7713
| year = 2003
| pmid = 12799463
| pmc =164652
}}</ref><ref name="ref4">{{Cite journal | last1 = Redon | first1 = R. | last2 = Ishikawa | first2 = S. | last3 = Fitch | first3 = K. R. | last4 = Feuk | first4 = L. | last5 = Perry | first5 = G. H. | last6 = Andrews | first6 = T. D. | last7 = Fiegler | first7 = H. | last8 = Shapero | first8 = M. H. | last9 = Carson | first9 = A. R. | last10 = Chen | doi = 10.1038/nature05329 | first10 = W. | last11 = Cho | first11 = E. K. | last12 = Dallaire | first12 = S. | last13 = Freeman | first13 = J. L. | last14 = González | first14 = J. R. | last15 = Gratacòs | first15 = M. N. | last16 = Huang | first16 = J. | last17 = Kalaitzopoulos | first17 = D. | last18 = Komura | first18 = D. | last19 = MacDonald | first19 = J. R. | last20 = Marshall | first20 = C. R. | last21 = Mei | first21 = R. | last22 = Montgomery | first22 = L. | last23 = Nishimura | first23 = K. | last24 = Okamura | first24 = K. | last25 = Shen | first25 = F. | last26 = Somerville | first26 = M. J. | last27 = Tchinda | first27 = J. | last28 = Valsesia | first28 = A. | last29 = Woodwark | first29 = C. | last30 = Yang | first30 = F. | title = Global variation in copy number in the human genome | journal = Nature | volume = 444 | issue = 7118 | pages = 444–454 | year = 2006 | pmid = 17122850| pmc =2669898}}</ref><ref name="ref5">{{Cite journal
| last1 = Barreiro | first1 = L. B.
| last2 = Laval | first2 = G.
| last3 = Quach | first3 = H. L. N.
| last4 = Patin | first4 = E.
| last5 = Quintana-Murci | first5 = L. S.
| title = Natural selection has driven population differentiation in modern humans
| doi = 10.1038/ng.78
| journal = Nature Genetics
| volume = 40
| issue = 3
| pages = 340–345
| year = 2008
| pmid = 18246066
| pmc =
}}</ref>

=== Human genetic variation ===
The random sampling of gametes during sexual reproduction leads to [[رانش ژن|genetic drift]], a random fluctuation in the population frequency of a trait, in subsequent generations, and would result in the loss of all variation in the absence of external influence. It is postulated that the rate of genetic drift is inversely proportional to population size, and that it may be accelerated in specific situations such as [[گلوگاه جمعیت|population bottlenecks]], where the population size is reduced for a certain period of time, and by the [[اثر بنیان‌گذار|founder effect]] (individuals in a population tracing back to a small number of founding individuals).<ref name="ref2" />
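
The inverse relationship between drift and population size can be illustrated with a standard textbook idealization that is not taken from the cited source: in a randomly mating population of constant size ''N'' diploid individuals, expected heterozygosity shrinks by a factor of (1 − 1/(2''N'')) each generation. The function name and the example population sizes below are illustrative assumptions.

```python
def expected_heterozygosity(h0: float, n_diploid: int, generations: int) -> float:
    """Expected heterozygosity after drift alone (no mutation, selection or migration).

    Assumes an idealized, constant-size, randomly mating population of
    n_diploid individuals; this is a textbook model, not the cited study's method.
    """
    if n_diploid <= 0 or generations < 0:
        raise ValueError("population size must be positive and generations non-negative")
    decay_per_generation = 1.0 - 1.0 / (2.0 * n_diploid)
    return h0 * decay_per_generation ** generations


if __name__ == "__main__":
    # Smaller populations lose variation faster, as in a bottleneck.
    for n in (50, 500, 5000):
        print(n, round(expected_heterozygosity(0.5, n, 100), 4))
```

Under these assumptions a population of 50 loses variation far more quickly than one of 5,000, which is the sense in which a bottleneck or a small founding group accelerates drift.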

Anzai et al. demonstrated that indels account for 90.4% of all observed variations in the sequence of the [[مجموعه سازگاری بافتی اصلی|major histocompatibility complex]] (MHC) between [[انسان|humans]] and [[شامپانزه|chimpanzees]]. After taking multiple indels into consideration, the high degree of genomic similarity between the two species (98.6% [[توالی اسید نوکلئیک|nucleotide sequence]] identity) drops to only 86.7%. For example, a large deletion of 95 kilobases (kb) between the [[جایگاه کروموزومی|loci]] of the human ''[[MHC class I polypeptide-related sequence A|MICA]]'' and ''[[MICB]]'' [[ژن|genes]] results in a single hybrid chimpanzee ''MIC'' gene, linking this region to a species-specific handling of several [[ویروس پسگرد|retroviral]] infections and the resultant susceptibility to various [[بیماری خودایمنی|autoimmune diseases]]. The authors conclude that indels, rather than the more subtle SNPs, were the driving mechanism in primate speciation.<ref name="ref3" />
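
Why the identity figure drops once indels are counted can be seen in a minimal sketch. The two aligned toy sequences and the function name are hypothetical; this is not the procedure used by Anzai et al., only an illustration of how gap columns change the denominator.

```python
def percent_identity(seq_a: str, seq_b: str, count_gaps: bool) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    With count_gaps=False, alignment columns containing a gap ("-") are skipped,
    so only substitutions lower the score. With count_gaps=True, every gap
    column counts as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        is_gap = a == "-" or b == "-"
        if is_gap and not count_gaps:
            continue
        compared += 1
        if not is_gap and a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    toy_a = "ACGTACGTAC"  # hypothetical aligned fragment
    toy_b = "ACGT----AC"  # the same fragment with a 4-base deletion
    print(percent_identity(toy_a, toy_b, count_gaps=False))
    print(percent_identity(toy_a, toy_b, count_gaps=True))
```

In this toy alignment the sequences are identical wherever both have a base, yet the score falls sharply once the deleted columns are included.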

Besides [[جهش|mutations]], SNPs and other [[structural variation|structural variants]] such as [[copy number variation|copy-number variants]] (CNVs) contribute to the genetic diversity in human populations. Using [[microarrays]], almost 1,500 copy-number-variable regions, covering around 12% of the genome and containing hundreds of genes, disease loci, functional elements and [[segmental duplication]]s, have been identified in the [[HapMap]] sample collection. Although the specific function of CNVs remains elusive, the fact that CNVs span more nucleotide content per genome than SNPs emphasizes the importance of CNVs in genetic diversity and evolution.<ref name="ref4" />

Investigating human genomic variations holds great potential for identifying genes that might underlie differences in disease resistance (e.g. the [[مجموعه سازگاری بافتی اصلی|MHC region]]) or [[drug metabolism]].<ref name="ref6">{{Cite journal
| last1 = Nielsen | first1 = R.
| last2 = Hellmann | first2 = I.
| last3 = Hubisz | first3 = M.
| last4 = Bustamante | first4 = C.
| last5 = Clark | first5 = A. G.
| title = Recent and ongoing selection in the human genome
| doi = 10.1038/nrg2187
| journal = Nature Reviews Genetics
| volume = 8
| issue = 11
| pages = 857–868
| year = 2007
| pmid = 17943193
| pmc =2933187
}}</ref>

=== Natural selection ===
[[انتخاب طبیعی|Natural selection]] in the [[فرگشت|evolution]] of a trait can be divided into three classes. Directional or [[انتخاب هدایتی|positive selection]] refers to a situation where a certain allele has a greater fitness than other [[الل|alleles]], consequently increasing its population frequency (e.g. [[مقاومت آنتی‌بیوتیکی|antibiotic resistance]] of bacteria). In contrast, stabilizing or [[negative selection (natural selection)|negative selection]] (also known as purifying selection) lowers the frequency of alleles, or even removes them from a population, because they are disadvantageous relative to other alleles. Finally, a number of forms of [[balancing selection]] exist; these increase genetic variation within a species either through overdominance ([[مانندگی|heterozygous]] individuals are fitter than [[مانندگی|homozygous]] individuals, e.g. ''[[فاویسم|G6PD]]'', a gene that is involved in both hemolytic anaemia and [[مالاریا|malaria]] resistance) or by varying spatially within a species that inhabits different niches, thus favouring different alleles.<ref name="ref7">EE Harris et al. , The molecular signature of selection underlying human adaptations, Yearbook of Physical Anthropology 49: 89-130 (2006)</ref> Some genomic differences may not affect fitness. Neutral variation, previously thought to be “junk” DNA, is unaffected by natural selection, resulting in higher genetic variation at such sites than at sites where variation does influence fitness.<ref name="ref8">{{Cite journal
| last1 = Bamshad | first1 = M.
| last2 = Wooding | first2 = S. P.
| doi = 10.1038/nrg999
| title = Signatures of natural selection in the human genome
| journal = Nature Reviews Genetics
| volume = 4
| issue = 2
| pages = 99–111
| year = 2003
| pmid = 12560807
| pmc =
}}</ref>
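
The overdominant case described above can be sketched with the standard one-locus, two-allele selection recursion. This is a textbook model, not taken from the cited reviews, and the selection coefficients used here are purely illustrative assumptions. With genotype fitnesses 1 − ''s'' (AA), 1 (Aa) and 1 − ''t'' (aa), the frequency of allele A is expected to settle at ''t''/(''s'' + ''t'') rather than going to fixation or loss.

```python
def next_allele_frequency(p: float, s: float, t: float) -> float:
    """One generation of viability selection with heterozygote advantage.

    Fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (assumed values, random mating).
    Returns the frequency of allele A in the next generation.
    """
    q = 1.0 - p
    w_aa_hom, w_het, w_bb_hom = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_aa_hom + 2.0 * p * q * w_het + q * q * w_bb_hom
    return (p * p * w_aa_hom + p * q * w_het) / mean_fitness


def equilibrium_by_iteration(p0: float, s: float, t: float, generations: int) -> float:
    """Iterate the recursion from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_allele_frequency(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.1, 0.3  # hypothetical selection coefficients
    print(equilibrium_by_iteration(0.01, s, t, 1000))
    print(t / (s + t))  # analytical equilibrium for comparison
```

Both alleles persist at intermediate frequency in this model, which is how balancing selection maintains variation that directional or purifying selection would remove.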

It is not fully clear how natural selection has shaped population differences; however, candidate genetic regions under selection have been identified recently.<ref name="ref5" /> Patterns of [[چندریختی (زیست‌شناسی)|DNA polymorphism]] can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism.<ref name="ref7" /><ref name="ref8" /> Barreiro et al. found evidence that negative selection has reduced population differentiation at the [[اسید آمینه|amino acid]]–altering level (particularly in disease-related genes), whereas positive selection has ensured regional adaptation of human populations by increasing population differentiation in gene regions (mainly [[جهش بدمعنی|nonsynonymous]] and [[Five prime untranslated region|5'-untranslated region]] variants).<ref name="ref5" />

It is thought that most [[اختلال ژنتیکی|complex]] and [[قوانین مندل|Mendelian]] diseases (except diseases with late onset, assuming that older individuals no longer contribute to the fitness of their offspring) affect survival and/or reproduction; the genetic factors underlying those diseases should therefore be influenced by natural selection. However, diseases that have late onset today could have been childhood diseases in the past, as genes delaying disease progression could have undergone selection. [[بیماری گوشه|Gaucher disease]] (mutations in the ''[[Glucocerebrosidase|GBA]]'' gene), [[بیماری کرون|Crohn's disease]] (mutation of ''[[NOD2]]'') and [[کاردیومیوپاتی|cardiomyopathy]] (mutations in ''[[MYH7]]'', ''[[TNNT2]]'', ''[[TPM1]]'' and ''[[MYBPC3]]'') are all examples of negative selection. These disease mutations are primarily recessive and segregate, as expected, at a low frequency, supporting the hypothesized negative selection. There is evidence that the genetic basis of [[دیابت نوع یک|type 1 diabetes]] may have undergone positive selection.<ref>{{Cite journal | last1 = Corona | first1 = E. | last2 = Dudley | first2 = J. T. | last3 = Butte | first3 = A. J. | editor1-last = Hawks | editor1-first = John | title = Extreme Evolutionary Disparities Seen in Positive Selection across Seven Complex Diseases | doi = 10.1371/journal.pone.0012236 | journal = PLoS ONE | volume = 5 | issue = 8 | pages = e12236 | year = 2010 | pmid = 20808933| pmc =2923198}}</ref> Few cases have been reported in which disease-causing mutations appear at the high frequencies supported by balancing selection. The most prominent example is mutations at the ''G6PD'' locus, which in the homozygous state result in G6PD [[آنزیم|enzyme]] deficiency and consequently hemolytic anaemia, but in the heterozygous state are partially protective against [[مالاریا|malaria]]. Other possible explanations for segregation of disease alleles at moderate or high frequencies include genetic drift and recent alterations towards positive selection due to environmental changes such as diet, or [[Genetic hitchhiking|genetic hitch-hiking]].<ref name="ref6" />

[[ریزآرایه دی‌ان‌ای|Genome-wide comparative analyses]] of different human populations, as well as between species (e.g. human versus chimpanzee), are helping us to understand the relationship between diseases and selection, and provide evidence that mutations in constrained genes are disproportionately associated with [[اختلال ژنتیکی|heritable disease]] [[فنوتیپ|phenotypes]]. Genes implicated in complex disorders tend to be under less negative selection than Mendelian disease genes or non-disease genes.<ref name="ref6" />

== Project description ==
=== Goals ===
There are two kinds of genetic variants related to disease. The first are rare genetic variants that have a severe effect, predominantly on simple traits (e.g. [[فیبروز سیستیک|cystic fibrosis]], [[بیماری هانتینگتون|Huntington's disease]]). The second, more common, genetic variants have a mild effect and are thought to be implicated in complex traits (e.g. [[شناخت|cognition]], [[دیابت|diabetes]], [[بیماری قلبی-عروقی|cardiovascular disease]]). Between these two types of genetic variants lies a significant gap in knowledge, which the 1000 Genomes Project is designed to address.<ref name="ref1" />

The primary goal of this project is to create a complete and detailed catalogue of [[human genetic variation]]s, which in turn can be used for [[association studies]] relating genetic variation to disease. To that end, the consortium aims to discover more than 95% of the variants (e.g. SNPs, CNVs, indels) with [[minor allele frequency|minor allele frequencies]] as low as 1% across the genome and 0.1–0.5% in gene regions, as well as to estimate the population frequencies, [[هاپلوتیپ|haplotype]] backgrounds and [[linkage disequilibrium]] patterns of variant alleles.<ref name="ref9">Meeting Report: A Workshop to Plan a Deep Catalog of Human Genetic Variation, (2007) http://www.1000genomes.org/sites/1000genomes.org/files/docs/1000Genomes-MeetingReport.pdf</ref>
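
The sample size matters for these frequency thresholds because a variant can only be discovered if at least one copy of it is present among the sampled chromosomes. The sketch below is a back-of-the-envelope calculation under stated assumptions (independently sampled chromosomes, two per individual), not the consortium's power analysis. Presence in the sample is necessary but not sufficient for discovery, which also depends on sequencing depth and variant calling.

```python
def prob_variant_in_sample(allele_frequency: float, n_individuals: int) -> float:
    """Probability that at least one copy of an allele is present in the sample.

    Assumes 2 * n_individuals independently drawn chromosomes; this ignores
    relatedness, population structure and sequencing error.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


if __name__ == "__main__":
    for freq in (0.01, 0.005, 0.001):
        print(freq, prob_variant_in_sample(freq, 1000))
```

With 1,000 individuals, an allele at 1% frequency is almost certain to be present in the sample, while an allele at 0.1% is missed entirely in a noticeable fraction of samples, which is consistent with the lower thresholds being targeted only in gene regions.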

Secondary goals include supporting better SNP and probe selection for [[تعیین ساختار ژنتیکی|genotyping]] platforms in future studies and improving the [[پروژه ژنوم انسان|human reference sequence]]. Furthermore, the completed database will be a useful tool for studying regions under selection and variation in multiple populations, and for understanding the underlying processes of mutation and [[نوترکیبی ژنی|recombination]].<ref name="ref9" />

=== Outline ===
The [[human genome]] consists of approximately 3 billion DNA base pairs and is estimated to carry around 20,000 [[پروتئین|protein]]-coding [[ژن|genes]]. In designing the study, the consortium needed to address several critical issues regarding the project metrics, such as technology challenges, data quality standards and sequence coverage.<ref name="ref9" />

Over the course of the next three years,{{نیازمند شفاف‌سازی|date=April 2015}} scientists at the [[Sanger Institute]], [[Beijing Genomics Institute|BGI Shenzhen]] and the [[مؤسسه ملی تحقیقات ژنوم انسان|National Human Genome Research Institute]]’s Large-Scale Sequencing Network planned to sequence a minimum of 1,000 human genomes. Because of the large amount of sequence data that needed to be generated and analyzed, the plan allowed for other participants to be recruited over time.<ref name="ref1" />

The plan called for almost 10 billion bases to be sequenced per day over the two-year production phase. This equates to more than two human genomes every 24 hours, a groundbreaking capacity at the time. Challenging the leading experts in [[بیوانفورماتیک|bioinformatics]] and statistical genetics, the sequence dataset was expected to comprise 6 trillion DNA bases, 60-fold more sequence data than had been published in [[دی‌ان‌ای|DNA]] databases over the preceding 25 years.<ref name="ref1" />
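
These throughput figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this section (10 billion bases per day, a genome of about 3 billion base pairs, 6 trillion bases in total); the variable and function names are illustrative. "Genome equivalents" here means raw bases divided by genome length, that is, single-fold coverage, not finished genomes.

```python
BASES_PER_DAY = 10_000_000_000        # planned daily output quoted above
GENOME_LENGTH = 3_000_000_000         # approximate human genome size quoted above
TOTAL_BASES = 6_000_000_000_000       # planned size of the full dataset


def genome_equivalents_per_day(bases_per_day: int, genome_length: int) -> float:
    """Raw daily output expressed as 1x genome equivalents."""
    return bases_per_day / genome_length


def days_to_reach(total_bases: int, bases_per_day: int) -> float:
    """Days of sequencing needed to accumulate total_bases at a constant rate."""
    return total_bases / bases_per_day


if __name__ == "__main__":
    print(genome_equivalents_per_day(BASES_PER_DAY, GENOME_LENGTH))
    days = days_to_reach(TOTAL_BASES, BASES_PER_DAY)
    print(days, days / 365.0)
```

At the quoted rate the 6 trillion bases take 600 days, a little under two years, so the daily output and the length of the production phase are consistent with each other.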

To determine the final design of the full project, three pilot studies were designed, to be carried out within the first year of the project. The first pilot was intended to genotype 180 people from 3 [[جمعیت جهان|major geographic groups]] at low coverage (2x). For the second pilot study, the genomes of two nuclear families (both parents and an adult child) were to be sequenced with deep coverage (20x per genome). The third pilot study involved sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x).<ref name="ref1" /><ref name="ref9" />
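
The difference between 2x and 20x coverage can be made concrete with the Lander–Waterman approximation, a standard idealization that is not part of the cited project documents: if reads land uniformly at random, the expected fraction of the genome covered by at least one read at mean depth ''c'' is 1 − e<sup>−''c''</sup>. The function names and the 3-billion-base genome size (taken from the figure quoted earlier in this section) are the only inputs.

```python
import math


def fraction_covered(mean_depth: float) -> float:
    """Expected fraction of bases covered at least once (Lander-Waterman idealization)."""
    if mean_depth < 0:
        raise ValueError("mean depth must be non-negative")
    return 1.0 - math.exp(-mean_depth)


def raw_bases_required(n_people: int, mean_depth: int,
                       genome_length: int = 3_000_000_000) -> int:
    """Raw sequence needed to give n_people genomes the stated mean depth."""
    return n_people * mean_depth * genome_length


if __name__ == "__main__":
    print(fraction_covered(2))    # low-coverage pilot
    print(fraction_covered(20))   # deep-coverage pilots
    print(raw_bases_required(180, 2))
```

Under this idealization a single genome at 2x leaves a noticeable share of positions unread, which is why the low-coverage design relies on combining data across many individuals, whereas 20x covers essentially every position in each genome.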

It has been estimated that the project would likely cost more than $500 million if standard DNA sequencing technologies were used. Therefore, several new technologies (e.g. [[ایلومینا|Illumina]], [[454 Life Sciences|454]], [[ABI Solid Sequencing|SOLiD]]) were to be applied, lowering the expected costs to between $30 million and $50 million. The major support was to be provided by the [[Wellcome Trust Sanger Institute]] in Hinxton, England; the [[Beijing Genomics Institute]], Shenzhen (BGI Shenzhen), China; and the [[مؤسسه ملی تحقیقات ژنوم انسان|National Human Genome Research Institute]], part of the National Institutes of Health (NIH).<ref name="ref1" />

In keeping with the [http://www.genome.gov/pages/research/wellcomereport0303.pdf Fort Lauderdale principles], all genome sequence data (including variant calls) are made freely available as the project progresses and can be downloaded via FTP from the [http://www.1000genomes.org/data 1000 genomes project webpage].

=== Human genome samples ===
Based on the overall goals for the project, the samples were to be chosen to provide power in populations where [[association studies]] for common diseases are being carried out. The samples do not need to have medical or phenotype information, since the proposed catalogue is intended as a basic resource on human variation.<ref name="ref9" />

For the pilot studies, human genome samples from the [[HapMap]] collection were to be sequenced. The plan favoured samples that have additional data available (such as [[ENCODE]] sequence, genome-wide genotypes, [[fosmid]]-end sequence, structural variation assays, and [[بیان ژن|gene expression]] data), so that the results can be compared with those from other projects.<ref name="ref9" />

Complying with extensive ethical procedures, the 1000 Genomes Project was then to use samples from volunteer donors. The study plan listed the following populations: [[یوروبا|Yoruba]] in [[ایبادان|Ibadan]], [[نیجریه|Nigeria]] (YRI); [[مردم ژاپنی|Japanese]] in [[توکیو|Tokyo]] (JPT); [[Chinese people|Chinese]] in [[پکن|Beijing]] (CHB); [[یوتا|Utah]] residents with ancestry from northern and western [[اروپا|Europe]] (CEU); [[Luhya people|Luhya]] in [[وبویه، کنیا|Webuye]], [[کنیا|Kenya]] (LWK); [[ماسای|Maasai]] in Kinyawa, Kenya (MKK); Toscani in [[ایتالیا|Italy]] (TSI); Peruvians in [[لیما|Lima]], [[پرو|Peru]] (PEL); Gujarati Indians in [[هیوستون|Houston]] (GIH); Chinese in metropolitan [[دنور، کلرادو|Denver]] (CHD); people of [[Mexican people|Mexican]] ancestry in [[لس آنجلس|Los Angeles]] (MXL); and people of [[:رده:اهالی آفریقا|African]] ancestry in the southwestern [[ایالات متحده آمریکا|United States]] (ASW).<ref name="ref1" />

=== Community meeting ===
Data generated by the 1000 Genomes Project are widely used by the genetics community, making the first 1000 Genomes Project paper one of the most cited papers in biology.<ref name="hotpapers">C. King (2012) The Hottest Research of 2011. ''Science Watch'' http://archive.sciencewatch.com/newsletter/2012/201203/hottest_research_2012/</ref> To support this user community, the project held a community analysis meeting in July 2012 that included talks highlighting key project discoveries, their impact on population genetics and human disease studies, and summaries of other large-scale sequencing studies.<ref name="community">1000 Genomes Project Community Analysis Meeting http://1000gconference.sph.umich.edu/</ref>

== Project findings ==
=== Pilot phase ===
The pilot phase consisted of three projects:
* low-coverage whole-genome sequencing of 179 individuals from 4 populations
* high-coverage sequencing of 2 trios (mother-father-child)
* exon-targeted sequencing of 697 individuals from 7 populations
It was found that, on average, each person carries around 250–300 loss-of-function variants in annotated genes and 50–100 variants previously implicated in inherited disorders. Based on the two trios, the rate of de novo germline mutation is estimated at approximately 10<sup>−8</sup> per base per generation.<ref name="Pilot phase" />
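
To put the per-base mutation rate in perspective, it can be scaled up to a whole genome. The sketch below is a rough estimate under stated assumptions: it combines the rate above with the genome size of about 3 billion base pairs quoted earlier in this article and assumes a diploid genome with a uniform rate. The resulting figure is an illustration, not a result reported by the project.

```python
def expected_de_novo_mutations(rate_per_base: float,
                               haploid_genome_length: float,
                               ploidy: int = 2) -> float:
    """Expected number of new germline mutations carried by one child.

    Assumes the same per-base rate at every site and in both parental
    contributions, which is a simplification.
    """
    return rate_per_base * haploid_genome_length * ploidy


if __name__ == "__main__":
    print(expected_de_novo_mutations(1e-8, 3e9))
```

Under these assumptions each child carries on the order of several dozen new mutations, so directly observing them requires the deep, high-accuracy coverage used for the trios.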

== See also ==
{{درگاه|Biology|Molecular and cellular biology}}
* [[پروژه ژنوم انسان|Human Genome Project]]
* [[HapMap Project]]
* [[Personal genomics]]
* [[Population groups in biomedicine]]
* [[1000 Plant Genomes Project]]
* [[List of biological databases]]

== References ==
{{پانویس|30em}}

== External links ==
* [http://www.1000genomes.org/ 1000 Genomes] - A Deep Catalog of Human Genetic Variation - official web page
* [http://www.hapmap.org/ International HapMap Project] - official web page
* [http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml Human Genome Project Information]

{{Wellcome Trust}}
{{Personal genomics}}

[[رده:پروژه‌های ژنوم انسانی]]
[[رده:چندریختی‌های تک-نوکلئوتید]]

Revision as of 10:42, 15 December 2016

As the project progressed and each phase was completed, the data were released to the public and the results were published in the journal Nature. In addition, many Nature articles were devoted to applications of the 1000 Genomes Project in genetics, medicine and pharmacy.

By providing an overview of human genetic variation that earlier experiments could not achieve, the project became a highly valuable and effective tool for many areas of the life sciences, including genetics, medicine, pharmacy and bioinformatics.

پیشینه

Changes in the number and order of genes (A-D) create genetic diversity within and between populations.

Background

Since the completion of the پروژه ژنوم انسان advances in human ژنتیک جمعیت and comparative genomics have made it possible to gain increasing insight into the nature of genetic diversity.[۱] However, we are just beginning to understand how processes like the random sampling of گامتs, جهش (insertions/deletions (indels), copy number variations (CNV), رتروترانسپوزون), چندریختی تک-نوکلئوتیدs (SNPs), and انتخاب طبیعی have shaped the level and pattern of variation within گونه (زیست‌شناسی) and also between species.[۲][۳][۴][۵]

Human genetic variation

The random sampling of gametes during sexual reproduction leads to رانش ژن — a random fluctuation in the population frequency of a trait — in subsequent generations and would result in the loss of all variation in the absence of external influence. It is postulated that the rate of genetic drift is inversely proportional to population size, and that it may be accelerated in specific situations such as گلوگاه جمعیت, where the population size is reduced for a certain period of time, and by the اثر بنیان‌گذار (individuals in a population tracing back to a small number of founding individuals).[۲]

Anzai et al. demonstrated that indels account for 90.4% of all observed variations in the sequence of the مجموعه سازگاری بافتی اصلی (MHC) between انسان and شامپانزه. After taking multiple indels into consideration, the high degree of genomic similarity between the two species (98.6% توالی اسید نوکلئیک identity) drops to only 86.7%. For example, a large deletion of 95 kilobases (kb) between the جایگاه کروموزومی of the human MICA and MICB ژن, results in a single hybrid chimpanzee MIC gene, linking this region to a species-specific handling of several ویروس پسگرد infections and the resultant susceptibility to various بیماری خودایمنی. The authors conclude that instead of more subtle SNPs, indels were the driving mechanism in primate speciation.[۳]

Besides جهش, SNPs and other structural variants such as copy-number variants (CNVs) are contributing to the genetic diversity in human populations. Using microarrays, almost 1,500 copy number variable regions, covering around 12% of the genome and containing hundreds of genes, disease loci, functional elements and segmental duplications, have been identified in the HapMap sample collection. Although the specific function of CNVs remains elusive, the fact that CNVs span more nucleotide content per genome than SNPs emphasizes the importance of CNVs in genetic diversity and evolution.[۴]

Investigating human genomic variations holds great potential for identifying genes that might underlie differences in disease resistance (e.g. مجموعه سازگاری بافتی اصلی) or drug metabolism.[۶]

Natural selection

انتخاب طبیعی in the فرگشت of a trait can be divided into three classes. Directional or انتخاب هدایتی refers to a situation where a certain allele has a greater fitness than other الل, consequently increasing its population frequency (e.g. مقاومت آنتی‌بیوتیکی of bacteria). In contrast, stabilizing or negative selection (also known as purifying selection) lowers the frequency or even removes alleles from a population due to disadvantages associated with it with respect to other alleles. Finally, a number of forms of balancing selection exist; those increase genetic variation within a species by being overdominant (مانندگی individuals are fitter than مانندگی individuals, e.g. فاویسم, a gene that is involved in both کم‌خونی داسی‌شکل and مالاریا resistance) or can vary spatially within a species that inhabits different niches, thus favouring different alleles.[۷] Some genomic differences may not affect fitness. Neutral variation, previously thought to be “junk” DNA, is unaffected by natural selection resulting in higher genetic variation at such sites when compared to sites where variation does influence fitness.[۸]

It is not fully clear how natural selection has shaped population differences; however, genetic candidate regions under selection have been identified recently.[۵] Patterns of چندریختی (زیست‌شناسی) can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism.[۷][۸] Barreiro et al. found evidence that negative selection has reduced population differentiation at the اسید آمینه–altering level (particularly in disease-related genes), whereas, positive selection has ensured regional adaptation of human populations by increasing population differentiation in gene regions (mainly جهش بدمعنی and 5'-untranslated region variants).[۵]

It is thought that most اختلال ژنتیکی and قوانین مندل (except diseases with late onset, assuming that older individuals no longer contribute to the fitness of their offspring) will have an effect on survival and/or reproduction, thus, genetic factors underlying those diseases should be influenced by natural selection. Although, diseases that have late onset today could have been childhood diseases in the past as genes delaying disease progression could have undergone selection. بیماری گوشه (mutations in the GBA gene), بیماری کرون (mutation of NOD2) and کاردیومیوپاتی (mutations in MYH7, TNNT2, TPM1 and MYBPC3) are all examples of negative selection. These disease mutations are primarily recessive and segregate as expected at a low frequency, supporting the hypothesized negative selection. There is evidence that the genetic-basis of دیابت نوع یک may have undergone positive selection.[۹] Few cases have been reported, where disease-causing mutations appear at the high frequencies supported by balanced selection. The most prominent example is mutations of the G6PD locus where, if homozygous G6PD آنزیم deficiency and consequently کم‌خونی داسی‌شکل results, but in the heterozygous state are partially protective against مالاریا. Other possible explanations for segregation of disease alleles at moderate or high frequencies include genetic drift and recent alterations towards positive selection due to environmental changes such as diet or genetic hitch-hiking.[۶]

ریزآرایه دی‌ان‌ای of different human populations, as well as between species (e.g. human versus chimpanzee) are helping us to understand the relationship between diseases and selection and provide evidence of mutations in constrained genes being disproportionally associated with اختلال ژنتیکی فنوتیپ. Genes implicated in complex disorders tend to be under less negative selection than Mendelian disease genes or non-disease genes.[۶]

Project description

Goals

There are two kinds of genetic variants related to disease. The first are rare genetic variants that have a severe effect predominantly on simple traits (e.g. فیبروز سیستیک, بیماری هانتینگتون). The second, more common, genetic variants have a mild effect and are thought to be implicated in complex traits (e.g. شناخت, دیابت, بیماری قلبی-عروقی). Between these two types of genetic variants lies a significant gap of knowledge, which the 1000 Genomes Project is designed to address.[۱۰]

The primary goal of this project is to create a complete and detailed catalogue of human genetic variation, which in turn can be used for association studies relating genetic variation to disease. To that end, the consortium aims to discover more than 95% of the variants (e.g. SNPs, CNVs, indels) with minor allele frequencies as low as 1% across the genome and 0.1-0.5% in gene regions, as well as to estimate the population frequencies, haplotype backgrounds and linkage disequilibrium patterns of variant alleles.[11]
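As a rough illustration of why a sample of about 1,000 genomes is enough to encounter variants at the 1% frequency level, the sketch below computes the probability that an allele of a given frequency is carried by at least one sampled chromosome. This is a simplified sampling model, not the consortium's own power calculation: it assumes diploid individuals, independent draws of chromosomes, and perfect detection (no sequencing error and no loss from low coverage). The function name `prob_variant_sampled` is hypothetical.

```python
def prob_variant_sampled(maf: float, n_individuals: int) -> float:
    """Probability that an allele with frequency `maf` appears at least once
    among the 2 * n_individuals chromosomes of a diploid sample.

    Assumption: chromosomes are independent draws from the population.
    """
    n_chromosomes = 2 * n_individuals
    # Complement of "the allele is absent from every sampled chromosome".
    return 1.0 - (1.0 - maf) ** n_chromosomes


if __name__ == "__main__":
    # Frequencies taken from the stated goals: 1% genome-wide, 0.1-0.5% in genes.
    for maf in (0.01, 0.005, 0.001):
        p = prob_variant_sampled(maf, 1000)
        print(f"MAF {maf:.1%}: P(present in a sample of 1000 people) = {p:.4f}")
```

Under these assumptions, a 1% allele is almost certain to be present in the sample, while a 0.1% allele is missed noticeably more often, which is consistent with the lower frequency target being restricted to gene regions sequenced more deeply.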

Secondary goals include supporting better SNP and probe selection for genotyping platforms in future studies and improving the human reference sequence produced by the Human Genome Project. Furthermore, the completed database will be a useful tool for studying regions under selection, variation in multiple populations, and the underlying processes of mutation and genetic recombination.[11]

Outline

The human genome consists of approximately 3 billion DNA base pairs and is estimated to carry around 20,000 protein-coding genes. In designing the study, the consortium needed to address several critical issues regarding the project metrics, such as technology challenges, data quality standards and sequence coverage.[11]

Over the course of the next three years,[clarification needed] scientists at the Sanger Institute, BGI Shenzhen and the National Human Genome Research Institute's Large-Scale Sequencing Network are planning to sequence a minimum of 1,000 human genomes. Because of the large amount of sequence data that needs to be generated and analyzed, other participants may be recruited over time.[10]

Almost 10 billion bases will be sequenced per day over the two-year production phase. This equates to more than two human genomes every 24 hours, a groundbreaking capacity. Challenging the leading experts in bioinformatics and statistical genetics, the sequence dataset will comprise 6 trillion DNA bases, 60-fold more sequence data than has been published in DNA databases over the past 25 years.[10]
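The throughput figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers quoted in the text (6 trillion bases, a two-year production phase, a genome of about 3 billion base pairs); it assumes a 365-day year and continuous sequencing, and the function names are hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000       # approximate human genome size quoted in the text
TOTAL_BASES = 6_000_000_000_000      # planned dataset size: 6 trillion bases
PRODUCTION_DAYS = 2 * 365            # two-year production phase (assumed 365-day years)


def required_bases_per_day(total_bases: int, days: int) -> float:
    """Average daily output needed to reach `total_bases` in `days` days."""
    return total_bases / days


def genome_equivalents_per_day(bases_per_day: float, genome_size_bp: int) -> float:
    """Daily output expressed as single-pass (1x) copies of the genome."""
    return bases_per_day / genome_size_bp


if __name__ == "__main__":
    per_day = required_bases_per_day(TOTAL_BASES, PRODUCTION_DAYS)
    print(f"Required output: {per_day:.2e} bases per day")
    print(f"Genome equivalents: {genome_equivalents_per_day(per_day, GENOME_SIZE_BP):.2f} per day")
```

The required average comes out a little over 8 billion bases per day, which agrees with "almost 10 billion" and with "more than two human genomes every 24 hours".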

To determine the final design of the full project, three pilot studies were designed and will be carried out within the first year of the project. The first pilot intends to genotype 180 people from 3 world populations at low coverage (2x). For the second pilot study, the genomes of two nuclear families (both parents and an adult child) are going to be sequenced with deep coverage (20x per genome). The third pilot study involves sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x).[10][11]
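Coverage translates directly into sequencing effort: the number of bases to be read is roughly the number of genomes times the coverage depth times the size of the target region. The following sketch applies that relation to the first two pilots, using the approximate 3 billion base pair genome size given above. The third pilot is left out because the text does not state the combined length of the targeted exons. The function name `bases_required` is hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size quoted in the Outline


def bases_required(n_genomes: int, coverage: int, target_bp: int = GENOME_SIZE_BP) -> int:
    """Approximate number of bases to sequence for `n_genomes` samples,
    each read to the given average coverage over a target of `target_bp`."""
    return n_genomes * coverage * target_bp


if __name__ == "__main__":
    pilot1 = bases_required(n_genomes=180, coverage=2)     # low-coverage pilot
    pilot2 = bases_required(n_genomes=2 * 3, coverage=20)  # two trios, three people each
    print(f"Pilot 1: {pilot1:.2e} bases")
    print(f"Pilot 2: {pilot2:.2e} bases")
```

On these figures the low-coverage pilot requires about three times as many bases as the deep-coverage trio pilot, even though each individual genome is read ten times less deeply.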

It has been estimated that the project would likely cost more than $500 million if standard DNA sequencing technologies were used. Therefore, several new technologies (e.g. Illumina, 454, SOLiD) will be applied, lowering the expected costs to between $30 million and $50 million. The major support will be provided by the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; and the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).[10]

In keeping with the Fort Lauderdale principles, all genome sequence data (including variant calls) are freely available as the project progresses and can be downloaded via FTP from the 1000 Genomes Project webpage.

Human genome samples

Based on the overall goals of the project, the samples will be chosen to provide power in populations where association studies for common diseases are being carried out. Furthermore, the samples do not need to have medical or phenotype information, since the proposed catalogue will be a basic resource on human variation.[11]

For the pilot studies, human genome samples from the HapMap collection will be sequenced. It will be useful to focus on samples that have additional data available (such as ENCODE sequence, genome-wide genotypes, fosmid-end sequence, structural variation assays, and gene expression) so that the results can be compared with those from other projects.[11]

Complying with extensive ethical procedures, the 1000 Genomes Project will then use samples from volunteer donors. The following populations will be included in the study: Yoruba in Ibadan, Nigeria (YRI); Japanese in Tokyo (JPT); Chinese in Beijing (CHB); Utah residents with ancestry from northern and western Europe (CEU); Luhya in Webuye, Kenya (LWK); Maasai in Kinyawa, Kenya (MKK); Toscani in Italy (TSI); Peruvians in Lima, Peru (PEL); Gujarati Indians in Houston (GIH); Chinese in metropolitan Denver, Colorado (CHD); people of Mexican ancestry in Los Angeles (MXL); and people of African ancestry in the southwestern United States (ASW).[10]

Community meeting

Data generated by the 1000 Genomes Project are widely used by the genetics community, making the first 1000 Genomes Project paper one of the most cited papers in biology.[12] To support this user community, the project held a community analysis meeting in July 2012 that included talks highlighting key project discoveries, their impact on population genetics and human disease studies, and summaries of other large-scale sequencing studies.[13]

Project findings

Pilot phase

The pilot phase consisted of three projects:

  • low-coverage whole-genome sequencing of 179 individuals from 4 populations
  • high-coverage sequencing of 2 trios (mother-father-child)
  • exon-targeted sequencing of 697 individuals from 7 populations

It was found that, on average, each person carries around 250-300 loss-of-function variants in annotated genes and 50-100 variants previously implicated in inherited disorders. Based on the two trios, the rate of de novo germline mutation is estimated to be approximately 10⁻⁸ per base per generation.[14]
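To give a sense of scale for the mutation rate, the sketch below converts the per-base rate into an expected number of new mutations per child. It is a back-of-the-envelope estimate rather than a result reported by the project: it assumes the approximate genome size of 3 billion base pairs mentioned earlier, counts both inherited genome copies (ploidy 2), and treats the rate as uniform across the genome. The function name is hypothetical.

```python
def expected_de_novo_mutations(
    rate_per_base: float = 1e-8,               # per base per generation, from the pilot trios
    haploid_genome_bp: int = 3_000_000_000,    # approximate genome size (assumption from the text)
    ploidy: int = 2,                           # one genome copy inherited from each parent
) -> float:
    """Expected number of new germline mutations carried by one child."""
    return rate_per_base * haploid_genome_bp * ploidy


if __name__ == "__main__":
    print(f"Expected de novo mutations per child: {expected_de_novo_mutations():.0f}")
```

Under these assumptions the estimate is on the order of 60 new mutations per generation.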

See also

References

  1. Nielsen, R. (2010). "Genomics: In search of rare human variants". Nature. 467 (7319): 1050–1051. doi:10.1038/4671050a. PMID 20981085.
  2. JC Long, Human Genetic Variation: The mechanisms and results of microevolution, American Anthropological Association (2004)
  3. Anzai, T.; Shiina, T.; Kimura, N.; Yanagiya, K.; Kohara, S.; Shigenari, A.; Yamagata, T.; Kulski, J. K.; Naruse, T. K.; Fujimori, Y.; Fukuzumi, Y.; Yamazaki, M.; Tashiro, H.; Iwamoto, C.; Umehara, Y.; Imanishi, T.; Meyer, A.; Ikeo, K.; Gojobori, T.; Bahram, S.; Inoko, H. (2003). "Comparative sequencing of human and chimpanzee MHC class I regions unveils insertions/deletions as the major path to genomic divergence". Proceedings of the National Academy of Sciences. 100 (13): 7708–7713. doi:10.1073/pnas.1230533100. PMC 164652. PMID 12799463.
  4. Redon, R.; Ishikawa, S.; Fitch, K. R.; Feuk, L.; Perry, G. H.; Andrews, T. D.; Fiegler, H.; Shapero, M. H.; Carson, A. R.; Chen, W.; Cho, E. K.; Dallaire, S.; Freeman, J. L.; González, J. R.; Gratacòs, M. N.; Huang, J.; Kalaitzopoulos, D.; Komura, D.; MacDonald, J. R.; Marshall, C. R.; Mei, R.; Montgomery, L.; Nishimura, K.; Okamura, K.; Shen, F.; Somerville, M. J.; Tchinda, J.; Valsesia, A.; Woodwark, C.; Yang, F. (2006). "Global variation in copy number in the human genome". Nature. 444 (7118): 444–454. doi:10.1038/nature05329. PMC 2669898. PMID 17122850.
  5. Barreiro, L. B.; Laval, G.; Quach, H. L. N.; Patin, E.; Quintana-Murci, L. S. (2008). "Natural selection has driven population differentiation in modern humans". Nature Genetics. 40 (3): 340–345. doi:10.1038/ng.78. PMID 18246066.
  6. Nielsen, R.; Hellmann, I.; Hubisz, M.; Bustamante, C.; Clark, A. G. (2007). "Recent and ongoing selection in the human genome". Nature Reviews Genetics. 8 (11): 857–868. doi:10.1038/nrg2187. PMC 2933187. PMID 17943193.
  7. EE Harris et al., The molecular signature of selection underlying human adaptations, Yearbook of Physical Anthropology 49: 89-130 (2006)
  8. Bamshad, M.; Wooding, S. P. (2003). "Signatures of natural selection in the human genome". Nature Reviews Genetics. 4 (2): 99–111. doi:10.1038/nrg999. PMID 12560807.
  9. Corona, E.; Dudley, J. T.; Butte, A. J. (2010). Hawks, John (ed.). "Extreme Evolutionary Disparities Seen in Positive Selection across Seven Complex Diseases". PLoS ONE. 5 (8): e12236. doi:10.1371/journal.pone.0012236. PMC 2923198. PMID 20808933.
  10. Citation error: invalid <ref> tag; no text was provided for refs named ref1.
  11. Meeting Report: A Workshop to Plan a Deep Catalog of Human Genetic Variation (2007) http://www.1000genomes.org/sites/1000genomes.org/files/docs/1000Genomes-MeetingReport.pdf
  12. C. King (2012) The Hottest Research of 2011. Science Watch http://archive.sciencewatch.com/newsletter/2012/201203/hottest_research_2012/
  13. 1000 Genomes Project Community Analysis Meeting http://1000gconference.sph.umich.edu/
  14. Citation error: invalid <ref> tag; no text was provided for refs named Pilot phase.

External links
