Coders Packet

DNA Sequencing-Bioinformatics using Python

By Cheshta Babbar

This project uses Python to work with Next Generation Sequencing (NGS) output: it calculates read coverage, the number of sequenced fragments overlapping a given position in the genome.

The key to what we do at BLI is called Next Generation Sequencing (NGS), a sequencing approach which works by breaking up the DNA into short fragments, replicating them in large quantities, then sequencing those short fragments in a high-throughput fashion. The output of an NGS sequencer is a large collection of such disjointed sequences, called "reads". For the purpose of this puzzle, each read has two properties:

(1) a start position within the genome and

(2) its length.

Both properties are expressed in base pairs, i.e. letters of DNA such as A, T, C and G. For example, the read "AAATCGA" has length 7.
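As a minimal sketch, a read with these two properties could be modelled in Python as shown below. The class name `Read`, the `end` property and the `covers` method are hypothetical names chosen for illustration, not taken from the project code. The sketch also assumes that a read covers every position from its start through `start + length - 1`, inclusive.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """A sequencing read (hypothetical model, for illustration only)."""
    start: int   # genomic position of the first base
    length: int  # number of base pairs in the read

    @property
    def end(self) -> int:
        # Assumption: the read covers start .. start + length - 1, inclusive.
        return self.start + self.length - 1

    def covers(self, position: int) -> bool:
        # True if the given genomic position falls inside the read.
        return self.start <= position <= self.end


# The 7-base read "AAATCGA", placed at an arbitrary example start position.
read = Read(start=100, length=len("AAATCGA"))
print(read.length, read.end, read.covers(106), read.covers(107))
```

With a start of 100 and a length of 7, the read spans positions 100 to 106, so position 106 is covered and position 107 is not.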

A key component of making sense of NGS data is calculating "coverage" (also known as "read depth") at a given genomic position. At its most basic, coverage is simply the number of reads overlapping a position in the genome. High coverage gives a measure of confidence in the sequencing results, and the calculation of coverage is a critical component of our software systems. In this project, we use sequencing output to calculate read coverage at a number of positions of interest ("loci").
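One possible way to compute coverage is sketched below. It is not necessarily the approach used in the downloadable code. The function names `naive_coverage` and `coverage_at_loci` are hypothetical, reads are given as `(start, length)` tuples, and each read is assumed to cover positions `start` through `start + length - 1`, inclusive. The first function counts overlaps directly; the second sorts the read starts and ends once and answers each locus with two binary searches, which scales better when there are many reads and many loci.

```python
from bisect import bisect_left, bisect_right


def naive_coverage(reads, position):
    """Count reads overlapping `position` by checking every read."""
    return sum(1 for start, length in reads
               if start <= position <= start + length - 1)


def coverage_at_loci(reads, loci):
    """Return the coverage at each locus using sorted starts and ends."""
    starts = sorted(start for start, _ in reads)
    # Inclusive end of each read (assumption: start + length - 1).
    ends = sorted(start + length - 1 for start, length in reads)
    result = []
    for locus in loci:
        started = bisect_right(starts, locus)  # reads with start <= locus
        finished = bisect_left(ends, locus)    # reads with end < locus
        result.append(started - finished)
    return result


# Made-up example data: three reads covering 1-5, 3-6 and 10-11.
reads = [(1, 5), (3, 4), (10, 2)]
loci = [0, 3, 6, 10, 12]
print(coverage_at_loci(reads, loci))  # [0, 2, 1, 1, 0]
```

Position 3 lies inside the first two reads, so its coverage is 2, while positions 0 and 12 fall outside every read and have a coverage of 0.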
