Introduction to Bioinformatics

Analyzing Sequence Data with Perl


Prep readings:

In this assignment, we wish to:


Introducing Some New Perl Constructs

  1. To begin, examine the code in the vowels3.pl file. What is different from vowels2.pl? Locate the code in this file that performs the following tasks: Using your book and any online resources as a guide, add any comments to the code that you feel are necessary to explain what it does. Run the program and confirm that your comments are accurate. What does the program do when you enter an incorrect menu option?
  2. In order to read data from a file (menu option 2), you need to create a file containing some words. Use Notepad to create a file called words.dat containing some text. Run the vowels3.pl program and select option 2 to read and process this file.
  3. Now, think carefully about each line of code, with the idea that you might cut-and-paste from various sections of this code in the future. For example:
  4. Now, examine the code in the file vowels4.pl.
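
As a companion to step 2, here is a minimal sketch of reading words.dat line by line and counting vowels. The contents of vowels3.pl are not reproduced in this handout, so this is not the course's code: the subroutine names count_vowels and count_vowels_in_file are made up for illustration, and the sketch assumes that "vowel" means a, e, i, o, or u in either case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the vowels (a, e, i, o, u) in one string, ignoring case.
# (Hypothetical helper; not taken from vowels3.pl.)
sub count_vowels {
    my ($text) = @_;
    # tr/// with an empty replacement list counts matches without changing $text.
    my $count = ($text =~ tr/aeiouAEIOU//);
    return $count;
}

# Read every line of a file and return the total vowel count.
sub count_vowels_in_file {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Cannot open $filename: $!\n";
    my $total = 0;
    while (my $line = <$fh>) {
        chomp $line;                      # remove the trailing newline
        $total += count_vowels($line);
    }
    close($fh);
    return $total;
}

# Demo: process words.dat only if it exists in the current directory.
my $file = 'words.dat';
if (-e $file) {
    print "Vowels in $file: ", count_vowels_in_file($file), "\n";
} else {
    print "Create $file first, then run this script again.\n";
}
```

Keeping the counting and the file handling in separate subroutines is one way to make the pieces easy to cut and paste into later programs, which is the habit step 3 asks you to practice.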

Let's Process Some DNA Sequences!
  1. Use the link provided above to access the example programs of Chapter 4. Save each of these programs to the subdirectory where you are storing Perl programs. (This will be something like I:\bioinformatics\MyProgs if you are working in a UWF computer lab, or C:\Perl\MyProgs if you are working from home.)
  2. Open your book to Chapter 4 and start reading.
  3. As you encounter the descriptions of these Perl programs in Chapter 4, execute the corresponding Perl program and confirm the output your book suggests.

Think, think, think

Now, again, think carefully about the code you just executed. Are there any changes you should make to improve the readability, code reuse, or the user interface? Assume you won't have time later to make these changes: develop the good habit of making them now, while the code is fresh in your mind. This way you always have code you can trust in future projects.

A good way to get a handle on the overall readability of your program is to print several pages of it on a single sheet of paper. In Notepad you can do this by selecting Print-Preferences and then selecting 2 or 4 under "Pages Per Sheet". Click "OK", then "Print." Study the overall structure of your program and, in particular, double-check that your indentation is consistent.


Possible Problems and Solutions


Getting Credit for Your Progress

To get credit for this assignment, you need to complete these additional steps:

  1. Include a link to the Documentation page of the official Perl Website on your course website, under a heading titled, "Perl links."
  2. Create a Perl program called nucleotide-counting2.pl to do some processing of DNA sequences. This menu-driven program should allow the user to enter a single DNA sequence at the keyboard, or to specify multiple sequences in a file. The program should count the total number of occurrences of each of the adenine, cytosine, guanine, and thymine nucleotides in the sequence. Assume the user will specify the sequences using the one-letter abbreviations (A, C, G, and T), but accept both uppercase and lowercase input. If the user selects the menu option to read data from a file, your program should by default try to open and read the file called dna.dat. If this file cannot be opened, the user should be prompted to specify a filename of their own. Final results should be displayed on the screen and also written to the default file called results.dat. Your program should be well-written and user-friendly, following the guidelines provided here and in class.
  3. Include a link to your new Perl program on your course Webpage, under a heading titled, "Lab: Analysis of Sequence Data (from keyboard or files)."
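
To help you get started on step 2, here is a small sketch of two building blocks: case-insensitive nucleotide counting, and an attempt to open a file that reports failure instead of stopping the program. It is deliberately not a complete solution. The menu, the filename prompt, and writing results.dat are left to you. The subroutine names count_nucleotides and try_open and the sample sequence are made up for illustration, and the sketch assumes that characters other than A, C, G, and T should simply be skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count A, C, G, and T in one sequence, accepting upper- or lowercase.
# (Hypothetical helper; the assignment does not prescribe this design.)
sub count_nucleotides {
    my ($sequence) = @_;
    my %count = (A => 0, C => 0, G => 0, T => 0);
    foreach my $base (split //, uc $sequence) {
        # Assumption: skip anything that is not one of the four letters.
        $count{$base}++ if exists $count{$base};
    }
    return %count;
}

# Try to open a file for reading. Returns the filehandle, or undef on
# failure so that the caller can decide to prompt for another filename.
sub try_open {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or return;
    return $fh;
}

# Demo with a hard-coded sequence (made-up data).
my %totals = count_nucleotides('ACGTacgtAAt');
foreach my $base (qw(A C G T)) {
    print "$base: $totals{$base}\n";
}

# Fall back gracefully when the default file is missing.
my $fh = try_open('dna.dat');
if (defined $fh) {
    print "Opened dna.dat\n";
    close($fh);
} else {
    print "Could not open dna.dat; a full program would ask for a filename here.\n";
}
```

Returning undef from try_open, rather than calling die, is one way to let the calling code decide what happens next, such as asking the user for a different file.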

Congratulations! You just made it through your third assignment! Feel free to check out how the other students in the class did on this assignment.
© Copyright 2003. Melanie A. Sutton, Ph.D. (msutton@uwf.edu) All rights reserved.