Analyzing Sequence Data with Perl


Prep readings:
- Text: Chapter 4; Appendix B
- Files:

In this assignment, we wish to:
- understand additional Perl constructs by adding some new functionality to our vowel counting program
- create and execute some Perl code related to processing DNA sequences

Introducing Some New Perl Constructs
- To begin, examine the code in the vowels3.pl file. What is different from vowels2.pl?
Locate the code in this file that performs the following tasks:
- Displays a menu for the user
- Gets a menu option from the user
- Checks to make sure the user entered a valid menu option
- Reads input data from a file
Using your book and any online resources as a guide, add any comments to the code
that you feel are necessary to explain what it does. Run the program
to confirm that your comments are accurate. What does the program do when you enter
an incorrect menu option?
- To read data from a file (menu option 2), you first need a file
containing some words. Use Notepad to create a file called words.dat
containing some text. Then run the vowels3.pl program and select option 2 to read and
process this file.
- Now, think carefully about each line of code, with the idea that you might cut-and-paste
from various sections of this code in the future. For example:
- What is the purpose of the $ready_to_exit variable?
- What happens when you comment out the following line of code and then run the program:
chomp $option;
What purpose does this line of code serve? When might you use chomp on some
other variable?
Note: Like substr, chomp is one of Perl's built-in functions. These are functions
that have already been written for you as part of the Perl language to perform
various tasks. Information on these and other built-in functions is available at the
official Perl Website. Just follow the links to
Documentation-Perl's Builtin Functions. Other useful links
at the Perl site include Documentation-Perl Syntax and
Documentation-Perl Diagnostics and Error Messages.
Find these pages now and briefly scan the information that is available.
- Now, examine the code in the file vowels4.pl.
- What is different now? What additional functionality does this program appear to have?
Is the documentation within the code correct? Is it complete? Why or why not?
- Modify your words.dat file to test the vowels4.pl program.
Sketch pictures of what the variables called @filesentences, $i, and $sentence
contain as you execute the program.
- Consider how this code works on files containing multiple lines of text. What
is the result of uncommenting this line of code in the file:
print @filesentences;
Does this provide any useful output for the programmer, or for the user, or both?
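
As you work through the questions above, it can help to experiment with a small stand-alone script. The sketch below is not the code from vowels3.pl or vowels4.pl. It is only an illustration of two ideas: why input is usually passed through chomp before it is compared, and what an array such as @filesentences holds after a file is read into it. The helper is_valid_option, the allowed option values 1 to 3, and the sample text are assumptions made for this example. An in-memory string stands in for words.dat so the script runs without any other files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the course files): returns a true value
# if $option matches one of the allowed menu choices exactly.
sub is_valid_option {
    my ($option, @allowed) = @_;
    my $matches = grep { $option eq $_ } @allowed;
    return $matches;
}

# Simulate what <STDIN> hands back when the user types 2 and presses Enter.
my $raw = "2\n";

# With the trailing newline still attached, the string comparison fails.
print "Before chomp: ",
    (is_valid_option($raw, 1, 2, 3) ? "valid" : "invalid"), "\n";

# chomp removes the trailing newline from the variable in place.
my $option = $raw;
chomp $option;
print "After chomp:  ",
    (is_valid_option($option, 1, 2, 3) ? "valid" : "invalid"), "\n";

# Sample text standing in for a two-line words.dat (an assumption).
my $text = "The cat sat.\nA dog ran.\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

# In list context, <$fh> returns every line; each element keeps its newline.
my @filesentences = <$fh>;
close $fh;

# Walk the array by index, the way $i and $sentence might be used.
for (my $i = 0; $i < @filesentences; $i++) {
    my $sentence = $filesentences[$i];
    chomp $sentence;
    print "Element $i: $sentence\n";
}
```

Running this prints "invalid" for the first check and "valid" for the second, followed by one line per array element. Try removing the chomp inside the loop to see how the output changes.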

Let's Process Some DNA Sequences!
- Use the link provided above to access the example programs of Chapter 4.
Save each of these programs to the subdirectory where you are storing Perl
programs. (This will be something like I:\bioinformatics\MyProgs if you are working
in a UWF computer lab, or C:\Perl\MyProgs if you are working from home.)
- Open your book to Chapter 4 and start reading.
- As you encounter the descriptions of these Perl programs in Chapter 4, execute
the corresponding Perl program and confirm the output your book suggests.

Think, think, think
Now, again, think carefully about the code you just executed. Are there any changes
you should make to improve readability, code reuse, or the user interface?
Assume you won't have time later to make these changes, and develop a good habit
of making them now, while the code is fresh in your mind. This way you
always have code you can trust in future projects. A good way to get a handle
on the overall readability of your program is to print multiple pages per
sheet of paper. In Notepad you can do this by selecting Print-Preferences
and then selecting 2 or 4 under "Pages Per Sheet". Click "OK", then "Print". Study
the overall structure of your program and double-check that you have been
consistent with indentation.

Possible Problems and Solutions
Check back later for any additions to this section.

Getting Credit for Your Progress
To get credit for this assignment, you need to complete these additional steps:
- Include a link to the Documentation page of the official Perl Website on your course
website, under a heading titled "Perl links."
- Create a Perl program called nucleotide-counting2.pl to do some processing of DNA sequences.
This menu-driven program should allow the
user to enter a single DNA sequence at the keyboard, or to specify multiple sequences
in a file. The program should count the total number of occurrences of each
nucleotide (adenine, cytosine, guanine, and thymine) in the sequence. Assume
the user will specify the sequences using the one-letter abbreviations (A, C, G, and T),
but accept both uppercase and lowercase input.
If the user selects the menu option to read data from a file, your program should by
default try to open and read the file called dna.dat. If this file
cannot be opened, the user should be prompted to specify a different filename.
Final results should be displayed on the screen and written to the default file called
results.dat. Your program should be well written and user friendly, following
the guidelines provided here and in class.
- Include a link to your new Perl program on your course website, under
a heading titled "Lab: Analysis of Sequence Data (from keyboard or files)."
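
If you are unsure where to start on the counting step, the sketch below shows one possible approach. It is not a complete solution: the menu, the dna.dat file handling, and the results.dat output are all left for you to write. The subroutine name count_nucleotides and the sample sequence are assumptions made for this example. The sketch uses Perl's built-in tr operator, which returns the number of characters it matched; other approaches, such as stepping through the sequence with substr, would work as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count A, C, G, and T in one sequence, ignoring case.
sub count_nucleotides {
    my ($sequence) = @_;
    my %count;

    # With an empty replacement list, tr leaves the string unchanged
    # and returns how many characters matched the search list.
    $count{A} = ($sequence =~ tr/Aa//);
    $count{C} = ($sequence =~ tr/Cc//);
    $count{G} = ($sequence =~ tr/Gg//);
    $count{T} = ($sequence =~ tr/Tt//);

    return %count;
}

# Sample sequence mixing uppercase and lowercase (an assumption).
my %count = count_nucleotides("ACGTacgtAAT");

foreach my $base (qw(A C G T)) {
    print "$base: $count{$base}\n";
}
```

For the sample sequence this prints A: 4, C: 2, G: 2, and T: 3, one per line. To handle multiple sequences from a file, you could call the subroutine once per line and add the results to running totals.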

Congratulations!
You just made it through your third assignment!
Feel free to check out how the other students in the class did on this assignment.
© Copyright 2003.
Melanie A. Sutton, Ph.D.
(msutton@uwf.edu)
All rights reserved.