Introduction to Bioinformatics

Introduction to Imaging in Bioinformatics

Processing Images to Detect Breast Cancer

       


Prep readings:

In this assignment, we wish to:


Overview

In this assignment, you will develop a program for a real-world bioinformatics application in which data storage and processing speed are critical. The particular application you will study is the detection of breast cancer. The images above were derived from the on-line Digital Database for Screening Mammography (DDSM), located at the University of South Florida. These images were scanned from actual X-ray films taken of women being screened for breast cancer. Each patient has four images taken, providing two different views of each breast.

For example, consider the image and link below, which show a case profile in which a patient's right breast tissue contains a spiculated lesion. Click on the case profile to review where a radiologist has marked the extent of this abnormality in two different views (you are viewing the RIGHT_CC view below). The radiologist's assessment and any markings on the images are known as the "ground truth" for each case. The DDSM terminology page explains the terms a radiologist uses when describing abnormalities such as this one.


Case profile: C_0031_1.RIGHT_CC

Now let's think about how images are stored in a computer. If we considered a small section of the image above, say 5 rows and 8 columns and examined the data stored in an area of this size, it might look as follows:

      
       160 106 100 102 111 100 127 120
       107 110 110 110 127 242 107 100
       150 110 106 132 213 213 120 148
       100 117 115 108 100 234 210 254
       120 106 111 105 154 222 148 170

For this example, the 5 x 8 subimage contains 40 locations (pixels), each with its own intensity value. The value at each location indicates the effect of the X-rays at the underlying location in the breast tissue. With 8-bit images such as this one, each location can hold a value in the range 0..255. On X-ray films, intensity values close to 0 appear dark and those close to 255 appear bright. As you can see from the images above, finding breast cancer means locating areas in the image that are closer to the bright end of the intensity range. In the example subimage above, let's assume that pixels with intensity values greater than 175 represent suspicious regions which the radiologist should examine further.
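To make the storage idea concrete, the sketch below holds the 5 x 8 subimage as an 8-bit MATLAB matrix and inspects it. The variable names (sub, info, bytesPerFilm and so on) are illustrative and do not come from the course files. The 5000 x 3000 size is the figure this assignment quotes for clinical images, and one byte per pixel is an assumption that matches the 8-bit example.

```matlab
% The 5 x 8 subimage, stored as unsigned 8-bit integers (one byte per pixel).
sub = uint8([160 106 100 102 111 100 127 120;
             107 110 110 110 127 242 107 100;
             150 110 106 132 213 213 120 148;
             100 117 115 108 100 234 210 254;
             120 106 111 105 154 222 148 170]);

[nRows, nCols] = size(sub);   % 5 rows, 8 columns
nPixels = numel(sub);         % 40 locations in total
info = whos('sub');           % info.bytes reports the memory used by sub

pixel = sub(2, 6);            % intensity at row 2, column 6 (a bright spot)
darkest = min(sub(:));        % lowest intensity in the patch
brightest = max(sub(:));      % highest intensity in the patch

% Assumed full-film size: 5000 x 3000 pixels at one byte each,
% with four views stored per patient.
bytesPerFilm = 5000 * 3000;
bytesPerPatient = 4 * bytesPerFilm;

fprintf('%d x %d patch, %d pixels, %d bytes\n', nRows, nCols, nPixels, info.bytes);
fprintf('pixel(2,6) = %d, range %d..%d\n', pixel, darkest, brightest);
fprintf('one film: %d bytes, one patient: %d bytes\n', bytesPerFilm, bytesPerPatient);
```

Under these assumptions a single film needs about 15 million bytes and a four-view case about 60 million, which is why storage and speed matter for this kind of application.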

Across the country, clinicians evaluate thousands of images like this each day, but at sizes in the range of 5000 x 3000 pixels. Your goal in this assignment is to see how you can use the MATLAB code provided in the file above to develop a tool to help these radiologists. For example, one way to help a radiologist quickly locate suspicious areas is to produce an image array containing just two intensities, 0 and 255, where 0 represents a normal area and 255 represents a possibly suspicious area. In real applications, this type of output is called a "region-of-interest" (ROI) or "prompting" image, because it directs the radiologist's attention to the areas of the image that require more careful analysis: a value of 255 stands out far better against a background of 0s than a bright value does within the full 0..255 range. Given these objectives for the example subimage provided above, a new subimage could be the following:

       
         0   0   0   0   0   0   0   0
         0   0   0   0   0 255   0   0
         0   0   0   0 255 255   0   0
         0   0   0   0   0 255 255 255
         0   0   0   0   0 255   0   0

How easy is it for you to spot the abnormality in this subimage, compared to the original subimage above?
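The two-level subimage can be reproduced with a single comparison. The following is a minimal sketch, not the code from the provided file; the names sub, threshold, and roi are chosen here for illustration, and the cutoff of 175 is the value assumed in the text.

```matlab
% Original 5 x 8 subimage from the example above.
sub = [160 106 100 102 111 100 127 120;
       107 110 110 110 127 242 107 100;
       150 110 106 132 213 213 120 148;
       100 117 115 108 100 234 210 254;
       120 106 111 105 154 222 148 170];

threshold = 175;   % pixels brighter than this are treated as suspicious

% (sub > threshold) is a logical matrix of 0s and 1s; scaling by 255
% turns it into the two-level "prompting" image.
roi = uint8(sub > threshold) * 255;

nSuspicious = nnz(roi);   % number of pixels flagged as suspicious

disp(roi);
fprintf('%d of %d pixels flagged\n', nSuspicious, numel(sub));
```

Raising the threshold flags fewer pixels and lowering it flags more, so the choice of cutoff directly controls how much of the image the radiologist is prompted to review.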

Getting Started with Bioimaging

  1. For this assignment, you will be using a software package called MATLAB. This software package has been installed for you in the large computer lab (LAN) in Building 79, so your first step is to proceed to that lab, sit down at a computer, and log in to your ArgoNet account. (NOTE: This lab is open 24 hours/day, 7 days/week.)
  2. Once in your ArgoNet account in the Building 79 LAN, start MATLAB (follow links for Start-Programs-MATLAB 6.5-MATLAB 6.5). (NOTE: The value "6.5" represents the MATLAB version number and may vary from semester to semester).
  3. Next, create a folder called bioimaging under the bioinformatics folder you previously created in your ArgoNet account on the Web (I:) drive. Save the following files into this new folder:
  4. Now, in the MATLAB tool window, select File-Set Path. Click on "Add Folder" and use the window to locate the bioimaging folder you just created on Web (I:). When located, click on OK so the MATLAB tool knows where to find your new files.
  5. Next, use the File-Open menu in MATLAB to open up the bioimaging1.m file you just saved in your account. Read carefully through the provided code, including all comments.
  6. Next, in the right Command Window of MATLAB, at the ">>" prompt, type bioimaging1 and hit return. Four figure windows will open up, on top of each other. These figures contain images that highlight various structures in the original input image that may be useful for a bioinformaticist or clinician examining the original image and the processed results. For example, here are two of the images which can be found in the produced figures:

    [Figures: "7x7 Filtered Image" and "Edges in 25x25 Filtered Image"]

    Move the figure windows to different parts of the screen and identify the titles at the top of each figure which are labeled as follows:

  7. Now, revisit the Case profile for C_0031_1.RIGHT_CC. Which of the images produced above do you think might help a radiologist to detect the abnormality in the image?
  8. In the MATLAB window containing the code for this assignment, edit the threshold value described in the comments. Save and rerun the MATLAB program. How helpful is the new set of images resulting from this change?
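The provided bioimaging1.m file is not reproduced on this page, so the following is only a guess at the kind of processing that figure titles such as "7x7 Filtered Image" and "Edges in 25x25 Filtered Image" suggest. It assumes a mean (box) filter for smoothing and Sobel kernels for edges; the actual file may use different filters. It runs on a synthetic test image, and every variable name is hypothetical.

```matlab
% Synthetic 64 x 64 test image: uniform background with one bright square
% standing in for a suspicious region.
img = 100 * ones(64, 64);
img(25:40, 25:40) = 220;

% Smoothing: average each pixel with its k x k neighborhood (assumed box filter).
k = 7;                                   % try 25 for heavier smoothing
kernel = ones(k) / k^2;
smoothed = conv2(img, kernel, 'same');   % 'same' keeps the output 64 x 64

% Edge strength: gradient magnitude from Sobel kernels (an assumption).
sx = [-1 0 1; -2 0 2; -1 0 1];
sy = sx';
gx = conv2(smoothed, sx, 'same');
gy = conv2(smoothed, sy, 'same');
edgeStrength = sqrt(gx.^2 + gy.^2);

% Thresholding: keep only the bright areas of the smoothed image.
threshold = 175;                         % the value you would edit in step 8
roi = uint8(smoothed > threshold) * 255;

fprintf('%d pixels above threshold %d\n', nnz(roi), threshold);
```

To view any of the results, something like imagesc(smoothed); colormap(gray); axis image; can be typed at the ">>" prompt. Note that conv2 pads the image with zeros, so values near the border are darker than the true background.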

Possible Problems and Solutions


Getting Credit for Your Progress

To get credit for this assignment, you need to complete these additional steps:

  1. Find an interesting image that you would like to do some image processing on. Ideally, this might be an image related to your Term Project. Store the new image in your new bioimaging folder.
  2. Make a copy of the MATLAB program provided above (also in your bioimaging folder) and name the copy my_image1.m
  3. In your new program file, change the name of the input image to the image you would like to process. Save this new program, and then type my_image1 in the MATLAB Command Window to try to run the code with your new image. MATLAB can read many image formats (see the MATLAB Help for a complete listing). If you get any error messages on a color image, try converting it to a grayscale image using a Windows tool such as Paint.
  4. Experiment with the code and threshold values until you produce some images which appear interesting or useful.
  5. Next, use the comments in the code provided to write out two of your favorite processed images. Note that by default, MATLAB will save the images in the directory indicated next to the "Current Directory" label in the main window, so you will need to change this directory to your bioimaging folder before you run the new version of the program.
  6. On your course Webpage, under a heading titled, "Lab 5: Imaging in Bioinformatics", include links to:
  7. E-mail me (msutton@uwf.edu) when you have completed the above steps.
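For steps 3 and 5, a color image can also be converted to grayscale and written out from within MATLAB itself. The sketch below is one possible approach under stated assumptions: it builds a tiny synthetic color image so that it runs on its own, uses the common luminance weights 0.2989, 0.5870, and 0.1140 (MATLAB's rgb2gray function performs the same kind of conversion), and writes to the system temporary folder under a made-up file name.

```matlab
% Tiny 2 x 2 color image: red, green / blue, white.
R = uint8([255   0;   0 255]);
G = uint8([  0 255;   0 255]);
B = uint8([  0   0; 255 255]);
img = cat(3, R, G, B);          % rows x columns x 3 color planes

% Convert to grayscale only if the image has three color planes.
if ndims(img) == 3
    d = double(img);
    gray = uint8(0.2989 * d(:, :, 1) + 0.5870 * d(:, :, 2) + 0.1140 * d(:, :, 3));
else
    gray = img;                 % already grayscale
end

% Write the result with an explicit folder, so the output location does
% not depend on MATLAB's Current Directory setting.
outFile = fullfile(tempdir, 'my_image1_gray.png');   % hypothetical name
imwrite(gray, outFile);

disp(gray);
```

With your own picture, replace the synthetic image with img = imread('your_image.jpg'); (a placeholder file name) and replace tempdir with the path to your bioimaging folder.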

Read more about it...

Wondering how far you are from the "real world" with this assignment? Check out these products, approved by the FDA in your lifetime (within the last several years, in fact):


Feel free to check out how the other students in the class are doing on the assignments.
© Copyright 2003. Melanie A. Sutton, Ph.D. (msutton@uwf.edu) All rights reserved.