Project #5 - Classtime Scraper - 23%
due midnight, U Jul 30

Work in pairs (you choose partners).  Team gets one grade.
 
Deduction for lateness is 10% per day for 5 days, then no credit.  Start early!

version 2.0, last updated 01/26/2008 - new scraper.zip as of 7 pm

This project is intended to be developed in pairs.  I require separate pre- and post- time estimates from each person.

GRADING CRITERIA:

  1. 50% scraper functionality
  2. 25% web site functionality
  3. 20% usability, coding practices, etc.
  4. 3% - time pre-estimate
  5. 2% - time post-estimate (should not be identical to pre-)

Using Microsoft Visual C#, recreate Pat's classtime finder.  This involves two pieces: a Windows application that can scrape the registrar's website for the fall, spring, or summer semester, and an ASP.NET website front end that lets users query the scraped data:

RULES:

  1. Configurable options for the scraper include at least:
  2. The graphical user interface (GUI) of the scraper must remain responsive during the scraping process, and must report progress to the GUI, such as how many sections have been identified so far.
  3. Scraper must be able to be stopped before completion (confirmation is advisable).
  4. By default, your scraper must deposit all scraped information into .csv files on the local disk.  Once the user has determined the scraped .csv files are correct, the scraper can be re-run in a special mode which also deposits the scraped information into a database (either staging or production).
  5. If the scraper is in database-update mode, you'll need to clear all pre-existing data out of the MeetingTimes table before writing new data to it.  The code to do this is:
        string theSQL = "delete from MeetingTimes";
        ConnectionSql.Open();
        try
        {
            System.Data.SqlClient.SqlCommand c1 =
               new System.Data.SqlClient.SqlCommand(theSQL, ConnectionSql);
            c1.ExecuteNonQuery();   // removes every row from MeetingTimes
        }
        finally
        {
            ConnectionSql.Close();  // close the connection even if the delete fails
        }
  6. Your scraper should find at least as many sections as my scraper.  There is a deduction if it finds fewer, and extra credit if it finds legitimate sections that mine missed.  Have a look at my readme file for the statistics.
  7. The scraper GUI, when scraping has completed, should display the number of departments, courses and sections found, the URL which was scraped, and the location of the output files.
  8. Please create an ASP.NET application which queries the database with an interface at least as good as the classtime finder website:
  9. Please maintain two copies of the ASP.NET application; the first will pull from a staging database, and the second from the production database.  All changes should first be tested on the staging website and staging database before being copied (after testing) to the production site.  That is, your production site should never be allowed to appear "broken", either due to ASP.NET code or database issues.
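
For the .csv output in RULE 4, the main thing that tends to go wrong is a course title or instructor name that itself contains a comma or a quotation mark.  Here is a minimal sketch of one way to handle that, assuming the usual convention of quoting such fields and doubling any embedded quotes.  The class name CsvWriter and its method names are hypothetical, not part of my scraper, and your column layout is up to you.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: names are illustrative only.
public static class CsvWriter
{
    // Quote a field only when it contains a comma, a quote, or a line break;
    // embedded quotes are doubled.
    public static string Escape(string field)
    {
        if (field == null) return "";
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Turn one record (for example, one section) into a single CSV line.
    public static string ToLine(IEnumerable<string> fields)
    {
        List<string> escaped = new List<string>();
        foreach (string f in fields) escaped.Add(Escape(f));
        return string.Join(",", escaped.ToArray());
    }

    // Write all records to a file, replacing any earlier file at that path.
    public static void WriteFile(string path, IEnumerable<string[]> rows)
    {
        using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            foreach (string[] row in rows) writer.WriteLine(ToLine(row));
        }
    }
}
```

You would call something like CsvWriter.WriteFile(path, rows) once per output file after scraping finishes.  Keeping the escaping in one place means the departments, courses, and sections files all follow the same quoting convention.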

RESOURCES and SAMPLE CODE:

  1. The URLs to be scraped are:
  2. Here is my scraper database specification.  You may optionally use identical tables, which I already created for you in each student database on the http://wren.cis.upenn.edu/ server.
  3. Here is Pat's scraper (v5.2); it has the database access part disabled, but you can run it locally and look in c:\reports for the results:
  4. Here's a .zip file of all of my report files (from scraping all three semesters, as dumped into folder c:\reports).  It contains three files for each semester (departments, courses, and sections).
  5. One technique for keeping the GUI responsive (RULE 2) is illustrated in the async delegate sample project that uses a background worker thread (here's its form code).  That is how my scraper does it.
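
To give a feel for the background worker approach, here is a rough sketch using the .NET BackgroundWorker class to cover RULES 2 and 3: progress reporting, and stopping with confirmation.  It is not the sample project's code.  The form, the control names, and the loop that pretends to scrape 500 sections are all made up for illustration; your real DoWork handler would fetch and parse pages instead of sleeping.

```csharp
using System;
using System.ComponentModel;
using System.Threading;
using System.Windows.Forms;

// Hypothetical form: control names and the fake scraping loop are illustrative only.
public class ScraperForm : Form
{
    private readonly BackgroundWorker worker = new BackgroundWorker();
    private readonly Label status = new Label();
    private readonly Button start = new Button();
    private readonly Button stop = new Button();

    public ScraperForm()
    {
        status.Dock = DockStyle.Top; status.Text = "Idle";
        start.Text = "Start"; start.Dock = DockStyle.Left;
        stop.Text = "Stop"; stop.Dock = DockStyle.Right; stop.Enabled = false;
        Controls.Add(start); Controls.Add(stop); Controls.Add(status);

        worker.WorkerReportsProgress = true;
        worker.WorkerSupportsCancellation = true;
        worker.DoWork += Scrape;                 // runs on a background thread
        worker.ProgressChanged += OnProgress;    // raised back on the GUI thread
        worker.RunWorkerCompleted += OnCompleted;

        start.Click += delegate
        {
            start.Enabled = false; stop.Enabled = true;
            worker.RunWorkerAsync();
        };
        stop.Click += OnStopClicked;
    }

    private void OnStopClicked(object sender, EventArgs e)
    {
        // Ask for confirmation before requesting cancellation.
        DialogResult answer = MessageBox.Show("Stop scraping?", "Confirm",
                                              MessageBoxButtons.YesNo);
        if (answer == DialogResult.Yes) worker.CancelAsync();
    }

    private void Scrape(object sender, DoWorkEventArgs e)
    {
        int sections = 0;
        for (int i = 0; i < 500; i++)
        {
            // Check for a stop request between units of work.
            if (worker.CancellationPending) { e.Cancel = true; return; }
            Thread.Sleep(20);                    // stand-in for fetching and parsing a page
            sections++;
            worker.ReportProgress(0, sections);  // pass the running count as user state
        }
        e.Result = sections;
    }

    private void OnProgress(object sender, ProgressChangedEventArgs e)
    {
        status.Text = e.UserState + " sections found so far";
    }

    private void OnCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        start.Enabled = true; stop.Enabled = false;
        // Check Error and Cancelled first; reading e.Result in those cases throws.
        if (e.Error != null) status.Text = "Failed: " + e.Error.Message;
        else if (e.Cancelled) status.Text = "Stopped by user";
        else status.Text = "Done: " + e.Result + " sections";
    }

    [STAThread]
    static void Main() { Application.Run(new ScraperForm()); }
}
```

The point of the design is that the DoWork handler never touches a control directly.  All GUI updates happen in the ProgressChanged and RunWorkerCompleted handlers, so the window keeps repainting and the Stop button keeps working while the scrape runs.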

GOALS:

  1. To practice writing a scraper that parses web pages.
  2. To practice writing a Windows application that keeps its GUI responsive while performing intensive background work.
  3. To practice writing an ASP.NET website which accesses a database.

GETTING HELP:
If you need help, try (in preferred order):


TURN-IN:
The turn-in procedure for programming assignments is as follows:

  1. Only one group member should turn in a zipped folder with a readme file that contains at least the following: a) URLs of staging and production websites, b) names of staging and production databases, c) a database specification (if different from mine), and d) anything else we need to know to grade your assignment.  Don't forget to include your separate pre- and post- time estimates!
  2. Create a folder with your combined PennNetID1_PennNetID2 as its name (for example, "pgpalmer_palmer").  Place all needed files (see preceding step) into this folder, then zip it.
  3. Test unarchiving the zip file to make sure it produces all your files in whatever folder the person unzipping chooses (for example, the Desktop).
  4. If correct, log into Blackboard and turn in the zip file (via file upload in your web browser).  Be sure to select "Send File" (and not "Add File") in Blackboard's Drop Box.
