Friday, October 7th 2011 8:30 PM Posted by Anthony Roy

Project 2: Fri Oct 7th

This week, I used a Windows program, Karen's Directory Printer, to create a text file listing the HTML file names in the order in which they were downloaded (that is, in season order).
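
For anyone without Karen's Directory Printer, the same kind of listing could probably be produced with a few lines of Python. This is only a sketch: the function name `list_html_by_download_order` is made up for illustration, and it assumes that each file's modification time reflects the order in which it was downloaded.

```python
from pathlib import Path


def list_html_by_download_order(folder, listing_path):
    """Write the HTML file names found in `folder` to `listing_path`, oldest first.

    Hypothetical helper. Assumes modification time matches download order.
    """
    folder = Path(folder)
    html_files = [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    ]
    # Sort by modification time; the file name breaks ties deterministically.
    html_files.sort(key=lambda p: (p.stat().st_mtime, p.name))
    names = [p.name for p in html_files]
    # One file name per line, in the sorted order.
    Path(listing_path).write_text(
        "".join(name + "\n" for name in names), encoding="utf-8"
    )
    return names
```

If the downloads were not throttled, several files could share a timestamp and the order would fall back to file name, so the result is worth checking by eye.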

This text file was then divided by season, so that each dialog entry can be stored in the SQL database with its correct season.
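
The split itself was done by hand, but the idea can be sketched in Python. The helper `split_into_seasons` below is hypothetical, and it assumes the number of episodes in each season is known up front and that the file names are already in season order.

```python
def split_into_seasons(file_names, episodes_per_season):
    """Group an ordered list of file names into seasons.

    `episodes_per_season` is a list of counts, one per season, supplied by
    the caller (an assumption of this sketch). Returns a dict that maps the
    season number (starting at 1) to the list of file names in that season.
    """
    if sum(episodes_per_season) != len(file_names):
        raise ValueError("episode counts do not add up to the number of files")
    seasons = {}
    start = 0
    for season_number, count in enumerate(episodes_per_season, start=1):
        # Take the next `count` names for this season.
        seasons[season_number] = file_names[start:start + count]
        start += count
    return seasons
```

Raising an error when the counts and the listing disagree catches a missing or duplicated download before any season numbers reach the database.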

I am currently tuning a Java HTML parser that strips all HTML tags and returns the primary dialog text, which will then be loaded into an SQL database. The text will either be inserted as the parser reads each HTML file, or stored in a data structure and added to the database once all the data has been organized in Java. The GUI and the statistical data extracted from the database are the main focus of this week's agenda.
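
To compare the two loading options, here is a small sketch. It uses Python and the standard-library `sqlite3` module rather than the Java the project is written in, and the `dialog` table, its columns, and all function names are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3

INSERT_SQL = "INSERT INTO dialog (season, episode_file, line_text) VALUES (?, ?, ?)"


def create_dialog_table(conn):
    # Hypothetical schema: one row per line of dialog.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dialog ("
        " id INTEGER PRIMARY KEY,"
        " season INTEGER NOT NULL,"
        " episode_file TEXT NOT NULL,"
        " line_text TEXT NOT NULL)"
    )


def insert_episode(conn, season, episode_file, lines):
    """Option 1: insert one episode's lines as soon as its file is parsed."""
    conn.executemany(
        INSERT_SQL, [(season, episode_file, line) for line in lines]
    )
    conn.commit()


def insert_all(conn, parsed):
    """Option 2: `parsed` is a list of (season, episode_file, lines) tuples
    collected in memory first, then inserted in a single transaction."""
    rows = [(s, f, line) for s, f, lines in parsed for line in lines]
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_SQL, rows)


def lines_per_season(conn):
    """Example statistic: number of dialog lines stored for each season."""
    return conn.execute(
        "SELECT season, COUNT(*) FROM dialog GROUP BY season ORDER BY season"
    ).fetchall()
```

Option 1 keeps memory use low and saves progress file by file. Option 2 lets the data be checked and reorganized before anything is written, and a failure leaves the database untouched.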

Friday, September 30th 2011 7:00 PM Posted by Anthony Roy

Project 2: Fri Sep 30th

This week, I used a Firefox add-on to download the HTML files for all 114 episodes of Futurama. The add-on allowed the download rate to be throttled, so the files were saved one after another and can be listed in season order.

This ordering will aid in creating an ordered list of HTML file names; the HTML parser will then output the data in the desired order.

I am currently writing parser code that will be run on every episode's HTML file, and I am experimenting with Beautiful Soup, a Python HTML parser.
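
To show the basic idea of stripping tags and keeping only the text, here is a minimal sketch. It uses Python's built-in `html.parser` module as a stand-in so that it runs without installing anything; it is not the Beautiful Soup code I am experimenting with, and the class and function names are made up for illustration. With Beautiful Soup the comparable call would be `BeautifulSoup(html, "html.parser").get_text()`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags, ignoring script and style content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def get_lines(self):
        return list(self._chunks)


def html_to_lines(html):
    """Return the visible text chunks of an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_lines()
```

This returns every piece of visible text, including menus and footers, so picking out just the dialog would still need rules specific to how the episode pages are laid out.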

Research into SQL will be necessary once all the HTML is nice and parsed.