LIS 7008: Information Technologies and Systems (Spring 2017, Section 01)

LIS 7008 - Information Technologies and Systems
Spring 2017 - Section 01
Assignment 2

This homework is due on your course website before the beginning of the next class session. Partial credit may be awarded. Please test your Web account with FileZilla as early as possible; do not wait until the last minute to upload your homework file, in case there is an issue.

As a reference librarian or information user, sometimes you want to find authoritative information on the Web for users or for yourself. The purpose of this exercise is to learn the techniques for identifying who is responsible for the content of a Web page.

I have encountered an interesting Web page at http://www.flyonthewall.tv/casestudies.php?site=5, and would like to know more about the people or the organization responsible for the content on this site. We know at least five ways to find out who really runs a site, so let's give them a try:

By following links from that Web page, see if you can find a page on the same site that makes a claim of organizational or individual responsibility for the content on the site.
Sometimes no appropriate links are provided. In such cases, URL trimming sometimes offers a way of finding a page on which a claim of responsibility is made. The idea is to remove parts of the URL starting at the right until you get to a page where such a claim is made. For example, the Web page for section 01 of this course is http://www.csc.lsu.edu/~wuyj/Teaching/7008/sp17/. URL trimming would eventually get you back to http://www.csc.lsu.edu/~wuyj, where you would be redirected to my home page. Overtrimming to http://www.csc.lsu.edu would be less useful in this case, since the CSC server hosts unrelated information from many people.
Sometimes it is not possible to find anything that resembles a claim of responsibility, and sometimes that claim may be misleading (for example, if you found a Web page from the "Committee to Re-Elect the President," you might want to know something more about that organization). One way to do that is to look at the domain name registry to see to whom the domain name is registered. Sometimes you will find the full domain name registered; other times you may find that only a part of the name is registered. In that case, you want to trim the URL from the right until you get to the domain name, and then trim the domain name from the left ("www.umiacs.umd.edu" would become "umiacs.umd.edu" and then "umd.edu"). Many online tools can look up the registrant of a domain name, such as http://www.networksolutions.com/whois/index.jsp, http://whois.icann.org/en, and http://mxtoolbox.com/.
Some top-level domain names are assigned to organizations (the U.S. government owns ".gov," for example) or to countries (the United Kingdom owns ".uk"). So in this case it would be useful to know who owns ".tv". If you do much of this, you will learn to recognize some of the more common top-level domain names. There are many domain name lookup services that provide this sort of information; one can be found (with some poking around) at http://www.iana.org/.
Ultimately, the packets that you send to a host have to know how to get there. You can follow that path using a "traceroute" service. One such service is available at http://whatismyipaddress.com/traceroute-tool. You need to type in an IP address in the search box. You can find the IP address of a Website using many online tools, such as site24x7.com and ininfo.info. The traceroute service will provide quite a lot of detail on how packets get from the server that hosts whatismyipaddress.com to any site you specify. Note that you can use other traceroute tools (such as the ones introduced in the slides) that you find useful.
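
The trimming steps in techniques 2 and 3 above can be sketched in a few lines of Python. This is only an illustration, not part of the required work: the function names trim_url and trim_domain are made up for this example, and the script only lists candidate URLs and domain names for you to try by hand in a browser or a whois tool. It does not contact any server.

```python
from urllib.parse import urlsplit


def trim_url(url):
    """List shorter URLs made by removing one piece from the right at a time."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    # Drop the query string (the part after "?") first, if there is one.
    if parts.query:
        candidates.append(base + parts.path)
    # Then drop path segments from the right, one per step.
    while segments:
        segments.pop()
        path = "/" + "/".join(segments) + ("/" if segments else "")
        candidates.append(base + path)
    return candidates


def trim_domain(hostname):
    """List domain names made by removing one label from the left at a time.

    Stops before the bare top-level domain (for example "edu").
    """
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


if __name__ == "__main__":
    for candidate in trim_url("http://www.csc.lsu.edu/~wuyj/Teaching/7008/sp17/"):
        print(candidate)
    for name in trim_domain("www.umiacs.umd.edu"):
        print(name)
```

The last URL that trim_url returns is the bare server address, which is the "overtrimmed" case described in technique 2, so the useful candidates are usually the ones before it. Each name from trim_domain is a candidate to paste into a whois lookup.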

The homework assignment is to use all of these techniques to determine who is responsible for the content that you see on the site given above. Describe what you find using each of the five techniques in an HTML file (name it FirstName_LastName_hw2.html, for example John_Smith_hw2.html). Also discuss possible causes for any "inconsistencies" that you discover.

To help me read your solution, please use the following structure to craft your report:

Task 1.
Task 2.
Task 3.
Task 4.
Task 5.
Discussion of findings.

In other words, please use at least 6 paragraphs in your report.

Post the html file on your web site, then email the instructor the URL for accessing your html file. Do not revise that file for 3 days (after the due date). Reminder: your URL is in this format: http://classes.slis.lsu.edu/wu/7008/sp17/your_folder/FirstName_LastName_hw2.html where your_folder is your first initial followed by your last name, all in lower case, such as jsmith for John Smith. We use your official first and last name in the class roster to create your folder on the SLIS Web server.
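
If you are unsure what your URL should look like, the naming rules above can be written out as a small Python sketch. The function names here are hypothetical, and the sketch assumes a simple first and last name with no spaces or hyphens; if your roster name is different, the folder created for you may not match, so check with the instructor.

```python
BASE_URL = "http://classes.slis.lsu.edu/wu/7008/sp17"


def folder_name(first, last):
    """First initial followed by the last name, all in lower case."""
    return (first[0] + last).lower()


def report_filename(first, last):
    """File name in the form FirstName_LastName_hw2.html."""
    return f"{first}_{last}_hw2.html"


def report_url(first, last):
    """Full URL to email to the instructor."""
    return f"{BASE_URL}/{folder_name(first, last)}/{report_filename(first, last)}"


if __name__ == "__main__":
    print(report_url("John", "Smith"))
```

Note that the folder name is all lower case while the file name keeps the capital letters of your name, so copy both exactly.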

Hope everything is clear. Some students might still have no clue what they are supposed to do. Again, two tasks need to be finished:

Suppose you are an FBI agent and you are given an assignment to investigate who is responsible for the content of that Website. You can use multiple techniques (discussed above AND in the slides) to find out the results, which may or may not be consistent with each other. However, you can make a cogent story from those results. Librarians often do this "information resource authority check" for information users, too. You need to read the slides carefully. Specific techniques are discussed in the slides. Note that this homework is an exploratory task. You may discover different things from different venues, just as FBI agents do.
You are supposed to write an HTML file to record your work, then upload that HTML file onto our class Web server, then email me the URL for accessing your HTML file. In other words, do NOT submit a .txt (or .pdf, .doc) file. The URL here starts with http://classes.slis.lsu.edu/wu/7008/sp17/...; do NOT submit the location of a file on your local computer starting with C://...

Some students had some difficulty using FileZilla to upload the html file to the class Web server, or using a browser to render the html file. Here are some common problems and solutions.

Problem: FileZilla cannot be clicked and initiated.
Solution: Make sure you have installed the right version of FileZilla for your operating system (Windows or Mac). FileZilla is available from Tigerware and the Web.
Problem: unable to connect to the class Web server using FileZilla.
Solution: Make sure that you have entered the host, username, and password correctly. FileZilla reports whether a connection is successful or not. Note that when you use the Internet at your workplace, the firewall settings of your computer may prevent you from using any FTP programs (including FileZilla). Try changing the firewall settings if this happens.
Problem: unable to drag my file from my local computer to the class Web server.
Solution: Make sure that a successful connection is established (see the previous problem). Without a successful connection, you cannot drag a file from your local computer to the server. If you are sure that a successful connection is already established but the problem remains, or FileZilla reports "open for write: permission denied," it is very likely that the server is busy, so please try again a couple of minutes later. If the problem persists, please email me, and I will take a look at your account.
Problem: frustrated or panicking.
Solution: If you can take screenshots of your steps (at least the final step), send them to me and I will troubleshoot for you; or come to see me, the TA, or your classmates.
Problem: unable to view an HTML file using a browser.
Solution: Make sure you have used the correct URL. Check your folder name (which is your first initial followed by your last name, all in lower case) and your filename. Our server is a Linux machine, so your folder name and filenames are all case-sensitive.
Problem: HTML tags are shown on my Web page.
Solution: Make sure you have saved your file as .html (rather than .html.txt), and make sure you have closed the tags that are supposed to be closed.
Problem: I cannot find Notepad on my computer.
Solution: If you use a Windows machine, Notepad is under "All Programs" --> "Accessories." You can also download Notepad++ from the Web. If you use a Mac machine, try to find TextEdit or download TextWrangler.
Problem: When I use TextEdit on my Mac and put the tags in (following the steps in the Huddleston text), they always show up when I bring up the webpage in my browser.
Solution: There can be multiple reasons. Try the following: (1) turn on TextEdit's "plain text" mode; (2) close the tags that are supposed to be closed, such as title and head; (3) save the file as .html (or .htm) rather than .txt.
If you find TextEdit difficult to use, you may download TextWrangler. Previous students recommended TextWrangler.

Grading rubric:

+10: Task 1 (provenance detection)
+20: Task 2 (URL trimming)
+20: Task 3 (domain name detection)
+20: Task 4 (who owns .tv?)
+20: Task 5 (packets tracking)
+10: Discussion of your findings
-5: submitted webpage is not marked up as HTML or not rendered correctly by major browsers (such as Chrome, Safari, IE, Firefox)

Acknowledgment to Doug Oard, revised by Yejun Wu.
