Wednesday, December 3, 2008

Educational TV can be educational

I am of the belief that just because something claims to be healthy does not mean it is. For example, "low fat" to me means "less bad" rather than actually "good". I originally held the same opinion of educational TV. TV is brain rot for children (though sometimes worth the quiet it brings). However, two cartoons on PBS have changed my view: Super Why and Sid the Science Kid.

Our daughter rarely seemed that interested in learning letters from her mother and father, but once she started watching Super Why she really picked up on the letter recognition the show teaches. She had the alphabet down after just a couple of weeks of watching the show once a day. Entertaining education really worked for her.

Sid the Science Kid is the latest show that actually teaches our daughter something. She has learned about washing her hands to remove germs, what "melting" means, and what seeds are good for. I am a real fan of this show.

Wednesday, November 19, 2008

Google's Ranking Algorithm In Review

Google started on the basis of a ranking algorithm called PageRank (discussed in previous posts here and here). Of course, there is much more to the secret sauce of these search engines now; we just don't know what they are using.

Anyway, a recently published paper collected traffic going into and out of the servers at Indiana University. Using this traffic, the authors were able to disprove three major assumptions underlying PageRank. PageRank assumes
  • a user is equally likely to follow any link on a page.
Actually, links are followed very unevenly. Some links carry huge amounts of traffic while others rarely see a click. (Think about how you browse a web page: aren't there links that never look interesting, like "Report A Bug" on espn.go.com?)
  • the probability of "teleporting" (or going directly) to any web page is equal to any other web page.
Actually, the chance of starting to surf from any given page is very skewed. Some pages are popular destinations that people reach without following links. How many of us have favorite sites that we visit every day through bookmarks or by typing the URL? We do not randomly type in URLs.
  • the probability of "teleporting" from any web page is equal across all web pages.
This was more difficult to disprove from their data. However, some sites are more likely to be stopping points in browsing, while others are bridges to more information.
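The three assumptions being questioned correspond exactly to the uniform choices baked into the classic PageRank computation. Here is a minimal sketch of that computation on a made-up four-page link graph (the graph and damping factor are illustrative, not from the paper):

```python
import numpy as np

# Toy link graph: page i links to the pages in links[i] (made-up example).
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)
d = 0.85  # damping factor: probability of following a link vs. "teleporting"

# Column-stochastic transition matrix.
# Assumption 1: each outlink on a page is equally likely to be followed.
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

# Power iteration with uniform teleportation.
# Assumptions 2 and 3: teleporting to, and from, every page is equally likely,
# which is the (1 - d) / n term below.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = (1 - d) / n + d * (M @ r)

print(r)  # the ranks form a probability distribution (they sum to 1)
```

Note that page 3, which nothing links to, still gets the teleportation floor of (1 - d) / n. That floor is precisely where the paper's traffic data disagrees with the model: real "teleportation" (bookmarks, typed URLs) is heavily skewed toward a few favorite sites.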

The bottom line is that the links of the web are not very good at predicting the actual paths people follow while browsing. Yet the premise that link structure determines popularity is the basis of the major search engines. The redeeming quality of search engines, according to this paper, is that they lead people to less popular sites, ones we would not otherwise find out about, and thus spread the wealth of clicks around (which conflicts with what I had previously said in my first post on Google bias).

Thursday, November 13, 2008

The Machine is Us/ing Us

Insightful video about Web 2.0 and how you fit in to the current model of information sharing. This video was published by Michael Wesch, an assistant professor in Anthropology at Kansas State University.

Monday, November 3, 2008

Google Bias Take 2

I earlier posted that Google's ranking of search results causes a rich-get-richer problem. In other words, the sites linked to most often are ranked first, which leads to even more links.

Here is a paper that uses traffic information from Alexa to disprove this theory. It turns out that queries on search engines are very diverse, which pushes sites that specifically target the given keywords toward the top. For example, Google's Udi Manber said "20 to 25% of the queries we see today, we have never seen before".

Current traffic from Alexa more closely follows the random surfer model, that is, discovering web pages by viewing non-search web pages and clicking on links. It is good to see that worrisome theories are being put to the test.

Wednesday, October 29, 2008

Pandora.com

For a time I had no hope that recommender systems like Amazon.com's "Recommended for You" section would be useful to me specifically. The predictions were often predictable: buy a CD from artist A and get a list of the most popular CDs from that artist. Not useful.

Some time ago I came across Pandora.com, an adaptive radio station that chooses songs to play based on the songs you have added to a station and the songs you rate positively. I have actually learned of several songs and artists I was unfamiliar with that I now like (such as "Question Everything" by 8Stops7). However, not every song it plays is similar to the ones I give it, and some days I find myself disagreeing with every song it plays.

I think that as time goes on recommender systems will improve and we will give some credibility to recommenders. Perhaps the Netflix prize will help in that regard.

Netflix Recommender System

Netflix is trying to motivate research in the area of recommender systems: on Oct. 2, 2006 it offered $1 million to anyone who could improve upon its current recommender system by a specific measure (a 10% improvement in RMSE). Recently I took a look at the current standings, and one team is very close (an improvement of around 9%). Interestingly enough, they have published a few papers showing how they do it.
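To make the contest's measure concrete: RMSE is the root-mean-square error between predicted and actual ratings, and the prize requires beating the RMSE of Netflix's own Cinematch system by 10%. A minimal sketch (the five ratings and predictions below are made-up numbers, not Netflix data):

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    se = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(se / len(actual))

actual    = [5, 3, 4, 1, 2]      # true star ratings (made up)
predicted = [4.2, 3.1, 3.8, 1.9, 2.4]
print(round(rmse(predicted, actual), 4))

# Cinematch's widely reported RMSE on the contest quiz set was 0.9514,
# so the prize threshold is 10% below that:
cinematch = 0.9514
target = 0.9 * cinematch
print(round(target, 4))
```

Because the errors are squared before averaging, RMSE punishes a few badly wrong predictions much more than many slightly wrong ones, which is part of why squeezing out the last percent of improvement is so hard.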

Specifically, what we are talking about is collaborative filtering. There are two main approaches: either you look for global patterns in the matrix of ratings, or you use the ratings from similar items or users. BellKor (the team name) was able to successfully merge these two ideas into a single solution that outperformed (at the time of submission) approaches based on either idea alone.
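To illustrate the first approach ("global patterns in the matrix of ratings"), here is a minimal sketch of latent-factor matrix factorization trained by stochastic gradient descent. This is just the basic factorization ingredient, not BellKor's merged model, and the tiny ratings matrix and all hyperparameters are made-up choices:

```python
import random

random.seed(0)

# (user, item, rating) triples: a tiny made-up ratings matrix with gaps.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
n_users, n_items, k = 3, 3, 2        # k = number of latent factors
lr, reg, epochs = 0.05, 0.02, 200    # learning rate, regularization, passes

# Each user and item gets a small random k-dimensional factor vector.
P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating = dot product of user and item factor vectors."""
    return sum(P[u][f] * Q[i][f] for f in range(k))

# Stochastic gradient descent on squared error with L2 regularization.
for _ in range(epochs):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)

# After training, predictions on the observed ratings should be close.
print([round(predict(u, i), 1) for u, i, _ in ratings])
```

A real system would also add per-user and per-item bias terms (some users rate everything high, some movies are rated high by everyone); the neighborhood approach, by contrast, predicts from the ratings of similar items or users directly, and BellKor's contribution was fitting both in one model.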

What impressed me most about the paper I read (Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model) was that in addition to measuring RMSE on the test set, they tried to look at the user's perspective. We want to know what movie to watch now. They compared other approaches against theirs on whether a movie you would watch and rate a 5 appeared among the top 5 or top 20 recommendations. Well done. We should all keep the end user in mind.

Anyone have a really good or bad experience with recommendations made by computers?

Tuesday, September 30, 2008

Do Good Grades Predict Success? (Freakonomics blog entry)

I recently read the post named in the title of this blog entry at the Freakonomics blog, which I frequent. I love the question and have myself wondered some of the following related questions:
  • Do grades measure our understanding or ability to learn?
  • How fair is it to compare grades of different students from different schools, classes, teachers? (Some teachers are "easy" and some "hard".)
My biggest question, though, is: how much does school prepare us for what is to come? High school to college can be a difficult jump, but I found that being one of the top students in grades, timely completion of assignments, and understanding (in my estimation, of course) did not prepare me for:
  • Looking for a job.
  • Interviewing well.
  • Being a programmer in the real-world.
I should not expect class work to prepare me for job searches and interviews, but I would have hoped that my view of life after school would have been clearer than it was. Perhaps the onus is on the student, but I think teachers can do a better job of preparing students for careers rather than merely producing good test takers.