tag:blogger.com,1999:blog-21453947815776833632023-11-16T07:25:35.807-05:00Student WThe thoughts of a Ph.D. student in Computer Science.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.comBlogger36125tag:blogger.com,1999:blog-2145394781577683363.post-2572567292586494222010-09-28T13:20:00.000-04:002010-09-28T13:20:59.213-04:00LG 420G No SignalI just recently bought an LG 420G at Target. After I registered the serial number with TracFone.com, the phone got no signal at home or at work. I filed a complaint, and the simple answer was to power the phone off and turn it back on. I did this and now the phone (fone ;) ) works.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-56215948079447556232010-04-28T17:09:00.004-04:002010-04-28T17:13:17.214-04:00PostgreSQL simple dblink cursor exampleIn PostgreSQL, dblink works fine for querying the contents of a remote table and inserting them into a new table. However, sometimes that table is too big to fit into memory. 
Here is an example function which uses cursors to do just that:<br /><br /><span style="font-family: arial;">CREATE OR REPLACE FUNCTION get_data() RETURNS void AS $$</span><br /><span style="font-family: arial;">BEGIN</span><br /><span style="font-family: arial;"> PERFORM dblink_connect('dbname=db hostaddr=host port=5432 </span><br /><span style="font-family: arial;"> user=user password=password');</span><br /><span style="font-family: arial;"> PERFORM dblink_open('curs', 'select * from table');</span><br /><span style="font-family: arial;"> LOOP</span><br /><span style="font-family: arial;"> INSERT INTO table</span><br /><span style="font-family: arial;"> SELECT data.*</span><br /><span style="font-family: arial;"> FROM dblink_fetch('curs', 1)</span><br /><span style="font-family: arial;"> AS data(<column>);</span><br /><span style="font-family: arial;"> IF NOT FOUND THEN</span><br /><span style="font-family: arial;"> EXIT;</span><br /><span style="font-family: arial;"> END IF;</span><br /><span style="font-family: arial;"> END LOOP;</span><br /><span style="font-family: arial;"> PERFORM dblink_close('curs');</span><br /><span style="font-family: arial;"> PERFORM dblink_disconnect();</span><br /><span style="font-family: arial;">END;</span><br /><span style="font-family: arial;">$$ LANGUAGE plpgsql;</span>Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com1tag:blogger.com,1999:blog-2145394781577683363.post-28973701977857378432010-04-27T16:21:00.003-04:002010-04-27T16:38:20.233-04:00PostgreSQL Arrays and Java JDBC (always quote on insert)I ran into a problem recently using PostgreSQL arrays with Java. I had a table where each row represented a sentence in a web page. One column contained the entire sentence. Another column contained an array of tokens that make up that sentence (as parsed by GATE).<br /><br />While parsing a Wikipedia page, it contained the character: '†' (\u2020). 
This inserted fine as a sentence, but became {"?"} in the array. I used the class <code class="java plain"><a href="http://valgogtech.blogspot.com/2009/02/passing-arrays-to-postgresql-database.html">PostgreSQLTextArray</a> </code>from Valentine's tech log (thanks Google). The contents would not display in pgAdmin3. How in the world could the same character work to insert into a character varying field but not a character varying[] field?!<br /><br />What I found out is that this character and many others need to be quoted to properly insert into a PostgreSQL array. I changed the Array class I was using to always quote characters.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-51194772355684829312009-12-29T15:14:00.002-05:002009-12-29T15:20:32.281-05:00Fix quiet microphone on Acer Aspire 5100 after install Windows 7After installing Windows 7 on my system the microphone was extremely quiet with the default Microsoft drivers. I finally was able to fix the problem using the following steps:<br /><ol><li>Download driver from Acer.com (<a href="http://global-download.acer.com/GDFiles/Driver/Audio/Audio_Realtek_10.0.5605_Vistax86.zip?acerid=633640464374493351&Step1=Notebook&Step2=Aspire&Step3=Aspire%205100&OS=V10&LC=en&BC=Acer&SC=PA_6">http://global-download.acer.com/GDFiles/Driver/Audio/Audio_Realtek_10.0.5605_Vistax86.zip?acerid=633640464374493351&Step1=Notebook&Step2=Aspire&Step3=Aspire%205100&OS=V10&LC=en&BC=Acer&SC=PA_6</a>).</li><li>Next disable Windows automatic driver download and installation. Type "change device installation settings" into Start and select the option of the same name. Change the settings to either "No..." and "Never..." or "No..." and "Install...if not found...". (This is the critical step.)</li><li>I also changed the driver install to work in compatibility mode. 
This is done by right-clicking Setup.exe, selecting Properties > Compatibility, and changing to Vista mode.</li><li>Install the drivers and restart your computer. If you are as lucky as I was, your built-in microphone will now work.<br /></li></ol>Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com3tag:blogger.com,1999:blog-2145394781577683363.post-34574122949370670792009-08-10T14:25:00.002-04:002009-08-10T14:28:38.656-04:00Use of Grocery Store DataI am sure all of you have seen or used a grocery store card to receive discounts on purchases. These cards allow stores to link purchases made over time to a single customer. I know there are many potential uses for this kind of data. On Friday I finally saw a use of this data that I agree with. Kroger mailed my family personalized coupons. In all but one case, these coupons were for items that we have bought in the past. We plan on actually using these coupons. I am not sure what the incentives are for Kroger, but I will be glad to use coupons for things my family was planning on purchasing anyway.<br /><br />Well done Kroger.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com3tag:blogger.com,1999:blog-2145394781577683363.post-74019554504721101292009-06-12T10:20:00.002-04:002009-06-12T10:22:48.374-04:00Health Care SpendingCheck out <a href="http://gregmankiw.blogspot.com/2009/06/is-increased-health-spending-optimal.html">this</a> blog entry. It gave me new eyes to see health care spending not as a burden, but as a potential boon to society. With increased health care spending we are spending money on prolonging our lives.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-2411620221886854792009-06-12T10:17:00.003-04:002009-06-12T10:19:56.015-04:00Google WaveI watched most of the long video previewing Google Wave, and I have to say that I am impressed. 
For starters, I really like the feature that lets you watch the other person type during IM. At times I find myself staring at the message "so and so is typing...". I look forward to the release of Google Wave. Check it out for yourself (<a href="http://wave.google.com/">http://wave.google.com/</a>).Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-73950035022266068792009-02-23T19:35:00.002-05:002009-02-23T19:38:07.983-05:00Gold in one unlikely placeWhen people talk about data mining, they tell us there are gold nuggets hidden in data. People occasionally stumble upon inheritances containing gold that they never knew of (especially from relatives in African royalty). But who would have thought there was gold to be found in sewage: <a href="http://www.cnn.com/video/#/video/tech/2009/02/22/lah.japan.gold.poop.cnn">http://www.cnn.com/video/#/video/tech/2009/02/22/lah.japan.gold.poop.cnn</a>.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-82296143407148956582009-02-04T10:20:00.001-05:002009-02-04T10:22:12.224-05:00The Numbers on Consumer Spending from Mint.com<a href="http://www.techcrunch.com/2009/01/30/the-economy-according-to-mint/">Here</a> is a look at the numbers, according to the anonymous users of Mint.com who report on spending, savings accounts and more. 
Check it out.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-67217434866763008942009-02-04T10:09:00.001-05:002009-02-04T10:11:14.840-05:00Creative HackersCheck out this road sign from Texas (<a href="http://www.foxnews.com/story/0,2933,484326,00.html">"Zombies Ahead"</a>).Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-72114840864359117092009-01-30T09:24:00.002-05:002009-01-30T09:36:48.405-05:00Economists clueless on macroeconomic issues?I recently read a post <a href="http://www.willwilkinson.net/flybottle/2009/01/24/are-economists-completely-clueless/">here</a> that basically stated that economists talk about people's psychology without understanding psychology, and that at the macro level economists are pretty much clueless. Take a look at the article. It is a good read, as are many of the comments.<br /><br />One of the major problems with economics is that it is really difficult to test macro-level theories. How do you run a randomized experiment on some theory at the national or global level? You don't. No politician will let you use his/her people as guinea pigs. We don't want to be guinea pigs. 
Yet we have learned that good data is much more reliable than intuition and gut feel (see these <a href="http://exp-platform.com/Documents/2008-10-29%20ExP%20CIKM.pptx">slides</a> and this <a href="http://www.amazon.com/Super-Crunchers-Thinking-Numbers-Smart/dp/0553805401">book</a>).<br /><br />The only way to understand macroeconomics reliably is with experimentation (like all other good sciences) and that is near impossible at the macro level.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-10097818127031228702009-01-07T10:42:00.002-05:002009-01-07T10:45:45.752-05:00Virtual CloneDriveIf anyone has to install some software from ISO files they downloaded may I recommend <a href="http://www.slysoft.com/en/virtual-clonedrive.html">Virtual CloneDrive</a>. I had to install SAS to my computer for a class which required at least 11 ISO files to be burned to disk. Luckily with the Virtual CloneDrive I could install from my hard drive without burning one disk and the CD images were read faster than if they were in a real CD drive. Smooth and painless!Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-62114544373755088392008-12-03T20:18:00.002-05:002008-12-03T20:28:03.956-05:00Educational TV can be educationalI am of the belief that just because something says that it is healthy does not mean it is. For example, "low fat" to me means "less bad" rather than actually being "good". I originally was of the same opinion on educational TV. TV is brain rot for children (which is sometimes worth the quiet it brings). 
However, there have been two cartoons on PBS that have changed my view: <a href="http://pbskids.org/superwhy/">Super Why</a> and <a href="http://pbskids.org/sid/#/playground">Sid the Science Kid</a>.<br /><br />Our daughter rarely seemed that interested in learning letters from mother and father, but once she started to watch Super Why she really started to pick up everything the show taught about letters. She had the alphabet down after just a couple of weeks of watching the show once a day. Entertaining education really worked for her.<br /><br />Sid the Science Kid is the latest show that actually teaches our daughter something. She has learned about washing your hands to remove germs, what "melting" means, and what seeds are good for. I am a real fan of this show.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com2tag:blogger.com,1999:blog-2145394781577683363.post-36418111469694390592008-11-19T14:08:00.004-05:002008-11-24T15:53:02.560-05:00Google's Ranking Algorithm In ReviewGoogle started on the basis of a ranking algorithm called PageRank (discussed in previous posts <a href="http://studentw.blogspot.com/2008/11/google-bias-take-2.html">here</a> and <a href="http://studentw.blogspot.com/2008/09/google-bias.html">here</a>). Of course there is so much more to the secret sauce for these search engines now. We just don't know what they are using.<br /><br />Anyway, a recent <a href="http://www.informatics.indiana.edu/fil/Papers/click.pdf">paper</a> collected traffic going into and out of the servers at Indiana University. Using this traffic, the authors were able to disprove three major assumptions underlying PageRank. PageRank assumes<br /><ul><li>a user is equally likely to follow any link on a page. </li></ul>Actually, links are very unevenly followed. Some links carry huge amounts of traffic and others rarely see a click. (Think of how you browse a web page. 
Aren't there links that never look interesting, like "Report A Bug" on espn.go.com?)<br /><ul><li>the probability of "teleporting" (or going directly) to any web page is equal to that of any other web page.</li></ul>Actually, the chance of starting to surf from any given page is very skewed. Some pages are very popular destinations that people reach without following links. How many of us have favorite sites that we visit every day through bookmarks or by typing the URL? We do not randomly type in URLs.<br /><ul><li>the probability of "teleporting" from any web page is equal across all web pages.</li></ul>This was more difficult to disprove from their data. However, some sites are more likely to be stopping points in browsing and others are a bridge to more information.<br /><br />The bottom line is that the links of the web are not that good at determining what actual paths people follow while browsing. However, the premise of major search engines is that link structure determines popularity. The redeeming quality of search engines, according to this paper, is that they lead people to less popular sites, or sites we would not otherwise find out about, and thus spread the wealth of clicks around (which is in conflict with what I had previously said in my first post on <a href="http://studentw.blogspot.com/2008/09/google-bias.html">Google bias</a>).Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-80185328808055130122008-11-13T13:30:00.005-05:002008-11-19T16:11:49.745-05:00The Machine is Us/ing UsInsightful video about Web 2.0 and how you fit into the current model of information sharing. This video was published by Michael Wesch, an assistant professor in Anthropology at Kansas State University.<br /><object height="344" width="425"><param name="movie" value="http://www.youtube.com/v/NLlGopyXT_g&hl=en&fs=1"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed src="http://www.youtube.com/v/NLlGopyXT_g&hl=en&fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="344" width="425"></embed></object>Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-56541281716747117792008-11-03T13:35:00.002-05:002008-11-03T13:51:35.882-05:00Google Bias Take 2I earlier posted that Google's ranking of search results caused a rich-get-richer problem. In other words, the sites linked to most often are ranked first, which leads to even more links.<br /><br /><a href="http://www.pnas.org/content/103/34/12684.abstract">Here</a> is a paper that uses traffic information from Alexa to disprove this theory. It turns out that queries on search engines are very diverse. This leads to sites appearing towards the top that more specifically target the keywords given. For example, Google's Udi Manber said "<a href="http://www.readwriteweb.com/archives/udi_manber_search_is_a_hard_problem.php">20 to 25% of the queries we see today, we have never seen before</a>".<br /><br />Current traffic from Alexa more closely follows the random surfer model, in which people discover web pages by viewing non-search web pages and clicking on links. 
It is good to see that worrisome theories are being put to the test.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-47342380184541404702008-10-29T16:33:00.002-04:002008-10-29T16:34:31.195-04:00Pandora.comFor a time I had no hope that recommender systems like Amazon.com's "Recommended for You" section would be useful to me specifically. The predictions were often predictable. Buy a CD from artist A and get a list of the most popular CDs from that artist. Not useful.<br /><br />Some time ago I came across <a href="http://pandora.com/">Pandora.com</a>, an adaptive radio station that chooses songs to play based on what songs you have added to a station and what songs you rate positively. I actually learned of several songs and artists I was unfamiliar with that I now like (such as "<a href="http://www.pandora.com/music/song/8stops7/question+everything">Question Everything</a>" by 8Stops7). However, it does not play all songs that are similar to the songs I tell it about. And some days I find myself disagreeing with all songs played.<br /><br />I think that as time goes on recommender systems will improve and we will give some credibility to recommenders. Perhaps the Netflix prize will help in that regard.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-59635018630909387062008-10-29T16:10:00.004-04:002008-10-29T16:34:39.314-04:00Netflix Recommender SystemNetflix is trying to motivate research in the area of recommender systems and on Oct. 2, 2006 offered <a href="http://www.netflixprize.com//index">$1 million</a> to anyone who could improve upon their current recommender system by a specific measure (improve RMSE by 10%). Recently I took a look at the current standings and one team is very close (improvement around 9%). 
Interestingly enough, they had a <a href="http://www.research.att.com/%7Evolinsky/netflix/">few papers</a> showing how they do it.<br /><br />Specifically, what we are talking about is collaborative filtering. There are two main approaches: either you look for global patterns in the matrix of ratings, or you use the ratings from similar items or users. BellKor (team name) was able to successfully merge these two ideas into a single solution that outperformed (at the time of submission) any other approach that used only one of the two.<br /><br />What impressed me most about the paper I read (<a href="http://public.research.att.com/%7Evolinsky/netflix/kdd08koren.pdf">Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model</a>) was that in addition to measuring RMSE on the test set, they tried to look at the user's perspective. We want to know what movie to watch now. They compared other approaches against theirs on whether they would recommend in the top 5 or top 20 a movie you would watch and rate a 5. Well done. We should all keep the end user in mind.<br /><br />Anyone have a really good or bad experience with recommendations made by computers?Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-24688539720112534592008-09-30T09:30:00.002-04:002008-09-30T09:42:20.394-04:00Do Good Grades Predict Success? (Freakonomics blog entry)I recently read <a href="http://freakonomics.blogs.nytimes.com/2008/09/29/do-good-grades-predict-success/">the post</a> in the title of this blog entry at the Freakonomics blog, which I frequent. I love the question and have myself wondered about some of the following related questions:<br /><ul><li>Do grades measure our understanding or ability to learn?</li><li>How fair is it to compare grades of different students from different schools, classes, teachers? 
(Some teachers are "easy" and some "hard".)<br /></li></ul>My biggest question though is: how much does school prepare us for what is to come? High school to college can be a difficult jump, but I found that being one of the top students by grade, timely completion of assignments, and understanding (in my estimation of course) did not prepare me for:<br /><ul><li>Looking for a job.</li><li>Interviewing well.</li><li>Being a programmer in the real world.</li></ul>I should not expect class work to prepare me for looking for jobs and interviewing, but I would have hoped that my view of life after school would have been clearer than it was. Perhaps the onus is on the student, but I think teachers can do a better job of preparing students for careers rather than for being good test takers.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com2tag:blogger.com,1999:blog-2145394781577683363.post-43483213689737444872008-09-10T16:12:00.002-04:002008-09-10T17:06:27.303-04:00Google BiasIn response to my post Google Wishlist, one reader wrote:<br />"I would love the Google search results to include more web addresses with a differing suffix than .com (i.e. .net, .org). There are a ton of sites that get overlooked because of the seemed bias that Google has for the .com sites.<br /><br />"I will admit that they have done a little better on including some of these sites on more popular searches, but as a whole the .com sites seem to get the preference."<br /><br />I would agree with this reader that Google results are certainly biased, as they have to give preference in some reasonable way. Now I do not believe that the suffix of a given website is used to rank the website (try a search for "<a href="http://www.google.com/search?hl=en&q=plutonium&btnG=Google+Search&aq=f&oq=">Plutonium</a>" for example and the top result as I see it is from Wikipedia.org). 
I do believe that the results are biased by "link popularity" or by <a href="http://www.iicm.tugraz.at/thesis/cguetl_diss/literatur/Kapitel07/References/Brin_et_al._1998/TheAnatomyofGoogle.html">PageRank</a> (as explained by the founders of Google, Sergey Brin and Lawrence Page). Basically, as I understand it, ranking in many search engines is based on how many links point to a domain or webpage (and on the "quality" of those links).<br /><br />The decision to go with PageRank was a good choice. It put Google on the map originally. However, there are some drawbacks for people like me. The bias towards more popular pages means that it is more difficult to climb to the top. It is a rich-get-richer web world. Those that have links are more easily found, meaning they are more easily linked to. This would explain why one would more likely see links from big-name .com sites ranked first. Now if I wrote the most informative page on Plutonium around, it would likely never beat out the Wikipedia page (everyone is linking to Wikipedia these days). 
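
To make the link-counting idea concrete, here is a minimal sketch of the simplified, textbook form of PageRank computed by repeated averaging. It is only an illustration under my own assumptions, not what Google actually runs: the function name `pagerank`, the damping value 0.85, the iteration count, and the tiny `toy_web` graph (including its made-up site names) are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n
            for page in pages
        }
        # Each page splits its rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: everyone links to Wikipedia, nobody links to my page.
toy_web = {
    "wikipedia.org": ["bigsite.com"],
    "bigsite.com": ["wikipedia.org"],
    "blog-a.net": ["wikipedia.org"],
    "blog-b.org": ["wikipedia.org", "bigsite.com"],
    "my-plutonium-page.org": ["wikipedia.org"],
}

ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:25s} {ranks[page]:.3f}")
```

In this toy graph the heavily linked pages end up with most of the score, while the page nobody links to stays near the minimum no matter how informative it is, which is the rich-get-richer effect described above.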
For more on this topic check out the article <a href="http://cis.poly.edu/%7Eqq_gan/papers/1p20.pdf">Impact Of Search Engines On Page Popularity</a>.<br /><br />In conclusion, Google is biased necessarily which is fine for them but bad for the little guys.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com2tag:blogger.com,1999:blog-2145394781577683363.post-36727766144588973362008-09-05T11:47:00.004-04:002008-09-05T11:58:20.003-04:00Cool Search Engine InterfacesFor those who enjoy trying something new check out the following search engines:<div><br /></div><div><a href="http://kartoo.com/">Kartoo</a>: clusters pages within a search displaying visually.</div><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGv5FyS5X7odftkuPCIJfNcQ3ROVxr55cyZwsUOCBZ0xujKRrWqhndRBd5UDeeftBO0rPWvoCgTmvx2s5O66gGf3x9rBtBWKkx-AYeaFdw7Ih1qfLrTl0a1ndcQV__rvzpkvKZ0sk_7Zf_/s200/kartoo.jpg" style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;" border="0" alt="" id="BLOGGER_PHOTO_ID_5242566337766428834" /><div><br /><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><a href="http://searchme.com/">Searchme</a>: shows page with highlighted keywords.</div><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6H3aok5TuZDtbWGY8sQnaYp23mxenefW4P4pIm0SRim7JbsTXAg5Yr9p2BxEZ1CnslBb1xIhYDo5RJXjLQ7iIsYcoTXX6b72C8ekiD9noeXBf5cAl5tgzMdSEXZC4CWG7ke2oTpIPF7rV/s200/searchme.jpg" style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;" border="0" alt="" id="BLOGGER_PHOTO_ID_5242566649695518594" /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div>If you really get an itching to check out other search engines see the article <a href="http://www.readwriteweb.com/archives/top_100_alternative_search_engines_mar07.php">Top 100 Alternative Search Engines, March 
2007</a>.</div>Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-39853720710099130942008-08-31T15:17:00.004-04:002008-09-01T14:11:46.850-04:00Biology + Computer Science = Proof of God ?I am a Computer Scientist, or so I claim. I am working on my PhD in Computer Science and recently took a class on Bioinformatics. Here I learned about the mysterious world of DNA. Certainly I am no expert in the field, but what a complex system we have found in discovering DNA. Most living things have DNA (or RNA in some form or another), which changes through time and encodes what we are made of. The DNA is translated through multiple steps into proteins. These proteins fold into varying shapes, which are not constant through time, and are part of metabolic pathways. This is about where my knowledge of the whole process, from DNA up to the functions of a cell, ends.<br /><br />I do not know how accurate the comparison is, but I see DNA in a similar way to computer languages, like Perl or Java. Computer languages are sets of very simple instructions that a human understands, which are compiled into binary that tells the CPU how to manipulate various memory locations in a computer. Computer languages are built on top of the mechanics of a computer system. The entire process of designing a computer, building a computer, designing a computer language, specifying all of what a language understands, building a compiler for the language, and finally building a computer program from that language which works properly is a very organized process. Chaos does not create the final product; rather, it takes endless hours of planning, organizing logically and testing to be sure that what one has created has no major, show-stopping flaws. 
In the end all you or I may see is something like the text on this screen, transmitted perhaps hundreds of miles through wires to be displayed on your computer screen.<br /><br />As I see it, DNA has a set of rules that it follows (though scientists have not discovered all of them) and is part of a large, complex process that makes it possible for the encoding of life to be passed on to the next generation. This entire process, as well as all other components of a living organism, such as reproduction, eating mechanisms, having food available, and aging, are all necessary parts of even the simplest single-celled living organisms. I do not understand how the world could make the leap from no life on earth to a single living cell that was capable of encoding its being and reproducing without a creator involved in the process. In my limited experience, chance does not bring about large-scale organization.<br /><br />I believe there is a God and that the complexities of biology, for example, show that God was involved in the creation of life as we know it.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com4tag:blogger.com,1999:blog-2145394781577683363.post-71216202483332813032008-08-26T09:14:00.000-04:002008-08-26T12:50:26.768-04:00Google WishlistWhat features do you wish Google Search had but doesn't?<br /><br />Yesterday marked the start of my second year as a PhD student. I have been working on click fraud research, but soon I will be switching to <a href="http://en.wikipedia.org/wiki/Text_mining">Text Mining</a>, and I need some ideas as to what I should do. To get the creative juices flowing, we will start by listing features that you wish Google Search had. Any type of feature, large or small, no matter how realistic it is. 
Here are some that I have come across that sound interesting:<br /><ul><li>Summarize Opinions: What if you could search "people's opinions on restaurant X" and the response would be a summary of all opinions on all websites to do with that restaurant, say 35% liked it (with links to those who did) and 65% did not like it, followed by a summary of why they did or did not like it. (For an example see Live's search summary of opinions, limited to a few products: "<a href="http://search.live.com/products/?q=canon%20rebel&p1=%5BCommerceService+scenario%3d%22reviews%22+docid%3d%22E853BAFD866346A6B8A4%22+p%3d%227b17f8736d7843598d3d41509ba27679%22%5D&wf=Commerce&FORM=ENCA">Canon EOS Digital Rebel XT - digital camera, 8MP, 3x Optical Zoom</a>")<br /></li><li>Search Topics not always Keywords: I search for papers on text mining on Google Scholar and find only a handful of papers that relate to what I am looking for. Most of the papers I would be interested in do not actually say "Text Mining" in the title, and sometimes not even in the paper, but that is what they are talking about. I want the category "Text Mining", not the keywords "Text Mining".</li><li>Answer My Question: Sometimes I will search using a whole question, not just keywords, and find someone else who asked the question on a forum but did not get a response. I want to find the answer, not the question. This is not a new thought. There are already several approaches attempting to do this, but none of them reaches 90% accuracy on open-ended questions. (Examples: <a href="http://www.ask.com/?o=0&l=dir">Ask Jeeves</a>, <a href="http://start.csail.mit.edu/">START</a> from MIT, etc.)</li></ul>Please leave your thoughts. 
Anything welcome.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com6tag:blogger.com,1999:blog-2145394781577683363.post-26039709238210711322008-08-18T15:08:00.000-04:002008-08-19T11:57:42.176-04:00WordNet 3.0 Summary StatisticsI recently downloaded <a href="http://wordnet.princeton.edu/">WordNet</a> 2.1 (an ontology for the English language) for Windows and was impressed by the results for even basic words: several senses were displayed, with definitions and example sentences. I thought I would run a few queries on WordNet. Fortunately, in the "Related Projects" link on WordNet someone has already loaded WordNet 3.0 into PostgreSQL (http://sourceforge.net/project/showfiles.php?group_id=135112&package_id=219735/&abmode=1).<br /><br />Here are some of my basic questions:<ol><li><span style="font-weight: bold;">How many words are there in WordNet 3.0?</span><br />147,306<br />According to one source (I am sure there is plenty of debate about how many words are in the English language in common usage), "The Second Edition of the <i>Oxford English Dictionary</i> contains full entries for 171,476 words in current use, and 47,156 obsolete words." (from <a href="http://www.askoxford.com/asktheexperts/faq/aboutenglish/numberwords">AskOxford</a>) Of course there were several other answers from other websites, but if the same definition of what a word is applies, then WordNet is doing pretty well.</li><li><span style="font-weight: bold;">What are examples of words found in WordNet to get a sense of what the dictionary contains?</span><br />52850;"gas system"<br />103824;"predicate"<br />27754;"committal to memory"<br />81032;"malapropism"<br />18769;"butt hinge"<br />97732;"pectinidae"<br />46875;"family termitidae"<br />Here we see that a "word" does not mean a single token. 
Also, the numbers next to each word were the IDs found in the database.<br /></li><li><span style="font-weight: bold;">How many senses does a word have?</span><br />mean: 1.4, median: 1<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqcbH73rNFiqGE5fPeNASSDPGTR-4EcOPu8Q85kWOnvSjQaWiT5QXqXIPgieZRYqL2IEUVqCO2P3CE-b21l0Dqru5f8z1umKt5E_LfcDYg0t4RJgJAEt39dgAfVifgZgVtyI1AeuPanMH_/s1600-h/senses.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqcbH73rNFiqGE5fPeNASSDPGTR-4EcOPu8Q85kWOnvSjQaWiT5QXqXIPgieZRYqL2IEUVqCO2P3CE-b21l0Dqru5f8z1umKt5E_LfcDYg0t4RJgJAEt39dgAfVifgZgVtyI1AeuPanMH_/s200/senses.png" alt="" id="BLOGGER_PHOTO_ID_5236239131943348674" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVqXCAX5YPLdFrEvXwSnZkH8YnnRv5W_KnSZuKDEINcQ59bsXW5RbbcEmbg7QRrwBhhT9Fl9ddgx16G8XyGXBKEvZbInKhXTCxtMNscXnvUIXSuqytHnMI5U5ASk-eelnDiR_Odj80NDXN/s1600-h/senses.png"><br /></a> <table style="border-collapse: collapse; width: 144pt;" border="0" cellpadding="0" cellspacing="0" width="192"><col style="width: 48pt;" span="2" width="64"> <col style="width: 48pt;" width="64"> <tbody><tr style="height: 15pt;" height="20"> <td style="height: 15pt; width: 48pt;" align="right" height="20" width="64">1</td> <td style="width: 48pt;" align="right" width="64">120433</td> <td class="xl65" style="width: 48pt;" align="right" width="64">82%</td><td style="vertical-align: top;"><br /></td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">2</td> <td align="right">15711</td> <td class="xl65" align="right">11%</td><td style="vertical-align: top;"><br /></td><td style="vertical-align: top;"><br /></td> </tr> <tr 
style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">3</td> <td align="right">5116</td> <td class="xl65" align="right">3%</td><td style="vertical-align: top;"><br /></td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">57</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top;">"run"<br /></td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">70</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top;">"cut"<br /></td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">75</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top;">"break"<br /></td><td style="vertical-align: top;"><br /></td> </tr> </tbody></table></li><li><span style="font-weight: bold;">How many </span><a style="font-weight: bold;" href="http://en.wikipedia.org/wiki/Hypernym">hypernyms</a><span style="font-weight: bold;"> (broader words) does a word have?</span><br />mean: 2.17, median: 1<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu9ICKb6wn9aG_EoJUJNpbJrThxGpxTlbViBtffwl_-BS-mEthHrNtKgGDXIOUZVJNLpqQHkXUl4Fw62FV27r5dBPDr2f6jikDJ-1Vjzepf1xOl5J2-oJCBtpOOdl62sa5Bdvrcg8h-GL8/s1600-h/hypernym.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu9ICKb6wn9aG_EoJUJNpbJrThxGpxTlbViBtffwl_-BS-mEthHrNtKgGDXIOUZVJNLpqQHkXUl4Fw62FV27r5dBPDr2f6jikDJ-1Vjzepf1xOl5J2-oJCBtpOOdl62sa5Bdvrcg8h-GL8/s200/hypernym.png" alt="" id="BLOGGER_PHOTO_ID_5236241014180307922" border="0" /></a> <table 
style="border-collapse: collapse; width: 298px; height: 156px;" border="0" cellpadding="0" cellspacing="0"><col style="width: 48pt;" span="2" width="64"> <col style="width: 48pt;" width="64"> <tbody><tr><td style="vertical-align: top; text-align: center;">hypernyms<br /></td><td style="vertical-align: top; text-align: center;">words<br /></td><td style="vertical-align: top; text-align: center;">percent<br /></td><td style="vertical-align: top;"><br /></td></tr><tr style="height: 15pt;" height="20"> <td style="height: 15pt; width: 48pt;" align="right" height="20" width="64">0</td> <td style="width: 48pt;" align="right" width="64">35649</td> <td class="xl65" style="width: 48pt;" align="right" width="64">24%</td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">1</td> <td align="right">46213</td> <td class="xl65" align="right">31%</td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">2</td> <td align="right">27468</td> <td class="xl65" align="right">19%</td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">113</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top; text-align: right;">"hold"<br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">125</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top; text-align: right;">"cut"<br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">156</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top; text-align: right;">"break"<br /></td> </tr> </tbody></table><br /></li><br /><li><span style="font-weight: bold;">How many 
</span><a style="font-weight: bold;" href="http://en.wikipedia.org/wiki/Hyponym">hyponyms</a><span style="font-weight: bold;"> (specific examples) does a word have in WordNet?<br /></span>mean: 2.17, median: 0<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihRTkXfpjQ22R9lku8Xzm-zvGgLwYRu89KUGtwu2qxCC-FwnXWqQGNrs63tihNHyHJx1k8tM85SDVLmjmrvJn6cUNIUrxWO5WYMunEkzlJ77ldWopFy84bVFLp3wrj08LCSE5M_Ep3RWic/s1600-h/hyponym.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihRTkXfpjQ22R9lku8Xzm-zvGgLwYRu89KUGtwu2qxCC-FwnXWqQGNrs63tihNHyHJx1k8tM85SDVLmjmrvJn6cUNIUrxWO5WYMunEkzlJ77ldWopFy84bVFLp3wrj08LCSE5M_Ep3RWic/s200/hyponym.png" alt="" id="BLOGGER_PHOTO_ID_5236244302185623506" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQ7Z3ryWrY6J0Fbfl5OWUiCxRXk-AZYnBvYusFSmXP2HiZtntyg63C5WIwp8w42EVVU2S1FHjC7ZESmaci_SuLTLsWCTbT3-ZmIqcABiarhvD7tq9nMl6No8vTBknwVBCcsagcOA34rRe7/s1600-h/hyponym.png"> </a><table style="border-collapse: collapse; width: 374px; height: 136px;" border="0" cellpadding="0" cellspacing="0"><col style="width: 48pt;" span="2" width="64"> <col style="width: 48pt;" width="64"> <col style="width: 48pt;" width="64"> <tbody><tr><td style="vertical-align: top; text-align: center;">hyponyms<br /></td><td style="vertical-align: top; text-align: center;">words<br /></td><td style="vertical-align: top; text-align: center;">percent<br /></td><td style="vertical-align: top;"><br /></td></tr><tr style="height: 15pt;" height="20"> <td style="height: 15pt; width: 48pt;" align="right" height="20" width="64">0</td> <td style="width: 48pt;" align="right" width="64">118372</td> <td class="xl65" style="width: 48pt;" align="right" width="64">80%</td> <td style="width: 48pt;" width="64"><br 
/></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">1</td> <td align="right">5427</td> <td class="xl65" align="right">4%</td> <td><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">2</td> <td align="right">4250</td> <td class="xl65" align="right">3%</td> <td><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">897</td> <td align="right">1</td> <td class="xl65" align="right">0%</td> <td>"herbaceous plant"</td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">913</td> <td align="right">1</td> <td class="xl65" align="right">0%</td> <td>"herb"</td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">1007</td> <td align="right">1</td> <td class="xl65" align="right">0%</td> <td>"change"</td> </tr> </tbody></table><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQ7Z3ryWrY6J0Fbfl5OWUiCxRXk-AZYnBvYusFSmXP2HiZtntyg63C5WIwp8w42EVVU2S1FHjC7ZESmaci_SuLTLsWCTbT3-ZmIqcABiarhvD7tq9nMl6No8vTBknwVBCcsagcOA34rRe7/s1600-h/hyponym.png"><br /></a> </li><li><span style="font-weight: bold;">How many synonyms does a word have?</span><br />mean: 2.07, median: 1<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj68pGT-KqwJKEszFkSk46etgwqxUFvN35lSkjgb3a7HTqFXDWVmXpj3GJ34ixD8qjTsu1CT_H6sUvabZjzOeJBsFlwqucr1X2jy9evs-1-Ty06AS1ePnqfgG5aXRI6PullQIDkLWquUGK-/s1600-h/synonym.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj68pGT-KqwJKEszFkSk46etgwqxUFvN35lSkjgb3a7HTqFXDWVmXpj3GJ34ixD8qjTsu1CT_H6sUvabZjzOeJBsFlwqucr1X2jy9evs-1-Ty06AS1ePnqfgG5aXRI6PullQIDkLWquUGK-/s200/synonym.png" alt="" id="BLOGGER_PHOTO_ID_5236251852167826818" border="0" /></a> <table style="border-collapse: collapse; width: 268px; height: 138px;" border="0" cellpadding="0" cellspacing="0"><col style="width: 48pt;" span="2" width="64"> <col style="width: 48pt;" width="64"> <tbody><tr><td style="vertical-align: top; text-align: center;">synonyms<br /></td><td style="vertical-align: top; text-align: center;">words<br /></td><td style="vertical-align: top; text-align: center;">percent<br /></td><td style="vertical-align: top;"><br /></td></tr><tr style="height: 15pt;" height="20"> <td style="height: 15pt; width: 48pt;" align="right" height="20" width="64">0</td> <td style="width: 48pt;" align="right" width="64">36916</td> <td class="xl65" style="width: 48pt;" align="right" width="64">25%</td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">1</td> <td align="right">48004</td> <td class="xl65" align="right">33%</td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">2</td> <td align="right">24998</td> <td class="xl65" align="right">17%</td><td style="vertical-align: top;"><br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">74</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top; text-align: right;">"hold"<br /></td> </tr> <tr style="height: 15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">97</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top; text-align: right;">"pass"<br /></td> </tr> <tr style="height: 
15pt;" height="20"> <td style="height: 15pt;" align="right" height="20">99</td> <td align="right">1</td> <td class="xl65" align="right">0%</td><td style="vertical-align: top; text-align: right;">"break"<br /></td> </tr> </tbody></table></li><br /></ol><span style="font-weight: bold;">Summary:</span><br />It appears that only a small percentage of words (20%) have hyponyms, though those that do may have quite a few (no surprise). Interestingly, the mean number of hyponyms per word and the mean number of hypernyms per word are the same, because the total number of hypernym links equals the total number of hyponym links. The two relations are apparently inverses of each other.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0tag:blogger.com,1999:blog-2145394781577683363.post-83670918348174244782008-08-14T12:26:00.000-04:002008-08-14T15:27:43.556-04:00First Experience with Named Entity Recognition<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8UO29G_iJaZzyKMzwcs_kLzyrF8HyNeC7s71cXY7fsBGoYmK5G0uqTjIhZ6S_z_3C8gdIXQ6mC7Rxe7C8w7bbeCGvToggDoh4piXrMztGj0iYe4bZwfELAqCZkbPKFzEef9h3w037Orlo/s1600-h/UoL_GATE.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8UO29G_iJaZzyKMzwcs_kLzyrF8HyNeC7s71cXY7fsBGoYmK5G0uqTjIhZ6S_z_3C8gdIXQ6mC7Rxe7C8w7bbeCGvToggDoh4piXrMztGj0iYe4bZwfELAqCZkbPKFzEef9h3w037Orlo/s400/UoL_GATE.png" alt="" id="BLOGGER_PHOTO_ID_5234411182498468498" border="0" /></a><br />I must preface this post by saying that I am fairly new to text mining and NLP. I just tried the named entity recognition built into the GATE tool on the Wikipedia entry for the University of Louisville. All parameters were left at the defaults for ANNIE. The results can be seen in the included image.<br /><br />The results suggest to me that this is a very difficult task. 
For a majority of the abbreviations, such as NIH, NCAA and U of L, the type of entity could not be determined. It is also interesting that the University of Louisville is labeled differently in different parts of the text. In the table description (stripped of its structure in this context), just "Louisville" is labeled as a location, which is true but not the best labeling, while the second occurrence, in the first paragraph, has "University of Louisville" labeled as an organization, which is the correct label.<br /><br />It is interesting to me that when the full name of the NIH was spelled out, the word "National" was left out of the label, and the abbreviation "NIH" was not labeled as an organization but as an unknown entity.<br /><br />Lastly, the labeling of persons did not do well. The president of the university is labeled correctly (Dr. James R. Ramsey), but "Faculty", "Urban", and "General Assembly" are also labeled as people. The precision of the person label is 1/5, not so good, though I guess the recall is a perfect 1/1.Brentwellhttp://www.blogger.com/profile/12617431514339027902noreply@blogger.com0
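For readers who want the arithmetic behind the precision and recall figures quoted for the person label, here is a minimal Python sketch. It is not part of GATE or ANNIE; the function name precision_recall and its parameters are made up for illustration, and the counts (1 correct label, 5 predicted person labels, 1 actual person in the text) are simply the ones stated in the post.

```python
from fractions import Fraction


def precision_recall(true_positives, predicted, actual):
    """Return (precision, recall) as exact fractions.

    precision = correct labels / all labels the tool produced
    recall    = correct labels / all entities that really exist in the text
    """
    precision = Fraction(true_positives, predicted)
    recall = Fraction(true_positives, actual)
    return precision, recall


# Counts for the person label as reported in the post (assumed, not re-measured):
# 1 correctly labeled person, 5 spans labeled as persons, 1 real person in the text.
precision, recall = precision_recall(true_positives=1, predicted=5, actual=1)

print(f"precision = {precision} ({float(precision):.0%})")
print(f"recall    = {recall} ({float(recall):.0%})")
```

Using exact fractions keeps the result in the same 1/5 and 1/1 form used above, and the percentage is printed alongside only for convenience.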