Crowdsourcing medical diagnosis via Facebook

A study I am surprised hasn't generated headlines is Laypersons can seek help from their Facebook friends regarding medical diagnosis (Danish: Lægfolk kan bruge deres Facebook-venner til at få hjælp vedrørende medicinske diagnoser). The hype is there all right: crowdsourcing and Facebook. Let your Facebook friends diagnose your disease.


The authors of the study, 4 Danish medical doctors, found 8 willing subjects who would use their Facebook wall to post one of 6 short medical cases (see below) selected from an English textbook and translated (I suppose) into Danish. The friends of the subjects could then propose a diagnosis by commenting on the post. For 6 of the 8 Facebook users (5 of the 6 stories) a correct diagnosis was suggested. The number of answers to each posted case ranged from 1 to 14; the authors had expected more Facebook friends to participate. Only one of the correct diagnoses was suggested by a medically trained person.

The authors report the median response time to a correct diagnosis as 9.5 minutes. However, this is only among the people who got a correct answer! If you add the people who did not get a correct diagnosis at all, the median time to a correct diagnosis becomes 21 minutes, calculated as median([75, 9, 8, inf, 3, 11, 31, inf]). But even 21 minutes might be quite quick compared to the ordinary Danish health service. For one case that, I believe, could require surgery within hours, the time to the correct answer was 3 minutes.
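The 21-minute figure can be checked with a few lines of Python. This is only a sketch of the calculation quoted above: the list is the one from the text, and treating a never-diagnosed case as an infinite waiting time is my own convention, not something the study does.

```python
import math
from statistics import median

# Minutes until the first correct diagnosis for each of the 8 subjects.
# math.inf marks a subject whose friends never suggested the correct diagnosis.
times = [75, 9, 8, math.inf, 3, 11, 31, math.inf]

# With 8 values the median is the mean of the 4th and 5th sorted values (11 and 31).
median_all = median(times)
print(median_all)
```

Because the two infinite values sort to the end, they shift the middle of the list without making the median itself infinite.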

The authors note the number of "acceptable answers". In the precision/recall terms of information retrieval, the number of acceptable answers addresses the precision: it doesn't help that the correct diagnosis is posted if it is drowned out by a large number of wrong diagnoses (false positives). The authors counted differential diagnoses as acceptable and found rates of acceptable answers between 14% and 100%, i.e., in one case only 2 out of 14 suggestions (14%) were acceptable. One critique of this measure is that the authors regard obviously humorous diagnoses as "wrong" answers (as far as I read), e.g., one suggested cause of a girl's disease was that she was depressed because a specific football club had not signed a proper goalkeeper for the season.


The study is small but nevertheless interesting. It doesn't show that a collective of non-experts is better than an expert, as, e.g., Extracting collective probabilistic forecasts from web games does. However, I think that the diagnoses were surprisingly quick.

For the broader picture the study gives an idea of how sociotechnical systems may help in a welfare state.


Here are all the six medical cases translated from Danish:

  1. A 62 year old man is coughing and has had fever since he came home from India two months ago. Now there has begun to be a bit of blood with the coughing. What is he suffering from?
  2. What disease comes to mind when you read: A 38 year old guy has swollen fingers, swollen hand joints and ankles. The joints are sore and swollen and stiff for over an hour every morning?
  3. If you have pain down the right side of the stomach below the belly button (navel), what's wrong?
  4. A 35-year-old woman has a burning sensation in the diaphragm after eating, even if she only eats very little. She can no longer eat spicy food, drink coffee or chew chewing gum, what's wrong?
  5. What do you think is wrong? A girl of 26 years has lost 6 kg in weight, feels restless and has occasional palpitations. She also has a slight swelling on the neck.
  6. An elderly gentleman has a terrible pain in the big toe base joint: It is completely white and he cannot even have a blanket resting over his foot, what do you think he suffers from?


You are not allowed to look at the solution from the medical journal. Google searching is allowed. You can put your suggestion in the comment field. Bonus task is to suggest treatments.

My leg, my leg for 1.9 million kroner?

Do you want to pay 1.9 million Danish kroner for your leg?

Some days ago I finished reading Danish defense lawyer Bent Nielsen's book ('Det er il'godt træls: en forsvarsadvokats dilemma'), where he recounts and debates some of his cases. Among the issues he addresses is compensation to victims of crime. Bent Nielsen writes:

Denmark provides miserable damages on injuries.

The amounts are ridiculous when limbs, eyesight, hearing and all our vital organs are put in monetary terms. And here we are even talking about damage that can be measured and assessed. I once asked one of the High Court judges after being annoyed at something he had said: "A personal question, Judge, can I buy your leg for 100'000 DKK?"


As an example he recounts a case where two men raped a woman with a knife. Compensation: 40'000 Danish kroner, that is around 5'000 Euros. Bent Nielsen instead suggested 500'000 kroner, as a beginning. Around the time when Bent Nielsen's book was published in 2002 the law was changed, so the going rate is now 60-70'000 kroner per rape, still far from Bent Nielsen's tenfold increase.

Normally the criminal (or the insurance company) should pay the damages. However, if the criminal cannot pay or the victim cannot get money through the insurance, then the Danish state must pay (according to the law). Maybe there's the rub. For in that pay lies a cost for the taxpayer, which apparently is around 100 million kroner for the total of Danish cases per year. (This state compensation is handled through Erstatningsnævnet, which has further details on its homepage.)

But why should the Danish state pay for the wrongdoings of a criminal at all? Increase the compensation to the Bent Nielsen levels and let the criminal pay whatever he can, e.g., by working in prison. The state compensation can be (IMHO) ridiculed: Let's say a guy owes money to a gang. The gang kidnaps the man and drives him around for 6 hours in Copenhagen and the surrounding area while threatening him. The gang leaves him physically unharmed, and he seeks damages for 6 hours of kidnapping and gets 20'000 kroner from the Danish state. This is one of the cases from Erstatningsnævnet's annual report. Did the guy pay off his debt with the money...?


From my course in law at the Engineer College of Aarhus back in the 1990s I recall a story about a guy using a walkie-talkie. He was on the wrong channel and disrupted the communication between the captain of a ship and the marine pilot, with the result that the ship crashed into the quay (AFAIR). The guy with the walkie-talkie had to pay huge damages. The moral of the story is that your small failures can, in very unfortunate circumstances, result in large expenses to others.

A few days ago a Danish court of appeal settled large damages in the case of the so-called "fodboldtosse" (the football fool). In 2007, during the match between Denmark and Sweden in Copenhagen, the drunken fodboldtosse ran onto the field and attacked the referee. The European football federation (UEFA) punished the Danish organizers so they had to move games to smaller stadiums (which meant loss of income) and pay a fine of 281'000 kroner. The court of appeal determined the amount to be around 1.9 million kroner (approx. 240'000 Euro), which is probably fair, as lawyers also agree. The fodboldtosse does not have that amount of money and there are no taxpayers to help. A number of people feel pity for the drunkard and have started a collection. Presently a pool on Facebook is just over 30'000 kroner, so now they only need the remaining 98%.

It is telling to note that for the amount of 1.9 million the fodboldtosse could have raped 29 women and paid full damages. The remarkable discrepancy between criminal damages and the large lawsuit damages was also noted on Facebook by comedian Frank Hvam, commenting on the fodboldtosse and a recent nasty criminal case where the damages (state-funded, I suppose) were settled at 150'000 kroner. The particular Facebook post has gained over 17'000 likes, over 4'000 shares and over 1'000 comments.

When I was little I actually thought that you would get money from the police if a thief stole from you. Later I found out that this is not the case. If the state is supposed to cover the damages, I think it would be interesting to see what would happen if the individual police districts were to pay for damages caused by criminals.

Hacking the smart grid to detect which TV program you watch

Computer security researchers from Münster University, Dario Carluccio and Stephan Brinkhaus, presented a hack of electricity meters and the remote (as far as I understand) detection of which TV program you watch. A summary of the system is on Sophos Naked Security.

The two researchers "attacked" a smart meter system by the German company Discovergy. This system stores the power consumption continuously and as each electrical apparatus may yield a temporal pattern of power consumption it is possible to detect different apparatus. Their approach seems to be based on studies previously presented in Hintergrund und experimentelle Ergebnisse zum Thema 'Smart Meter und Datenschutz' by Greveler, Justus and Löhr. Their abstract reads:

Advanced metering devices (smart meters) are being installed throughout electric networks in Germany (as well as in other parts of Europe and in the United States). Unfortunately, smart meters are able to become surveillance devices that monitor the behavior of the customers leading to unprecedented invasions of consumer privacy. High-resolution energy consumption data is transmitted to the utility company allowing intrusive identification and monitoring of equipment within consumers' homes (e. g., TV set, refrigerator, toaster, and oven). Our research shows that the analysis of the household’s electricity usage profile does reveal what channel the TV set in the household was displaying. Moreover, the data being transmitted via the Internet is unsigned and unencrypted. All tests were performed with a sealed, operational smart meter used for electricity metering in a private home in North Rhine-Westphalia, Germany.


Mein Gott!

An hour-long video of Carluccio and Brinkhaus' presentation is here (I haven't watched the entire video):

http://www.youtube.com/28c3#p/u/54/YYe4SwQn2GE

Cavling prize 2011 II

What did I say?

The two journalists Anton Geist and Ulrik Dahlin today, unsurprisingly, received the most prestigious Danish journalism prize: the Cavling prize. It was for their 100+-article coverage of the so-called "citizenship case". So far the case has brought down one minister and brought up one commission.

The case still generates political conflicts. By signing United Nations conventions Denmark is required to give Danish citizenship to young people with no citizenship if they are raised and have lived in Denmark and request it. Such people may include criminals and people whom the police regard as a security risk (but without strong evidence).

I also suggested another possible winner for the Cavling-prize: Poul Pilgaard Johnsen for his coverage of the neuroscientist Milena Penkowa case. He wasn't even nominated, but a communication/PR/journalism news site regarded that as the "injustice of the year". Apparently neuroscience is insufficiently sexy for the Cavling prize committee.

(correction: 22:10)

Onsdagslotto - when to play and win the superpulje

The so-called "superpulje" in "Onsdagslotto" (Viking Lotto) has reached a record-breaking amount of 125 million and with extra so-called jackpots reaches 157 million. This is around 30 million US Dollars or 22 million Greek Euros (equivalent to 22 million German Euros).

It is dangerous to talk about the statistics of Onsdagslotto as twice our local statistics watchdog Mikkel N. Schmidt has caught researchers giving the wrong odds: The first time blogging Mikkel caught Jørgen Hoffmann-Jørgensen from University of Aarhus giving the wrong odds. The second time Master Schmidt found that University of Copenhagen Professor Mogens Steffensen's odds or the Politiken newspaper reporting his odds were wrong.

Fearless of Mikkel I will now attempt my computations (which are probably wrong).

In Onsdagslotto you pick six numbers from 48. The number of different combinations/rows is (48*47*46*45*44*43)/(6*5*4*3*2*1) = nchoosek(48,6) = 12'271'512 = around 12 million. The so-called superpulje is released if an extra independently picked number (a 7th number) hits one of the six numbers. The probability that the superpulje is released is thus 6/48 = 1/8. It means that on average you need to play nchoosek(48,6)*8 = 98'172'096 = around 100 million rows before you win the superpulje.

Hoffmann-Jørgensen gave (48*47*46*45*44*43*42)/(7*6*5*4*3*2*1) = 73'629'072 for the superpulje. This number would be correct if you were to hit 7 numbers from the 48. But this is not how the rules are (as far as I understand).

In the report from Professor Mogens Steffensen the newspaper made it sound as if the value of 98'172'096 was the number of combinations from one coupon. But there are 10 games on each coupon, so the average number of coupons you need to play is nchoosek(48,6)*8/10 = 9'817'209 = around 10 million, as Mikkel notes.

Apparently, it costs 4 Danish Kroner (DKK) to play one row/combination/game. To play all combinations will cost you 12271512*4DKK = 49'086'048 DKK = around 50 million DKK. Peter Brodersen noted that as the superpulje was hit this week you could have gained a considerable amount of money if you had played all combinations. The bad news for the strategy of playing all rows is, however, as Brodersen also mentions, that you do not know whether the seventh number hits the 6 others and you do not know whether you need to share the amount with other players.
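The counts above can be reproduced in Python, where math.comb plays the role of nchoosek. This is a sketch under my reading of the rules (6 numbers from 48, the superpulje released with probability 1/8, 10 rows per coupon, 4 DKK per row); the variable names are mine.

```python
from math import comb

rows = comb(48, 6)                  # ways to pick 6 numbers out of 48
seven_of_48 = comb(48, 7)           # the count you get if 7 numbers had to be hit

rows_per_superpulje = rows * 8      # release probability is 6/48 = 1/8
coupons_per_superpulje = rows_per_superpulje / 10   # 10 rows on each coupon
cost_all_rows = rows * 4            # DKK for playing every combination once

print(rows)                    # 12271512
print(seven_of_48)             # 73629072
print(rows_per_superpulje)     # 98172096
print(coupons_per_superpulje)  # 9817209.6
print(cost_all_rows)           # 49086048
```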

On average you need to spend nchoosek(48,6)*8*4DKK = around 400 million DKK playing all combinations before you hit the superpulje. It seems to be more difficult to compute the probability that you have to share the amount from the superpulje. During these times with a large superpulje Danes are playing for around 65 to 82 million DKK each round, meaning that around 20 million combinations are played in Denmark and that on average each combination is played around one or two times (82000000/4/nchoosek(48,6)). However, the superpulje is shared with other countries in the Nordic region. One blogger notes that we are 33.4 million in the region. If the people in the other countries play at the same rate as Danes, does that mean that a superpulje winner has to share it with 9 other people on average? (33.4/5.5*82000000/4/nchoosek(48,6)=10.145).

I am not sure I understand the rules correctly... Because last week, when the superpulje could have been released, no player hit the six correct numbers among the 48. If Danes are alone in playing, the probability of no one hitting the six correct numbers is around 27%: (1-1/nchoosek(48,6))^(65000000/4). Whereas if we apply the Danish playing rate to the entire Nordic region population we get around 0.0003: (1-1/nchoosek(48,6))^(33.4/5.5*65000000/4). So either I am computing this wrong, or I misunderstand the rules, or the playing rate is quite a bit lower in the other countries, or it was a very unusual drawing. Yet another explanation is that some people play systematically. It has been reported that one particular combination was played 1'600 times.

If you win the superpulje alone you will apparently also receive the secondary prizes, if I understand correctly. That amount I read on one news site (Avisen.dk) to be around 10 million DKK.
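The sharing and no-winner estimates can be sketched in Python as well. The big assumption, which the reported systematic playing already contradicts, is that every played row is an independent, uniformly random combination; the function names and the population scaling factor are mine.

```python
from math import comb

rows = comb(48, 6)
price = 4                    # DKK per row
region_scale = 33.4 / 5.5    # Nordic region population relative to Denmark

def rows_played(turnover_dkk, scale=1.0):
    """Number of rows bought for a given turnover, optionally scaled up."""
    return scale * turnover_dkk / price

def p_no_winner(n_rows):
    """Chance that none of n_rows independent random rows is the drawn one."""
    return (1 - 1 / rows) ** n_rows

# Average number of times each combination is played (82 million DKK turnover).
plays_per_row_dk = rows_played(82_000_000) / rows
plays_per_row_region = rows_played(82_000_000, region_scale) / rows

# Chance that nobody hits the six numbers (65 million DKK turnover).
p_none_dk = p_no_winner(rows_played(65_000_000))
p_none_region = p_no_winner(rows_played(65_000_000, region_scale))

print(round(plays_per_row_dk, 2), round(plays_per_row_region, 2))
print(round(p_none_dk, 3), round(p_none_region, 5))
```

If players cluster on popular combinations, the real chance of a round without a winner is higher than this independence model suggests.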

The rate at which other people play, their rate of systematic playing and the secondary prizes make it difficult to compute when it is an advantage to play. If you disregard the secondary prizes it seems that the superpulje needs to grow to at least 400 million DKK before it is an advantage to play "against" it. It needs to grow further if you count in the other players that might hit your six numbers.

One popup page on the Danish lottery website states that the average payback percentage is 45%. It is unclear to me how the payback is distributed between the different prizes. If we assume that 40% is used for the secondary prizes, it means that if we play for 300 million DKK we will on average get 120 million DKK back from the secondary prizes; the rest, 180 million DKK, is at stake for winning the carried-over superpulje (if my understanding is correct). Given that the amount accumulated in the superpulje is now approaching the 180 million, it seems that it is almost an advantage for me to play, provided that the rest of you do not play so I have to share the prize.

 

(Correction: Typo 17:39)

Telenor driftstatus unavailable


My Internet usually works ok, but two days ago I was hit by periodic dropouts on my Telenor broadband Internet. The "Internet" LED on the ZyXEL P-660R-D1 DSL modem at my place was unlit for several minutes many times during the evening and during these periods I was unable to get further than 10.0.0.1. However, as the error was periodic I could occasionally get on the Internet and search for what was wrong. In this process I found the information on the status of Telenor's Internet insufficient for me.

Telenor customer service is not available by telephone during evenings. You can report an error via a web form, but I didn't manage to discover that page. I rebooted the modem several times by switching off the device. One page states that you should wait 20 seconds, which I might not have done: It is possible that the device at the other end of the cable needs not just a reboot but several seconds to detect and correct the problem. I have entered my email so Telenor can send me emails about interruptions in the operational status of the Internet, but I have never seen an email from them. I could not get a clue from the Telenor status page. My search on Twitter for "telenor" found no one reporting problems except myself.

Next day the instability was over, but I called Telenor to get a postmortem debriefing. After waiting perhaps 7 minutes to get to a human, the person at the other end told me that there had been a problem. He got my email so he could write to me with more information, but I have received no email so far. The present information on the Telenor status page gives me no indication of the problem; the closest message is from 14 December 2011 14:15:57, which gives information about work on their hosting platform. At one point this Telenor status page rather funnily displayed the message "Service unavailable" (remember Hamlet: Something is rotten in the state of the state of Denmark).


During the exponential growth of Wikipedia and Twitter you would often run into capacity problems with these services. For Twitter you would have the famous fail whale that gave a clear indication that there was a problem. You could use http://www.downforeveryoneorjustme.com to check the availability of web services. To me the dropouts that from time to time (reasonably rarely) occur in my broadband Internet access are enigmas. One other Telenor customer has reported that the broadband problems might be due to a so-called lorte-central (Danish for a crappy telephone exchange). Given no other information I have to accept that explanation.

Cavling prize 2011

Tomorrow, 14 December 2011, the nominees for the Cavling Prize are announced. The prize is the most prestigious journalistic prize in Denmark. 44 suggestions for nominations have been received by the prize committee. The prize is presented to the winner at a ceremony on 6 January 2012. Here are my guesses for a winner:

  1. Anton Geist and Ulrik Dahlin. Every journalist's dream seems to be to get the head of a minister on a silver plate. In the case of Geist and Dahlin their articles managed to remove the Minister for Integration (of immigrants), Birthe Rønn Hornbech. The case concerns the handling of young stateless people. According to a United Nations convention signed by Denmark, stateless persons born and raised in Denmark may seek Danish citizenship. The authorities, not keen on giving citizenship to, for example, criminals and suspected terrorists, let the convention be open to their own kind of interpretation. The journalists were directed to the case by member of parliament Hanne Agersnap, who took note of a particular remark Hornbech made in parliament. From there Geist and Dahlin were running with the ball. The journalists are suggested for the prize by no less than six persons, including the two former prize winners Peter Øvig Knudsen (2007) and Jesper Tynell (2009). Geist and Dahlin will be unsurprising winners. The case is summed up on Danish Wikipedia in an article of which I have written a major part.
  2. 2011 was also the year of the wonderful case of neuroscientist Milena Penkowa, see my earlier comment. Poul Pilgaard Johnsen was the primary journalist digging into the case. I guess he must have been provided with leaked information somehow. In a particular claim to fame he managed to investigate an investigation of an investigation and found that a Spanish firm called "RRRC Pharmaceuticals" was likely a fictitious invention of Penkowa, thus making a mockery of a former investigation by the President of the University of Copenhagen. The case may also have caused a minister to be removed from office. Former Minister of Science Helge Sander knew Milena Penkowa personally, and before the story broke big in the media he was replaced as part of a major government reshuffle. Most likely he was removed because he wasn't particularly popular among students and researchers. But one may wonder whether the Prime Minister knew that the friend of Sander was already at that time involved in a criminal case. Johnsen is suggested by 6 persons, including one of the scientists involved in the case. Four of the journalists suggesting Poul Pilgaard Johnsen for the Cavling Prize write: "According to our knowledge the case is the largest scientific scandal in the history of Denmark and the first time in the history of Danish press, that investigative journalism within science and research has resulted in so extensive exposé and consequences." This case is also summed up on the Danish Wikipedia in the lengthy article on Milena Penkowa, of which I have also written a good part.

I am less familiar with the work of the other suggestions for the Cavling Prize. One suggestion is Rasmus Tantholdt. His name appears frequently in the media these days as he himself is involved in a case of leakage of confidential information.

One outsider is Preben Juul Madsen, who has been suggested by seven people. Madsen runs his own website http://kunstnyt.dk/ and seems to have some devout followers. My guess is that he is not a likely winner.

Poul-Erik Heilbuth, the man behind a Curveball documentary, is also suggested. He is one of the few with international reach.

 

(Update 15 December 2011: Ok, so Dahlin and Geist were nominated, but not Johnsen - http://www.b.dk/nationalt/victoria-er-cavling-nomineret )

Are you on Google Scholar?


Google introduced (was it a few weeks ago?) a new version of Google Scholar where you as a scientist can claim your name and the scientific papers that you have authored. Previously you could just search, e.g., to get your papers listed, see my previous blog post. However, if you have a common name, e.g., "J. Larsen", you would run into the problem that your publications would be entangled with the publications of other people called "J. Larsen" or "RJ Larsen" or "JC Larsen", etc. With the new system it almost seems that Google does co-author mining, so it is better able to distinguish the different similarly named authors. Furthermore, and most importantly, with a Google Scholar account you can claim your papers, which solves the ambiguity problem, and you can add and merge papers. Editing functionality was already present in CiteSeer long ago (if I remember correctly) and in Microsoft Academic Search you can also edit the publication list.

You can see my Google Scholar account here. By a strange coincidence I have found that my number of citations is presently exactly the same as one of my co-authors, Cyril Goutte: 1668.


The new Google Scholar functionality seems not to be that good at discovering new relevant papers, e.g., those papers that cite you. There the old-fashioned Google Scholar email alert seems better. What it does provide is a nice overview for h-index junkies. The number is automatically computed and makes Google Scholar a serious competitor to the pay-walled ISI Web of Science.

Entertained by scandalous deceiving melancholy, hurrah!


In my effort to beat the SentiStrength text sentiment analysis algorithm by Mike Thelwall I came up with a low-hanging-fruit killer approach, or so I thought. Using the standard movie review data set of Bo Pang available in NLTK (used in research papers as a benchmark data set) I would train an NLTK classifier, compare it with my valence-labeled word list AFINN and readjust the AFINN weights for the words a little.

What I found, however, was that for a great number of words the sentiment valence in my AFINN word list and the classifier probability trained on the movie reviews were in disagreement. A word such as 'distrustful' I have as a quite negative word. However, the classifier reports the probability for 'positive' to be 0.87, i.e., quite positive. I examined where the word 'distrustful' occurred in the movie review data set:

$ egrep -ir "\bdistrustful\b" ~/nltk_data/corpora/movie_reviews/

The word 'distrustful' appears 3 times, in all cases in a 'positive' movie review. The word is used to describe elements of the narrative or an outside reference rather than the quality of the movie itself. Another word that I have as negative is 'criticized'. It is used 10 times in the positive movie reviews (and not at all in the negative). I find one negation ('the casting cannot be criticized'), but mostly the word appears in a context where the reviewer criticizes the critique of others, e.g., 'many people have criticized fincher's filming [...], but i enjoy and relish in the portrayal'.

The top 15 'misaligned' words using my ad hoc metric are listed here:

Diff.  Word          AFINN  Classifier
0.75   hurrah            5        0.25
0.75   motherfucker     -5        0.75
0.75   cock             -5        0.75
0.68   lol               3        0.12
0.67   distrustful      -3        0.87
0.67   anger            -3        0.87
0.66   melancholy       -2        0.96
0.65   criticized       -2        0.95
0.65   bastard          -5        0.65
0.65   downside         -2        0.95
0.65   frauds           -4        0.75
0.65   catastrophic     -4        0.75
0.64   biased           -2        0.94
0.63   amusements        3        0.17
0.63   worsened         -3        0.83
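The ad hoc metric is not spelled out here. The numbers in the table are, however, consistent with rescaling the AFINN valence from the range -5..5 to 0..1 and taking the absolute difference from the classifier's probability for 'positive'. The sketch below is that reconstruction, an inference from the table rather than the original code, and the function name is made up.

```python
def misalignment(afinn_valence, p_positive):
    """Absolute gap between rescaled AFINN valence and classifier probability."""
    rescaled = (afinn_valence + 5) / 10   # map -5..5 onto 0..1
    return abs(rescaled - p_positive)

# A few rows from the table: (word, AFINN valence, classifier P(positive)).
table_rows = [
    ("hurrah", 5, 0.25),
    ("lol", 3, 0.12),
    ("distrustful", -3, 0.87),
    ("melancholy", -2, 0.96),
]
for word, valence, p in table_rows:
    print(f"{misalignment(valence, p):.2f}  {word}")
```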

It seems that reviewers are interested in movies that have a certain amount of 'melancholy', 'anger', distrustfulness and (further down the list) scandal, apathy, hoax, struggle, hopelessness and hindrance. Whereas smile, amusement, peacefulness and gratefulness are associated with negative reviews. So are movie reviewers unempathetic schadefreudians entertained by the characters' misfortune? Hmmm...? It reminds me of journalism where they say "a good story is a bad story".


So much for philosophy, back to reality:

The words (such as 'hurrah') that have a classifier probability of 0.25 or 0.75 typically occur only once each in the corpus. In this application of the classifier I should perhaps have used a stronger prior probability, so that 'hurrah' with 0.25 would end up around the middle of the scale with a probability near 0.5. I haven't checked whether it is possible to readjust the prior in the NLTK naïve Bayes classifier.
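The effect of a stronger prior can be illustrated with plain additive smoothing. This is a simplified sketch and not a description of NLTK internals: it assumes equally sized positive and negative classes, and the pseudo-count of 0.5 is my assumption, chosen because it reproduces the 0.25/0.75 values for a word seen only once.

```python
def p_positive(count_pos, count_neg, pseudo_count=0.5):
    """Smoothed estimate of P(positive | word) for equally sized classes."""
    total = count_pos + count_neg + 2 * pseudo_count
    return (count_pos + pseudo_count) / total

# A word seen once, in a negative review, with a weak and a stronger prior.
weak = p_positive(0, 1)                      # 0.5 / 2
strong = p_positive(0, 1, pseudo_count=5.0)  # 5 / 11

print(weak, round(strong, 3))
```

With the larger pseudo-count a single occurrence barely moves the estimate away from 0.5, which is the behavior wished for above.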

The conclusion on my Thelwallizer is not good. A straightforward application of the classifier to the movie reviews gets you features that reflect the summary of the narrative rather than the movie per se, so this simple approach is not particularly helpful for readjusting the weights.

However, there is another way the trained classifier can be used. Examining the most informative features I can ask if they exist in my AFINN list. The first few missing words are: slip, ludicrous, fascination, 3000, hudson, thematic, seamless, hatred, accessible, conveys, addresses, annual, incoherent, stupidity, ... I cannot use 'hudson' in my word list, but words such as ludicrous, seamless and incoherent are surely missing.


(28 January 2012: Look out in the code below! The way the features are constructed for the classifier is troublesome. In NLTK you should not only mark the words that appear in the text with 'True'; you should normally also explicitly mark the words that do not appear in the text with 'False'. Not mentioning words in the feature dictionary might be bad depending on the application.)
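The point about explicit 'False' values can be illustrated with a small feature extractor. This is an illustrative sketch and not the original code: the function name, the 'contains(...)' key format and the tiny vocabulary are made up.

```python
def document_features(document_words, vocabulary):
    """Map every vocabulary word to True or False, never leaving one out."""
    present = set(document_words)
    return {f"contains({word})": (word in present) for word in vocabulary}

vocabulary = ["hurrah", "melancholy", "ludicrous"]
features = document_features(["a", "melancholy", "film"], vocabulary)
print(features)
```

Every document then yields a dictionary with the same keys, so the absence of a word can count as evidence too.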


Downloading a post from a blog on blogspot.com

Our Ingemar Cox asked whether I knew how to download a post from blogspot.com. It was somewhat difficult I found out.

Let's say you want to download the post Google at the Joint Statistical Meetings in Miami from the Google Research blog on blogspot. If you look in the source you find nothing more than JavaScript and CSS. The only interesting item I found was "targetBlogID=21224994" in the first line. With this blog ID you can apparently download via Blogger.

The link http://www.blogger.com/feeds/21224994/posts/default gives you back RSS/Atom XML and with a date restriction appending ?updated-min=2011-08-22T00:00:00&updated-max=2011-08-24T23:59:59&orderby=updated you can get the specific post.
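The feed request can be sketched in Python with the standard library only. The URL pattern and the query parameters are the ones given above; the function names are mine, and I assume the feed is plain Atom with title and content elements in each entry. Whether the feed is still served at this address is untested.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def feed_url(blog_id, updated_min, updated_max):
    """Build the Blogger feed URL restricted to a date range."""
    query = urlencode(
        {"updated-min": updated_min, "updated-max": updated_max,
         "orderby": "updated"},
        safe=":",   # keep the colons in the timestamps readable
    )
    return f"http://www.blogger.com/feeds/{blog_id}/posts/default?{query}"

def entries(atom_xml):
    """Return (title, content) pairs for each entry in an Atom document."""
    root = ET.fromstring(atom_xml)
    return [(entry.findtext(ATOM + "title"), entry.findtext(ATOM + "content"))
            for entry in root.iter(ATOM + "entry")]

def fetch_posts(blog_id, updated_min, updated_max):
    """Download the feed and parse it (needs network access)."""
    with urlopen(feed_url(blog_id, updated_min, updated_max)) as response:
        return entries(response.read())

url = feed_url("21224994", "2011-08-22T00:00:00", "2011-08-24T23:59:59")
print(url)
```

Calling fetch_posts with the same three arguments would then return the matching posts as (title, content) pairs, with the content still HTML-escaped as it sits in the feed.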

It seems JavaScript and APIs are taking over the entire Internet and you get nostalgic when seeing a simple static HTML Web page.

 

(note after posting: Viewing the individual post with, e.g., 'classic', 'flipcard' or 'magazine' view and saving the file in Firefox with 'Save Page As' also gives you the HTML, but view source still shows JavaScript. Note after the note (2011-11-29): This 'save as' trick does apparently not work on Safari.)