Cherchez la femme

The Israeli Defence Forces Spokesperson Unit has published a booklet titled “60 Years of Women’s Service in the IDF” (in English,
downloadable). The booklet is full of interesting facts, numbers, and personal
stories and contains a lot of photos. As an active IDF reserve soldier, I
was very curious about it.

My first impression was very positive: nice page rendering, fabulous
photographs and a lot of interesting and relevant information. It looks very
professional, except for one thing: the graphs. The graphs in this and all other
similar brochures published by the IDF are shamefully ugly, infantile and unprofessional.

Look at this page:

Please tell me how the person who designed this nice-looking page could have added these
ugly graphs.

Not only did the IDF’s infantile geniuses use pie charts, one of the least informative
types of visual representation of data, they also distorted them in 3-D! Why
did they do that? Does this provide more information? No, it doesn’t! Do 3-D
graphs look nicer and cooler? Unless you go back to the ’90s, they don’t! Does
the fake perspective distort the small amount of data each graph contains? Of
course it does!

Putting the form aside, take a look at this interesting graph that describes
the percentage of women in different sectors of the Border Guard Corps:

Although I was demobilized from the Border Guard Corps fourteen years ago, I
can assure you that not all (100%) urban patrol Border Guards are women. Take a
walk in Eastern or Western Jerusalem and see for
yourself.

This post was proofread by Kathryn Johnson, ‘Simply Syntax’, proofreader@simplysyntax.net

Presenting your own death

Dr. Randy Pausch was a computer scientist who one day discovered that he had cancer, with metastases all over his body. People react in all sorts of ways to such disastrous news. Dr. Pausch started giving his “last lecture”.

Dr. Pausch passed away on July 25, 2008.

Three words, two mistakes

Lanthanum, europium, erbium, and neodymium belong to the group called “rare earth elements”. But did you know that, just as inert gases are not inert, rare earth elements are neither rare nor earths? For example, there is more lanthanum (one of the “rare earth elements”) on this planet than silver or lead.

Now you have yet another fascinating topic to talk about with a girl (boy) on a date. If you want more juicy details on this issue, go read the full story on Popular Mechanics.

Powerpoint templates for scientific posters

Poster sessions are very strange creatures. Personally, I don’t understand the usefulness of this format. For me, the signal-to-noise ratio of a poster session is low, the density of people is high, and the frustration is endless.

I admit that there are many people who think that scientific posters are a valuable means of communication. Dr. Zen Faulkes, for example, is one such person. One of his blogs, http://betterposters.blogspot.com/, is dedicated to the noble goal of creating better posters.

Anyhow, if you need or want to create a scientific poster, you might be interested in this site, which provides several PowerPoint templates for posters.

By the way, there are several dedicated programs for poster creation. Although I have never worked with any such program, I assume that some of them make it easier to create good-looking posters. Despite this, a recent poll by Dr. Faulkes revealed that the majority of the respondents used PowerPoint for poster preparation.

Enjoy

Support vector what?

Support vector machines are a “magic” classification algorithm that maps examples as points in space, so that the examples of different categories can be divided by a single hyperplane in the hyperspace. SVMs are widely used in machine learning, especially when the mechanism underlying the data isn’t known or is not of interest. For many people, this explanation sounds very complicated. If you are one of them, see this short video (less than a minute) for an intuitive illustration

or this long lecture by Prof. Andrew Ng for a deep explanation:

The interested reader may read An Introduction to Support Vector Machines and Other Kernel-based Learning Methods by Cristianini and Shawe-Taylor.
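
For readers who prefer code to videos, here is a minimal sketch of the idea, not taken from any of the sources above. It assumes the simplest possible case: a linear SVM (no kernel) trained by sub-gradient descent on the hinge loss, written in plain Python. The toy data, the function names and the parameter values are all made up for illustration; for real work you would use a dedicated library.

```python
def train_linear_svm(points, labels, learning_rate=0.01, reg=0.01, epochs=100):
    """Fit a separating hyperplane w.x + b = 0 by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation step: shrink w, which favours a wider margin.
            w = [wi * (1.0 - learning_rate * reg) for wi in w]
            if margin < 1.0:
                # The example is inside the margin or misclassified: move the plane.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data: two clusters in the plane.
POINTS = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0), (-3.0, -2.0)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(POINTS, LABELS)
print([predict(w, b, p) for p in POINTS])
```

The “magic” of real SVMs comes from kernels, which let the same kind of separating plane be found in a much higher-dimensional space; this sketch deliberately leaves that part out.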

Numbers are nothing, resolution is everything

I work with measured data. All experimental and most theoretical scientists work with measured data. The signal intensity of a DNA microchip, a drug concentration in the plasma, and the frequency of coronal mass ejections are all measured data. Moreover, most measured data is not obtained directly, but is rather calculated from several sources of measured data. Did you ever ask yourself how accurate your data is? Did you ever ask yourself how many digits after the decimal point you should look at? If you didn’t, you should.

Most standard statistical methods treat numerical data as if they were real (infinite number-of-decimal-places) observations. The issue of quantization or digital resolution is recognized by engineers and metrologists, but is largely ignored by statisticians and can render standard statistical methods inappropriate and misleading.

The above sentences are taken from Likelihood-Based Statistical Estimation From Quantized Data by Vardeman and Lee. So, next time you think you have managed to improve ligand docking results by 0.01 Angstrom, or next time you trim digits after the decimal point only to make the numbers fit your table, please read this article.
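
To make the point concrete, here is a small sketch of my own (it is not from the Vardeman and Lee paper). It assumes a hypothetical instrument that reports whole units only, and a hypothetical method whose resolution is 0.1; all the numbers are invented for illustration.

```python
import statistics


def quantize(value, resolution):
    """Round a value to the nearest multiple of the instrument resolution."""
    return round(value / resolution) * resolution


# Hypothetical "true" values, and an instrument that reports whole units only.
true_values = [10.12, 10.21, 10.33, 10.38, 10.44]
recorded = [quantize(v, 1.0) for v in true_values]

true_spread = statistics.stdev(true_values)
recorded_spread = statistics.stdev(recorded)

# Two hypothetical results that differ by 0.01, at an assumed resolution of 0.1.
same_reading = quantize(1.52, 0.1) == quantize(1.51, 0.1)

print(f"spread of true values:     {true_spread:.3f}")
print(f"spread of recorded values: {recorded_spread:.3f}")
print(f"1.52 and 1.51 give the same reading: {same_reading}")
```

The recorded values are all identical, so a naive calculation reports zero variability even though the underlying quantity does vary. Likewise, a difference of 0.01 simply cannot be seen at a resolution of 0.1.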


Flickr illustration by  dog on wheels

Mechanical suit for the paralyzed

This is an interesting video that shows an “exoskeleton” – a mechanical suit that enables paralyzed people to walk.

Mind blower: a point in the 10th dimension

I learned the concept of dimensions when I was in the fifth grade. I also learned then that there are more than three dimensions in our Universe. I even knew that time can be considered the fourth dimension. Any additional dimensions remained purely theoretical in my mind.

Can you imagine a point in the 10th dimension? I couldn’t. Not until I saw the video below, which is a promo for the book Imagining the Tenth Dimension: A New Way of Thinking About Time and Space.

Find cure for cancer while playing games

I have never been a big fan of computer games. Not because I think I’m too smart, too intelligent or too busy to play games, but mostly because I suck. I am a terrible gamer. Take any game genre (shooters, strategy, arcade, any genre) and I will suck at it. “At least”, I kept telling my ego, “I’m a pretty good scientist”. Why shouldn’t my ego calm down? I’m good at molecular modelling and docking, PK/PD analysis, biomathematics, data mining and bioinformatics. Can these nerdy gamers beat me in these fields? Well, it seems they can. Not only can they, they can even get results that justify a Nature publication. I feel stupid and redundant.

Earth’s umbilical cord found

I find geology exciting. All those earthquakes, eruptions, seas and oceans appearing and disappearing fascinate me very much. Every time I cross the Judean or the Aravah deserts with my daughters I keep talking about all those evident signs of ancient seas that transformed into magnificent layers in the rocks. Of course nobody listens to me, but at least I enjoy talking.

Imagine you suddenly discovered part of your umbilical cord was still
attached. Scientists just did that for the planet Earth. What’s been
found is … a chunk of pristine, undisturbed rock from the time when
Earth was nothing but molten rock.

That is what Larry O’Hanlon of DiscoveryNews is reporting. Isn’t this interesting? Guys, gals? Hello! Isn’t this interesting? Oh, never mind, I will just keep talking to myself here.

Illustration image copyright by Andrew Curtis under the Creative Commons license

Logo Déjà vu

Prigat, an Israeli juice and syrup manufacturer, has a new logo:

They also have a new slogan, which I find very stupid. Back to the logo. Doesn’t it look similar (very very similar) to this one:

I’m sure Prigat’s branding company has seen Amazon’s logo here and there. I just can’t figure out what they were thinking…

It’s all about the right aspect – on the optimal aspect ratio of a chart

The aspect ratio of an object is the ratio of its longer dimension to its shorter one. What is the optimal aspect ratio of a chart?
Depending on the data on that chart, the answer to this question may range from a trivial one to the result of complex calculations.
Let’s start with the simplest case. I have already discussed it in the “better observed vs. predicted plots” post series. One of the posts in that series dealt with the aspect ratio of an observed vs. predicted plot, which should be 1.

This rule can be extended to any case where the two chart axes represent the same units of measurement. In these cases, however, the definition of aspect ratio should be changed from the ratio between the plot dimensions to the ratio between the plot scales. If the X and Y axes stand for the same units, their scales should be equal, unless this is practically impossible. Take, for example, Bland and Altman’s seminal paper “Statistical methods for assessing agreement between two methods of clinical measurement”. In that paper, the authors suggested assessing the agreement between two methods by taking several measurements using both methods and then plotting the differences against the mean measurement. Below is a Bland-Altman plot for some hypothetical agreement data. Note that every centimetre (inch, pixel) in each direction represents the same number of measurement units.

Even if you are less lucky and the differences between the tested methods are much bigger, you should stick to that rule:

Even if you are very lucky and the differences are very small, stick to that rule. If you feel that a Y axis that is too short might hide important details, show two versions of the same graph. The one with equal axis scales will provide the general context, and the expanded one will provide the fine details.
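
Here is a rough sketch of the equal-scales rule in plain Python. It is only an illustration: the measurements, the axis ranges and the helper names are hypothetical, and the figure-size calculation is just one simple way of making one inch represent the same number of units on both axes.

```python
def bland_altman_points(method_a, method_b):
    """Return one (mean, difference) pair per sample measured by both methods."""
    return [((a + b) / 2, a - b) for a, b in zip(method_a, method_b)]


def equal_scale_figure_size(x_range, y_range, units_per_inch):
    """Width and height (in inches) such that one inch spans the same
    number of measurement units on both axes."""
    width = (x_range[1] - x_range[0]) / units_per_inch
    height = (y_range[1] - y_range[0]) / units_per_inch
    return width, height


# Hypothetical agreement data: the same five samples measured by two methods.
method_a = [100, 110, 120, 130, 140]
method_b = [98, 113, 119, 134, 137]

points = bland_altman_points(method_a, method_b)

# Assumed axis ranges: means between 90 and 150, differences between -10 and 10.
width, height = equal_scale_figure_size((90, 150), (-10, 10), units_per_inch=10)

print(points)
print(f"plot area: {width} x {height} inches")
```

The resulting plot area is wide and short, which is exactly the point: the differences are small compared to the measurements, and the shape of the graph says so. If you plot with matplotlib, calling `set_aspect("equal")` on the axes should give the same effect without manual calculations.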

Of course, there are cases where the same axis scales won’t fit. The graph below was generated by the GapMinder site. This graph shows the relationship between the income per person in various countries and the average hourly compensation in those countries. It is obvious that despite the fact that both axes express U.S. dollars, their orders of magnitude are completely different:

I will discuss this and other tough problems of selecting the optimal aspect ratio for data plots in a day or two. Stay tuned.

I’m looking for a partner

I have an idea for a small research project that requires adaptive polls (polls that adapt themselves to the previous answers a user provides). I think that this project belongs to the interface between social sciences and psychology. It aims to test the differences in how various user groups perceive different countries, languages and populations. Unfortunately, I don’t have the technical skills required to build such a poll, nor do I have the time to learn how to do this properly.

Please forgive me for not sharing more details, as I don’t feel comfortable discussing all the details publicly, before we have some results to share.
Please contact me (boris@gorelik.net) if you are willing to consider collaboration, and I will provide you with more details.

Illustration image By pardeshi
under the Creative Commons license

Love data visualization? You must see this blog

A blog with the promising name “Chart port” is a nice collection of material on data visualization. If you, like me, love this kind of data analysis, make sure you visit it once in a while.

Technical note: breaking table rules

A friend of mine asked me how I created the broken rules in my posts on more effective table typography. I’m not sure what the best way to achieve this is, but usually I do the following. The short story is “empty rows and columns”. Read on to learn more:

============== Read More »

First things first

One key feature of complex graphs is that they are … complex. This is pretty obvious. I mean, the graphs are not obvious and this is an obvious statement. In other words,  it is obvious that non-obvious graphs are not obvious.

Enough with that …

============== Read More »

Great minds think alike

Chris, one of the most active readers of this blog, wrote the following comment:

I would go a little bit further with the third example and let the rules just underline the important bits of information. In case of the headings (e.g. “Logistic regression”) I would run it until the end of the validation column and for the subtitles (e.g. “Training”) I would just make it as long as the column is wide (actually a tiny bit smaller, so there is a break).

Chris was very kind to send me his version of the table. Here it is:

Actually, Chris was much faster than me. His suggestion came before I published my version of the table (but I swear, my version existed before I saw Chris’s comment). This is what my version looked like:

Indeed, great minds think alike. One thing I like in Chris’s version is the larger distance between the rules, which makes the headings easier to read. BTW, you should check Chris’s site. It is full of absolutely wonderful photographs.

The rule of broken rules, or the importance of white space

We continue our search for more aesthetic and more effective tables. Last time, we discussed the importance of rules, or the lack thereof. A rule, to remind you, is the typographic term for a line, which, as we have learned, needs to be used sparingly.
Today we will talk about white space, and how it can make your tables look better, read easier, and deliver the message more clearly. Continue reading: ============== Read More »

June 2010 statistics for inthehaytsack.com

I continue publishing monthly visit statistics for this blog. I was very lazy this month (actually, very busy), so there were not too many new posts. I promise to try harder. Anyhow, you can click here to download the statistics. Fellow bloggers, can you please share your monthly statistics with me? Thank you.

Be the master of your domain

Peter D. Ward studies life on Earth. In his TED talk, Peter Ward presents his “Medea Hypothesis”, which states that most of Earth’s mass extinctions were caused by lowly bacteria. This is a very interesting talk, but it is not its content that I want to talk about, but its form. I will use this talk to demonstrate four rules that will help you stay away from those pesky presentation glitches. Enjoy reading.

============== Read More »

Be the most

Would you bother reading a paper that deals with a situation that is not the most deadly, dangerous, common or nasty among a certain population? Would you bother writing such a paper? Recently I have noticed a phenomenon that I call Boris observation #6: many papers that I read and that deal with various cancer conditions start with the same template:

______ cancer is the most ______ cancer among ______

For example:

Breast cancer is the most common malignancy in United States women

Al-Hajj et al. PNAS, 100(7) 2003:3983-3988

Or

Prostate cancer is the most frequently diagnosed solid tumor in US men

Stanford et al JAMA, 283 2000:354-360.

Or

Lung cancer is the most common cause of cancer deaths in both men and women worldwide

Fukuoka et al J Clin Onc, 21(12)2003: 2237-2246

Or

Ovarian cancer is the most lethal of all gynecological cancers among women

Saldova et al. Glycobiology, 17(12) 2007: 1344-56

There are 18,600 Google Scholar search results for “cancer is the most” query limited to biology and medicine. Even if only 25% of them correspond to unique entries, this is a very large number. I know, I know, you are supposed and expected to open any paper with a general sentence that presents a problem and that can be understood by any reasonably intelligent person. But, come on! What’s wrong with

Mesenchymal chondrosarcoma (MCS) is a rare disease that accounts for about 1% of all chondrosarcomas.

Gorelik et al. Cancer Research, 68(21) 2008: 9033-40

BTW, I’m glad Saldova et al from the ovarian cancer citation didn’t include gynecological cancers among men in their statistics :-)

Statistics doesn’t lie, some statisticians do

Following the statistics quiz in my review of Gerd Gigerenzer’s “Calculated Risks”, a reader of mine pointed me to this presentation. This one has not one, but two quizzes and is also very eye-opening.

In case you failed one of the quizzes presented here and in case you didn’t read “Calculated Risks: How to Know When Numbers Deceive You” yet, go and read it

And the winner is …

Well, well, the painless thesis writing competition is over. It resulted in 15 more useful tips for thesis writers (in addition to the original 23). The winner of this competition, as announced by Prof. Zen Faulkes, is …

============== Read More »

15 more tips for painless thesis writing

The painless thesis writing competition is over. Due to several technical problems, but mostly due to my ignorance of Twitter’s limits on hashtag search, most of the useful tips have been lost. Following are all the #painlessthesis tweets that I have. The winner has been chosen, but the announcement will be delayed a little longer. Stay tuned. Meanwhile, read on to review all the valuable #painlessthesis tips

============== Read More »

Effective tables: The rule of rules

Last time, I showed you an example of the average table you may find in any scientific report or technical document. I bet you create such tables all the time. I also told you that such tables were wrong. Let’s start our search for better tables with the simplest, yet most effective rule: the rule of rules. Keep reading ============== Read More »


About the author

Boris Gorelik

Hi, my name is Boris Gorelik. I am a bioinformatician, PK/PD modeller, algorithm developer, computational chemist and pharmacist. But most of all, I am a huge fan of elegant data presentation and visualization