**Graphic Discovery**, Howard Wainer: [E]ven the magic of logarithms does not always tame profound differences. One example of this appeared on July 8, 1997, in the automobile section of the *New York Times*. It listed the names and prices of the forty-seven convertible car models then available. They ranged from the $15,475 Honda del Sol all the way up to the Ferrari F50, priced at a cool $487,000. The range of prices was so broad that the illustrator of the *Times* could not fit all of them on the same graph and so had to resort to merely listing the most expensive ones separately. "Aha," I thought, "you should've tried logs!" Then I did, figuring that they would linearize the plot. They did, but not all cars fell on the same straight line (see figure 1). The thirty-four least expensive cars fell on one straight line, then there appeared to be a jump for the next seven cars, then another jump for three exotic European cars, and then the three most expensive cars fell on another line with a much steeper slope.

Thus we have discovered that the cars at the upper reaches of the domain have prices that exceed what a single exponential increase would dictate. Would some other way of thinking about convertible prices provide a clear functional relationship between rank order and price?
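Why a straight line on a log scale signals exponential growth can be written out in one line. This is a sketch under an assumption of ours, not Wainer's: write $p_r$ for the price of the car at rank $r$ and suppose that price grows by a constant factor $b$ from one rank to the next.

```latex
p_r = a\,b^{\,r}
\quad\Longrightarrow\quad
\log p_r = \log a + r\,\log b
```

Under that assumption, log price is linear in rank with slope $\log b$. A discrete jump between groups, or a segment with a steeper slope (a larger $b$), is exactly the departure from a single exponential that figure 1 shows.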

Taking the next step (inverses) in this ladder of transformations does the trick. [One approach when data are skewed is to transform them to symmetry. For mild skewness, a square root often works; when it is more extreme, a logarithm; when more extreme still, an inverse. These steps in progression are often called "Tukey's ladder of transformations."] If we make a plot showing how many of each car we could buy for one million dollars (about sixty-five Honda del Sols but only two Ferrari F50s), a simple linear relationship emerges (see figure 2). By making this transformation, we not only gain a linear relationship between rank and a function of price, but we also have a metric that is easier to think about and to communicate to others. We also discover that two other groups of cars separate themselves from the pack: "nice convertibles" and "really nice convertibles."
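The arithmetic behind "about sixty-five Honda del Sols but only two Ferrari F50s" is a scaled inverse, the top rung of the ladder described above. The Python sketch below uses only the two prices quoted in the text. The names `LADDER` and `cars_per_million` are hypothetical helpers for illustration, not anything from Wainer's book.

```python
import math

# Rungs of Tukey's ladder named in the footnote, from mildest to strongest.
# These helper names are hypothetical, chosen for this sketch only.
LADDER = {
    "square root": math.sqrt,
    "log": math.log10,
    "inverse": lambda price: 1.0 / price,
}


def cars_per_million(price: float, budget: float = 1_000_000.0) -> float:
    """Number of cars at `price` dollars that `budget` buys: a scaled inverse."""
    return budget / price


if __name__ == "__main__":
    # The only two prices quoted in the text (New York Times, July 8, 1997).
    prices = {"Honda del Sol": 15_475, "Ferrari F50": 487_000}
    for name, price in prices.items():
        # Prints 64.6 for the Honda and 2.1 for the Ferrari.
        print(f"{name}: {cars_per_million(price):.1f} cars per $1M")
        for rung, transform in LADDER.items():
            print(f"  {rung}: {transform(price):.6g}")
```

Multiplying 1/price by a fixed budget does not change the shape of the plot; it only turns the inverse into a count that is easy to picture, which is the communication benefit Wainer points to.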

Source: Graphic Discovery: A Trout in the Milk and Other Visual Adventures by Howard Wainer

Related items:

Osborne, Jason (2002). Notes on the use of data transformations. *Practical Assessment, Research & Evaluation*, 8(6).

Mahalanobis - on 2005-08-17 20:19 - Category: mathstat

tikhonov said on 19 Aug, 14:49:

I get the point of clustering cars according to their price (into 'nice convertibles', 'luxury convertibles') like you did, but what sense does it make to fit a curve to the prices? Your explanatory variable (car type) is not even ordinal and the choice of models seems a little arbitrary to me... add a few more models in a similar price range and the fit will look much worse.

Mahalanobis replied on 19 Aug, 16:56:

The *New York Times* listed the names and prices of the forty-seven convertible car models **then available**. Therefore, the argument that adding a few more models will destroy the relationship isn't valid. Car type isn't intrinsically ordinal but it can be ranked by, say, price ;-D.

I think Howard Wainer did a great job in making the data accessible. Whether this relationship shows up each and every year is of secondary importance.

> clustering cars... **like you did**

I'd rather chew glass than use Comic Sans.


typekey:junkcharts said on 20 Aug, 06:33:

reading the graph correctly

Please note that Howard did not cluster the prices. All he did was rank-order them, lining them up from lowest to highest; he then found that at some points the prices jumped, and from that he inferred that there were a number of clusters. Ironically, this clustering is much easier to see on the log scale than on the reciprocal scale, since the latter has essentially linearized everything. What Howard successfully did was to provide insight by comparing rungs of the "ladder of transformations". However, had we started out with the reciprocal chart, it would have been hard to detect the clustering. I have a few other related comments on transformations on my blog.
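The procedure described here (rank-order the prices, then look for places where they jump) can be mimicked in a few lines. This is a hypothetical helper, not Howard's code: it works on log prices, so a "jump" means a step much larger than the typical ratio between neighboring prices, and the cut-off `factor` is an arbitrary assumption. It detects jumps only, not the change of slope in the top group.

```python
import math
from statistics import median
from typing import List, Sequence


def split_at_jumps(prices: Sequence[float], factor: float = 3.0) -> List[List[float]]:
    """Rank-order prices and start a new group wherever the step in log price
    exceeds `factor` times the median step (`factor` is an arbitrary choice)."""
    ranked = sorted(prices)
    if len(ranked) < 3:
        return [list(ranked)]  # too few points to define a "typical" step
    # Step between consecutive ranks on the log scale, i.e. log of the price ratio.
    steps = [math.log(b) - math.log(a) for a, b in zip(ranked, ranked[1:])]
    threshold = factor * median(steps)
    groups = [[ranked[0]]]
    for price, step in zip(ranked[1:], steps):
        if step > threshold:
            groups.append([price])  # unusually large jump: open a new group
        else:
            groups[-1].append(price)
    return groups
```

For example, `split_at_jumps([100, 110, 121, 133.1, 1000, 1100, 1210])` (made-up prices, not the *Times* data) returns two groups, split between 133.1 and 1000. With real data the grouping depends heavily on `factor`.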

I do not agree that the reciprocal chart is "easier to communicate to others". It is just not a natural way for us to think about the quantity "how many cars can $1 million buy?"

I agree that the data is presented well, but, like the first commenter, I am irked by the fitting of the straight line. Again, more on my blog, but the gist is that there is an implied linear regression between price and rank order, which is of limited use.
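To make the "implied linear regression" concrete: one way to compare rungs of the ladder is the R² of a straight-line fit of transformed price against rank. The sketch below is our own illustration (`r_squared`, `RUNGS` and `ladder_fit` are hypothetical names), run on constructed prices rather than the *Times* data. In line with the doubts raised here, a high R² against rank only describes the shape of the sorted curve; it is not evidence of a predictive relationship.

```python
import math
from typing import Callable, Dict, Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of the least-squares line y = a + b*x (squared Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


# Hypothetical set of rungs; "raw" is the untransformed price.
RUNGS: Dict[str, Callable[[float], float]] = {
    "raw": lambda p: p,
    "square root": math.sqrt,
    "log": math.log,
    "inverse": lambda p: 1.0 / p,
}


def ladder_fit(prices: Sequence[float]) -> Dict[str, float]:
    """Rank-order the prices and report R^2 of each transformed price vs. rank."""
    ranked = sorted(prices)
    ranks = list(range(1, len(ranked) + 1))
    return {name: r_squared(ranks, [f(p) for p in ranked]) for name, f in RUNGS.items()}


if __name__ == "__main__":
    # Constructed prices (NOT the Times data) for which 1/price falls linearly
    # with rank, so the inverse rung should fit essentially perfectly.
    demo = [1_000_000 / (70 - 2 * r) for r in range(31)]
    for name, r2 in ladder_fit(demo).items():
        print(f"{name:>12}: R^2 = {r2:.4f}")
```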

Mahalanobis replied on 20 Aug, 17:54:

OK, maybe my initial enthusiasm stemmed from the fact that one rarely sees reciprocal charts in the wild.

tikhonov said on 20 Aug, 19:54:

First of all, sorry, Mahalanobis, for mistaking you for the author of the graph; I should read more carefully next time. I do think that the example is a nice illustration of transformations, but, like typekey:junkcharts, I doubt the usefulness of regression in this case. However, I do not think that reciprocals (how many cars does $1m buy?) are too hard to communicate (at least not as hard as logarithms). Just think of fuel consumption, which is stated as 'miles per gallon' in the US but as 'litres per 100 km' in Europe; it is hard to say which is the 'natural' way to think about it and which is the reciprocal.
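The fuel-consumption example is the same reciprocal relationship. A minimal sketch, assuming US gallons (3.785411784 L) and statute miles (1.609344 km); the function name is hypothetical. Because the two units are reciprocals up to a constant, the same function converts in either direction.

```python
LITRES_PER_US_GALLON = 3.785411784  # exact by definition
KM_PER_MILE = 1.609344  # exact by definition


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert US miles per gallon to litres per 100 km.

    L/100 km = 100 * litres / km
             = 100 * (gallons * LITRES_PER_US_GALLON) / (miles * KM_PER_MILE)
             = 100 * LITRES_PER_US_GALLON / (KM_PER_MILE * mpg)
    The constant is about 235.2 divided by mpg, so applying the same formula
    to a value in L/100 km gives back mpg.
    """
    return 100.0 * LITRES_PER_US_GALLON / (KM_PER_MILE * mpg)


if __name__ == "__main__":
    for mpg in (20, 30, 40):
        print(f"{mpg} mpg = {mpg_to_l_per_100km(mpg):.1f} L/100 km")
```

Equal steps in mpg are unequal steps in L/100 km: going from 20 to 30 mpg saves about twice as much fuel per 100 km as going from 30 to 40 mpg, the same kind of stretching and compression the ladder of transformations produces.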

typekey:junkcharts said on 20 Aug, 23:13:

value of analysis

While we disagree on the usefulness of the line-fitting, I must point out what I think is the key take-away from Howard's analysis: it illustrates that creating data graphics is an iterative activity. It would be a tragedy if the graphic designer did not create all of these versions and then decide which one to show to the public. I'd have chosen the log scale in this case, but it really depends on what you want to say.