R in The New York Times

by Kieran Healy on January 7, 2009

Funny to see the virtues of R extolled in The New York Times. Although I did wonder whether Professor Ripley spilled his tea when he read this effort at introducing Times readers to it:

Some people familiar with R describe it as a supercharged version of Microsoft’s Excel spreadsheet software that can help illuminate data trends more clearly than is possible by entering information into rows and columns.

On second thoughts, though, I imagine no tea was spilled. It would take rather more than that. There is the required bit of stuffy huffiness from a spokesperson for the SAS Institute, too:

SAS says it has noticed R’s rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks. “I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

R also gets some stick (though not in the article) from the computer science side of things for being fairly slow in comparison to some potential competitors. But it’s an exemplary open-source project and is now the lingua franca of academic statistics, for good reason. In day-to-day use for its designed purpose it’s hard to beat. The commitment of many of the core project contributors is really remarkable. In the social sciences R’s main competitor is Stata, which also has many virtues (including a strong user community) but costs money to own. I like R because it helps keep your data analysis honest, it has very strong graphical capabilities, it’s a gateway to understanding new work in statistics, and it’s free. Just take my advice and be sure to read the Posting Guide before you start asking any questions on r-help.


1

Barry 01.07.09 at 7:46 pm

It *is* funny to see reporters talking about such things; sort of like children talking about adult matters, only the children are frequently smarter and wiser.

2

noen 01.07.09 at 8:11 pm

The link to r-project.org appears to be defunct. There is just a web search squatting on the domain.

3

wytten 01.07.09 at 8:13 pm

noen, just stick www in front of it

4

Watson Aname 01.07.09 at 8:13 pm

@2: Try it with a leading www, like this: http://www.r-project.org/

5

noen 01.07.09 at 8:16 pm

Ahh… it needs the magic www. Sorry, should have thought of that.
R-Project

6

Watson Aname 01.07.09 at 8:25 pm

Well, that was an unfortunate ratio of posts to information. And I’m just making it better, aren’t I?

7

Bill Gardner 01.07.09 at 8:26 pm

The S / R syntax is also wonderful for working on problems involving arrays of more than two dimensions.
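A minimal sketch of the kind of thing this makes easy, using base R’s apply() over the margins of a three-dimensional array (the array and dimensions here are invented for illustration):

    # build a small 2 x 3 x 4 array of random numbers
    x <- array(rnorm(2 * 3 * 4), dim = c(2, 3, 4))

    # average over the third dimension, leaving a 2 x 3 matrix
    apply(x, MARGIN = c(1, 2), FUN = mean)

    # sum over the first dimension, leaving a 3 x 4 matrix
    apply(x, MARGIN = c(2, 3), FUN = sum)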

8

Neue Internetpräsenz 01.07.09 at 8:31 pm

This sounds like a European adult film: “The Erotic Flowering of R”

9

Kieran Healy 01.07.09 at 8:37 pm

Fixed. Sorry.

10

engels 01.07.09 at 8:38 pm

And I’m just making it better, aren’t I?

Not really, I’d say.

11

Watson Aname 01.07.09 at 8:49 pm

That’s the spirit, engels.

12

sg 01.07.09 at 8:51 pm

I agree with you Kieran but I find R awesomely annoying to work with. Yes the language is pithy and that’s good, but it has a lot of idiosyncrasies that can be annoying. The worst so far has been the platform dependence of the Japanese-language version of R. A program I wrote on my computer failed on my colleague’s until we removed the comments.

Also, the mailing list helpy-forum is completely evil. Everyone on there is nasty.

13

Eszter Hargittai 01.07.09 at 9:23 pm

I’ve never used R, but I was made to use S-Plus back in grad school and my recollection is that it had a very different learning curve from that of Stata. Stata seemed way more accessible to me (and I don’t mean the WYSIWYG interface they’ve come out with recently, I don’t use that at all).

Didn’t R mainly kill S-Plus?

14

sg 01.07.09 at 9:31 pm

I find Stata really horrible to use compared to R. It manages to somehow mangle the flexible vector/matrix approach of R by merging it with the sequential approach of a 60s program. You have to have the linear model as the last modelling operation in order to extract residuals, right? This is very painful. And the way of looping through objects has to be the most painful thing I have ever seen: applying those weird names in quotes to a programming object which itself has a funny name and has to be manipulated in a funny way (it should be clear from this sentence that I haven’t used Stata a lot).
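A rough sketch of the R idiom being contrasted here: the fitted model is just another object, so residuals can be pulled out at any later point rather than only straight after the estimation command (the data below are invented for illustration):

    # simulate a small data set
    d <- data.frame(x = rnorm(100))
    d$y <- 2 * d$x + rnorm(100)

    # keep the whole fitted model as an object...
    fit <- lm(y ~ x, data = d)

    # ...and extract residuals (or coefficients, etc.) whenever convenient
    r <- residuals(fit)
    summary(r)
    coef(fit)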

Both of these packages have huge positives too, but the way they really fall over (and the only reason SAS remains popular IMHO) is that they just can’t handle big data. For my data sets I would need a computer with at least 8GB of RAM if I wanted to use Stata, even after crunching the data set to maximum possible efficiency. If I want to do anything exploratory and have a few more variables, I need 16GB. And there’s no guarantee that even then I’ll be able to actually complete the analysis in my own lifetime.

15

SamChevre 01.07.09 at 9:40 pm

For big data, IMO, there’s still nothing to beat APL.

16

sg 01.07.09 at 9:42 pm

APL? Pray tell me more!

17

MH 01.07.09 at 9:51 pm

I’ve several thousand hours of SAS experience and dozens of existing programs that I use at least monthly, plus a huge library of self-created templates I use to write SAS. When I write notes to myself, I sometimes end sentences with semicolons. I’ve always had to pick up other packages for this or that reason, but I very much doubt SAS is in any real danger. I have no idea if R is better, but I do know there are thousands in more or less the same position as myself.

18

Trey 01.07.09 at 10:08 pm

I use both Stata and R but prefer Stata (or SAS) for large datasets.

19

Eszter Hargittai 01.07.09 at 10:11 pm

It seems fair to say that your preferences will be related to what you are trying to do with these programs. I rarely have data sets with hundreds of thousands of observations, for example (although some of my Stata-user friends do). I know of people who’re very enthusiastic about Stata, SAS and R. It doesn’t seem like any of these are at risk any time soon. S-Plus, again, another matter.

20

Dennis Howlett 01.07.09 at 10:21 pm

You mean to tell me that SAS Institute peeps haven’t seen the Linux Penguin on their in flight entertainment centers? My oh my.

21

MH 01.07.09 at 10:23 pm

Speaking of less-used statistical software, does anybody use SPSS? It was the first package I used (on a mainframe). It still comes up occasionally, but I’m not sure who is the key constituency for it anymore.

22

Barry 01.07.09 at 10:25 pm

Yes, I use it. Basically, non-statisticians, undergraduate statisticians, and several working statisticians that I know of.

23

Walt 01.07.09 at 10:29 pm

MH, the same was true of Cobol, and yet Cobol is well on its way to disappearing. New users who have a choice between SAS and its competitors will generally choose its competitors. Unless SAS Inc. finds a way to reverse this trend, then in the long run they are in trouble.

24

sg 01.07.09 at 10:52 pm

I don’t think that’s necessarily true Walt. To the best of my knowledge the implementation of GEEs in R is pretty crappy (at least it was last time I checked) and its ARIMA functions are annoyingly incomplete. I don’t know much about the topic, but my impression is that its ODBC support and SQL handling are not at all good. Plus of course the help is crap, and full of nasty people misunderstanding your question and being rude. I think these kinds of problems come up with open source software rather a lot.

Eszter, hundreds of thousands of observations aren’t the challenge. I have tens of millions, and Stata can’t handle any data set it can’t load. This causes a lot of trouble. For ANOVA-type problems this isn’t a big deal, but for something like a random effects model or a survival analysis it’s impossible. It’s really annoying when you have millions of observations but you essentially have to sample from them so viciously that they might as well not exist, because your software can’t handle the size.

What would be really nice would be a stats package in which you can switch between vector-style analysis for small data sets and SAS-style chunk-by-chunk processing for large data sets. Which I suppose I have, since I use both…

25

Watson Aname 01.07.09 at 11:09 pm

The typical solution to that problem, sg (or at least to very similar ones; I don’t do quite the same style of data analysis that you do), is that you end up writing stuff in a general programming language rather than a software “system”. That’s common enough for what I’d call large problems, those that won’t fit on one machine/node anyway. Of course, you lose a lot of simplicity and support; hopefully you can find a language with the right tools so that you don’t have to reinvent too many wheels.

This is essentially what Sam’s APL suggestion boils down to, but it’s a pretty obscure language these days, so for many there are much better choices, I expect. I typically write things in one that’s even older, but I understand why many colleagues don’t.

26

Eszter Hargittai 01.07.09 at 11:33 pm

sg, I can see the problem there, definitely.

MH, tons of undergraduates and even graduate students use SPSS (not to mention professors in some of the fields with which I’m affiliated). My impression is also that people at marketing firms and such use SPSS often.

27

nick s 01.08.09 at 12:07 am

It still comes up occasionally, but I’m not sure who is the key constituency for it anymore.

From what I can tell, it’s aimed at social science undergrads (and postgrads) who need a little bit of point-and-click statistical crunching to season their work, but not enough to make the leap to Stata/R worthwhile. It appears to be fairly entrenched among people who want basic analysis — and has been marketed and licensed in ways to ensure that it still has a place in the academic environment.

28

C. Hall 01.08.09 at 12:29 am

A truly scary quote in the quoted article:

“We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

I’ll interpret “freeware” as “free software,” as I doubt she was using the term in the original sense (free but not open source). This is a horribly thought-out comment, in the same vein as thinking closed-source voting software is a good idea. What would you prefer: flying on a plane running code that 10 people had read and tested, or flying on a plane whose code thousands of eyes had read and tested? It’s a clear answer to me.

29

mds 01.08.09 at 1:01 am

Entirely separate from the whole “open source” issue, I suspect that aircraft engine manufacturers long relied upon custom in-house software, and probably still do to some extent. I doubt that the switch to SAS was driven by too many instances of aircraft falling out of the sky because engineers weren’t smart enough to do their own calculations.

30

paul 01.08.09 at 1:08 am

Any company that refers to itself as an “Institute” is bound to be prone to fits of stuffy huffiness.

31

Righteous Bubba 01.08.09 at 1:25 am

What would you prefer: flying on a plane running code that 10 people had read and tested, or flying on a plane whose code thousands of eyes had read and tested?

There’s a guarantee in either case?

32

Barry 01.08.09 at 1:48 am

mds, the idea of ‘many eyes’ isn’t because people are stupid; it’s because many eyes tend to spot errors. I just spent an afternoon fixing something that I had done carefully, in methods I had set up to minimize errors, then checked, and then was checked by a co-worker who’s very careful. Then our boss checked it, because he was presenting to upper management and didn’t like embarrassment.

Somebody in the 100 people to whom it was sent spotted the errors.

33

Barry 01.08.09 at 1:54 am

Walt 01.07.09 at 10:29 pm

” MH, the same was true of Cobol, and yet Cobol is well on its way to disappearing. New users who have a choice between SAS and its competitors will generally choose its competitors. Unless SAS Inc. finds a way to reverse this trend, then in the long run they are in trouble.”

If I were at SAS, what would worry me upon reading that article is the mention of R’s use by corporations. It’s not a data processing language, but combined with databases and reporting software, that’d pretty much be a SAS replacement.

BTW, for an individual outside of academia, the basic start-up cost of SAS would have been $1,500/year/user – back in the 1990s, when I last knew the retail price. And that would have limited functionality. This means that small businesses would lean hard away from SAS, unless they really, really needed to work in that particular package.

34

Walt 01.08.09 at 2:03 am

sg: Eventually they won’t suck, and eventually RAM will get so big that 99% of users will have datasets that R or Stata can load. At that point the jig is up.

SAS has the worst programming language known to man, and has a fantastically steep learning curve. These will eventually prove fatal, just as similar matters proved fatal to Cobol.

35

MH 01.08.09 at 2:16 am

“SAS … has a fantastically steep learning curve.”

We don’t have something like the ABA to limit the newcomers. There has to be something to keep just anybody who can get into grad school from driving down wages.

36

vivian 01.08.09 at 2:20 am

Just yesterday a colleague announced that he had made a really cool simulation where Excel called out to R code, efficiently and seamlessly. Until that tale I was expecting R to be as unstable as, say, Linux of ten years ago, but if it cooperates with Office, it must be pretty robust, to say the least.

SAS isn’t going away because it has the commercial data-mining market captivated (if not captive). It handles enormous datasets easily, and is backward compatible. But things that take one line in Stata can take several paragraphs worth of SAS code. (And, like Eszter, I’ve never used either language’s GUI.)

37

Watson Aname 01.08.09 at 2:49 am

Out of curiosity, for anyone whose work overlaps with both domains: does SAS hold a similar place in the market to the one that Matlab does? Sounds a bit like it.

38

MH 01.08.09 at 2:58 am

SAS is a pain for some things and most models are more easily done in Stata (for me at least). But if you need to combine data from different sources and to aggregate across records, Stata becomes very awkward very quickly. I use Stata frequently, but 99% of the time, it’s on datasets that were built in SAS.

SAS’s various attempts to make a user friendly interface have all been massively unworkable, even worse than finding the information you need in their documentation. Though SAS 6 did have a workable poker game on it. On the other hand, Stata’s pull-down analysis menus are actually a good way to learn about some feature that is new to you. Particularly if you learned in SPSS and like that format.

39

Caldem 01.08.09 at 6:10 am

I use Stata for datasets up to about 10 million obs no problem. You just need a server that can handle it. I have a 64-bit machine with 128 gigs of memory and I have no issues (you need the expensive Stata too: 64-bit and 8 processors).

40

sg 01.08.09 at 8:07 am

watson, in my experience SAS doesn’t have any real overlap with Matlab, which is a matrix programming language and has limited (but nice) statistical functionality.

Also, regarding writing your own code for statistics functions – that way lies madness. There is an enormous amount of work behind even the R code for a linear model, and reproducing that yourself is a huge amount of work and carries huge risk of error. Also, if you have to do a diverse range of tasks, every new task carries a lot of startup costs. If that’s what APL requires then I think it has no use for me.

Another aspect of SAS I think is often overlooked is its voluminous online help, which is impossible to search (quite right, MH) but extremely detailed in the mathematics.

I don’t use SPSS routinely but there is one aspect of SPSS which I really like – the sample survey functions. In SPSS you generate a sample plan file, which you have to open before you can do any analysis on complex survey data. This is great if you have a lot of people working on the same survey across an organisation, who know how to do basic statistics in a GUI but don’t understand the survey stats. One expert generates the plan file and then everyone in the organisation can be told to use it. It reduces the risk that one of your psychology grads will use frequency weights on their survey data, or use the wrong set of probability weights, or miss out the finite population correction, etc., so you can farm out the simple tasks without having to teach the complex code.
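For what it’s worth, the nearest R analogue to this workflow that I know of is Thomas Lumley’s survey package, where one person defines the design object (cluster IDs, strata, probability weights, finite population correction) and everyone else reuses it; the data frame and variable names below are invented for illustration:

    library(survey)

    # one expert defines the design once: clusters, strata, probability
    # weights and the finite population correction
    des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~pweight,
                     fpc = ~fpc, data = mysurvey)

    # everyone else analyses against the stored design object
    svymean(~income, design = des)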

I also agree with Walt at 34, but Stata and R have programming languages that are much less intuitive than SAS’s, IMO. Particularly R. This will slow down its general acceptance, I feel.

41

Watson Aname 01.08.09 at 9:04 am

sg, you misunderstood me, so I was probably unclear. I know how R/SAS etc. differ from Matlab and the like. My point was that Matlab is a commercial product with a pretty strong industry userbase plus some parts of academia. It’s well supported but not cheap, and you can get pretty much all of its core functionality elsewhere, perhaps better for your tasks. It also has syntax that is very nice for some tasks, and pretty lousy as a general language. However, it has momentum and some nice packages for doing particular things. Loads of practicing engineers use it for particular tasks, and probably aren’t interested in learning something else for the same thing. So I don’t see it going away any time soon, even though a lot of researchers get frustrated with it and many use other approaches instead (Python + libraries is popular these days). There are also free clones (e.g. Octave), free non-clones (e.g. Scilab), and all sorts of other things living in a similar space. Matlab’s real strengths are not its statistical capabilities, though as you note the stats toolbox is nice for what it does, so I wasn’t meaning to compare them that way at all.

I’m less familiar with precisely what SAS offers, but I was curious if it had a similar relationship with other offerings in terms of what bits of the market it is strong in, and why.

As far as writing your own goes, obviously nobody can practically write everything from scratch (or would want to). However, there are a large number of very good numerical libraries out there (free and otherwise), and depending on your problems, tying them together with some of your own stuff may be the best approach. This is by far the most typical approach in some problem areas.

As for APL, it is a very concise array-based language originating in the late 50s. There are (more) modern variants around, but I know few people who use it anymore, and couldn’t tell you what the status of statistical libraries etc. is. Easy enough to find out, I suppose.

42

Watson Aname 01.08.09 at 9:07 am

I think after all that, my question was still unclear. Sorry, I’m tired.

Does SAS, like Matlab, have a core base of users in industry (and less so academia) that are unlikely to bother learning a replacement because a) they know it and b) it does most of what they want, even if a bit painfully?

43

Chris Williams 01.08.09 at 9:17 am

Being a historian, I don’t know much about stats for grown-ups. If you can’t do it on Excel, I can’t do it. So much of the above is noise to me. But this line:

“SAS isn’t going away because it has the commercial data-mining market captivated (if not captive). ”

stands out. I think it’s true: big software companies with a product, unlike open-source communities, can engage in the lobbying process and sell themselves as data-mining solution-providers, to the public and private sectors. And they are doing so – the market that I know a little bit about is crime prevention, and big software is into that one.

44

Zamfir 01.08.09 at 9:27 am

“We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

I am happy to inform you that while I do not know about the engines, I do know that wings are designed using open source software.

45

Bill Gardner 01.08.09 at 10:46 am

You can write SQL in SAS. This is a great convenience if what you need to do involves both complex database operations and statistics.

46

Barry 01.08.09 at 11:35 am

@ 41, 42 – No, SAS is not like Matlab in that respect; it’s far, far bigger. SAS is really a data processing system with a statistical analysis system attached. A large corporation/gov’t agency could do massive data processing with SAS, while only a few people use SAS for any statistical purpose beyond descriptive stats and frequency tables.

Back in the 1990’s, one pharmaceutical company (Merck?) went from SAS to S-plus, and that was big news; they’re locked into SAS.

47

Bill Gardner 01.08.09 at 1:43 pm

To elaborate a bit… In my opinion, SAS is more cumbersome than R or Stata for statistics. The database professionals I know prefer Oracle, MySQL, or SQL Server to SAS. However, SAS is light years ahead of Stata or R as a database and light years ahead of Oracle or SQL Server as a tool for analyzing a table of data. There are many advantages to doing both kinds of work in one programming space.

48

SamChevre 01.08.09 at 2:08 pm

Sorry, sg–I originally thought your question was a joke.

I think it’s been answered; APL is an array-based programming language dating back to the 1950s, with its good points and its bad. I don’t know about statistical libraries, but there’s semi-publicly-available code to do almost anything data-related.

If you want a plug-and-play statistical system, APL is NOT what you want.

And a question in turn to Watson Aname–what’s older than APL?

49

Kieran Healy 01.08.09 at 2:42 pm

We’ve spoken of APL before here at CT. With code examples, even.

50

Barry 01.08.09 at 3:04 pm

It depends on the scale. Ideally, the data goes into databases to start (possibly pre-processed by SAS), with proper data entry forms, scanning, validation, etc. Then one could extract the data using report-writing software (e.g., Crystal Reports), either ad hoc or as pre-set batch jobs. SAS has modules to pull data from databases as if they were native SAS, from the viewpoint of the SAS user. At that point, a SAS user can ‘post-process’ in SAS for the desired analyses and reports.

Depending on the set-up and one’s position, one might be using SAS-processed data, or SAS-generated reports, or SAS-based input screens, without knowing that SAS stands for anything other than some Brit SF guys, or a shoe company in San Antonio.

In my background, SAS is usually used as the data processing/analysis software for research projects (engineering, social science and medical). It’s convenient to use one package for a lot of stuff. The trick is data entry and validation; I’ve seen a lot of people figure that incorrectly-entered data will be ‘taken care of later’, possibly using Statistical Methodology :(

51

Barry 01.08.09 at 3:10 pm

24 sg 01.07.09 at 10:52 pm

“I don’t think that’s necessarily true Walt. To the best of my knowledge the implementation of GEEs in R is pretty crappy (at least it was last time I checked) and its ARIMA functions are annoyingly incomplete. I don’t know much about the topic, but my impression is that its ODBC support and SQL handling are not at all good.”

The advantage of open source software with a reasonably large user base is that these problems will probably be taken care of soon.

“Plus of course the help is crap, and full of nasty people misunderstanding your question and being rude. I think these kinds of problems come up with open source software rather a lot.”

That’s a slower problem. I expect an expanded GUI (perhaps using R Commander as a base); there are also lots and lots of books on R.

52

Watson Aname 01.08.09 at 5:23 pm

And a question in turn to Watson Aname—what’s older than APL?

For various things, I use both Fortran (for libraries) and Lisp (for flexibility, metaprogramming). Of course, “older” is a bit of a misnomer in all cases, as it’s not like we’re using the original variants of any of these.

Barry @ 46: Thanks, I knew it was used in industry, but had no real feel for the userbase.

53

Steve 01.08.09 at 5:49 pm

C. Hall @28 wrote, “What would you prefer: flying on a plane running code that 10 people had read and tested, or flying on a plane whose code thousands of eyes had read and tested?”

In the latter case, are you distinguishing between the number of eyes that could have read and tested the code and the number of eyes that actually read and tested the code? As an example, the Linux kernel is freely available as source code to the entire world. Anybody and their brother can theoretically read it. Now how many people in the world do you think have read the entirety of the source code for the Linux kernel?

In addition, the quantity of people reading the source code isn’t the only factor in how many bugs are found — quality matters too. For instance, you could (theoretically) get my mother, who knows just about enough about computers to open a web browser (Sending email, though? Forget it.) to read the Linux kernel source code. Would she understand it or be able to find a bug? Unless there was a comment that said THE NEXT LINE CONTAINS A BUG, probably not.

Now I’m not saying that the first scenario you described is better than the second or vice versa. I’m just saying that the idea that more eyes automatically result in more bugs found and/or better quality code is too much of a simplification.

54

Barry 01.08.09 at 6:54 pm

You’re welcome, Watson. As a statistician, it feels like somebody asked if MS Office is used much in the corporate world :)

55

Watson Aname 01.08.09 at 7:00 pm

Just doing my part to keep up the ivory tower stereotypes, Barry!

56

sg 01.08.09 at 11:39 pm

Watson, I think the others have answered your question. As an example of what SAS is used for, the New South Wales Health Department (now NSWHealth) built a huge data extraction and analysis system for health professionals, called HOIST, mostly in SAS. I use SAS partly for legacy reasons, but also because it’s a very convenient way to implement SQL for large datasets and do statistical analysis on those large sets.

I think there is a large body of data miners, and anyone who works with statistical analysis on large datasets, who use SAS.

Also, I have been waiting for a decent implementation of GEEs in R for at least 3-5 years, so I’m not so sure it’s coming soon. It must be something that the R community don’t need. In my experience, if you want to do a decent log-linear model with clustering and/or serial dependence, SAS is the easiest way to put it together. And writing that by yourself is a lot of work. I don’t just do plug and play stats, but I am not crazy enough to pretend I can do a better job of writing a whole GEE analysis package by myself – that’s what we have smart people for!
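For readers wondering what such a fit looks like, here is a hedged sketch using R’s geepack package (one of the implementations available at the time, and presumably among those sg finds wanting); the data and variable names are invented for illustration:

    library(geepack)

    # clustered log-linear (Poisson) GEE with an exchangeable working
    # correlation; 'clinic' identifies the clusters, 'visits' is invented data
    fit <- geeglm(count ~ treatment + age, id = clinic, data = visits,
                  family = poisson(link = "log"), corstr = "exchangeable")
    summary(fit)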

57

Barry 01.09.09 at 12:07 am

Yes, sg, it’s sort of a race. SAS had the lead and the massive set of developers, but R has been gaining developers. I knew that it was becoming the language of stat/biostat departments, but I hadn’t known that corporations were using it. That is what makes me feel that it’ll be giving SAS a run for its money.

58

Adam 01.09.09 at 3:51 am

In physics, the trend has been moving to ROOT, an open source analysis system developed at CERN (see root.cern.ch). ROOT is based on C++, and most new collider physics experiments are using ROOT as the underlying foundation for their custom-made analysis frameworks. We have insanely large datasets, and ROOT seems to do well. In addition, new, powerful features are continually being added.

59

derrida derider 01.09.09 at 4:57 am

Yep. If you are creating and processing large datasets, there are better tools than SAS. If you are doing serious statistics on fairly simple and not unduly large datasets, there are better tools than SAS. But if you have very large and complex datasets that were originally collected for transactional rather than analytic purposes, and that therefore need extensive manipulation as part of the analysis, SAS is what you want.

This last is uncommon in academia, but common in government and big business.

60

Bill Gardner 01.09.09 at 7:01 am

The Times article was also discussed by Andrew Gelman, who says “I just hate SAS. It’s not just a matter of it having poor capabilities; SAS also makes people into worse statisticians, I think.” He explains in the comments, “SAS spews out pages and pages of output for any analysis. The output isn’t easy to post-process; as a result, people stare at the output and pick out numbers. R more easily allows graphical and other postprocessing of inferences.” He’s right about post-processing, at least. The R list (similar to a Ruby or Python hash) is a more straightforward way to access results than ODS. R beats Stata here too.
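A small example of the post-processing being described, using only base R and the built-in mtcars data: the fitted model is a list-like object whose pieces can be pulled out programmatically instead of being read off printed output.

    fit <- lm(mpg ~ wt + hp, data = mtcars)

    coef(fit)                          # named vector of coefficients
    confint(fit)                       # confidence intervals
    summary(fit)$coefficients[, 2]     # standard errors, picked out directly
    plot(fitted(fit), residuals(fit))  # feed results straight into a graph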

61

sg 01.09.09 at 8:28 am

aaaaaaah! ODS! I hate it! I also love R’s method for handling output (a few idiosyncrasies aside) and would love to be able to use that in every package.

I don’t know if SAS makes people worse statisticians though. The numbers you “pick out” are just the ones you know to extract using functions in R. Statisticians are fond of blaming computer packages for making people “worse statisticians”, but in my experience the people they are thinking of didn’t know what they were doing anyway. For example, the use of frequency weights instead of probability weights in sample survey analyses, a mistake I have caught many people making, arises because they just didn’t understand survey sampling to start with. Any software which gives them a choice of weights is fraught with this risk, regardless of how it handles output.

I also don’t think SAS has poor capabilities. A lot of people work very hard to make sure SAS is up to date, and for example it was well in front with GLMs, GEEs and time series analysis. It does response surface modelling, smoothing, and now it has mixed model GLMs attached. I think the problem is that the archaic interface and language are becoming increasingly cumbersome as the modern world passes them by. Andrew Gelman is probably complaining because SAS doesn’t do his part of stats (Bayesian modelling and HLMs, right?) so well, but that’s because the people it’s aimed at don’t need them. And anyway, if you are using SAS on a data set of millions, you are unlikely to be doing HLMs anyway unless you have direct access to God.

62

Bill Gardner 01.09.09 at 3:27 pm

Statisticians are fond of blaming computer packages for making people “worse statisticians”, but in my experience the people they are thinking of didn’t know what they were doing anyway.

Just so. And there is often handwringing about packages like SPSS enabling the heathens to analyze data. I’ve always felt that such concerns were misplaced, because the heathens would otherwise be making decisions without any acquaintance with the data.

you are unlikely to be doing HLMs anyway unless you have direct access to God

I know the deity.interface group has stuff in progress, but it isn’t on CRAN yet.

63

Steve Polilli 01.09.09 at 7:31 pm

In our best tradition of stuffy huffiness, I first note that I am in the employ of SAS marketing. Just wanted to let you know that we aren’t the open source haters that we appear to be in the NYT article. We DO run on Linux. Would love to have you check out the response by Anne Milley, who was so unfortunately quoted in the Times, to the passionate responses by R and open source proponents. See http://blogs.sas.com/sascom/
