If there are different values within an id, assign 1
Assign 2 if the first value is a (equivalently all values are a)
Assign 3 if the first value is b (equivalently all values are b)
Otherwise assign missing. There shouldn't be any such according to your example, but checking is a good idea.

    label def groups 1 "not mutually exclusive", modify
    bysort id (exposure) : gen wanted = cond(exposure[1] != exposure[_N], 1, cond(exposure[1] == "a", 2, cond(exposure[1] == "b", 3, .)))

Naturally you can break that down into shorter statements:

    bysort id (exposure) : gen wanted = 1 if exposure[1] != exposure[_N]
    by id: replace wanted = 2 if exposure[1] == "a"
    by id: replace wanted = 3 if exposure[1] == "b"

EDIT: Here is some technique for more complicated set-ups. Note that Stata doesn't attach any special meaning to ".".

    * To install: ssc install dataex
    * OK: indicator for observations whose exposure is not "." (its definition is not shown here; assumed)
    bysort OK id (exposure) : gen wanted = 1 if OK & exposure[1] != exposure[_N]
    by OK id: replace wanted = 2 if wanted == . & OK & exposure[1] == "a"
    by OK id: replace wanted = 3 if wanted == . & OK & exposure[1] == "b"
    by OK id: replace wanted = 4 if wanted == . & OK & exposure[1] == "c"
    * copy each id's classification to its remaining observations (the [_N] subscript is assumed)
    bysort id (exposure OK) : replace wanted = wanted[_N]

         |             not mutually exclusive   not mutually exclusive |
      5. | 2   a       not mutually exclusive   not mutually exclusive |
      6. | 2   b       not mutually exclusive   not mutually exclusive |
      7. | 2   c       not mutually exclusive   not mutually exclusive |
      8. | 3   a       not mutually exclusive   not mutually exclusive |
      9. | 3   c       not mutually exclusive   not mutually exclusive |

Today, I got a license of the new Stata/MP 13 (dual core), so I decided to make some succinct comparisons with R (RStudio). Many more tests will come in the following weeks, but today I focused only on the basics: processing text files, essentially reading and writing raw datasets. The results surprised me, I have to confess: R outperformed Stata in most of the data tasks I ran.

Although the Stata version I'm using is a multicore one, not the basic and cheaper Intercooled version (Stata/IC), the results I obtained go in a negative direction for Stata. Interestingly, my past experience as a Stata and R user made me believe that Stata was much faster than R at trivial tasks, including loading data tables into memory. However, the evidence I got today contradicts my previous opinion about Stata. Of course, a multicore version of this package doesn't help much here, since parallel computation provides benefits only for repetitive tasks that take at least one or two seconds to get through (there are quite a few posts about this topic, including my own).

Any simple work starts by feeding the statistical package with raw data. I tested how quickly these packages get through a semicolon-delimited text file of about 450MB. As the following output shows, Stata took 134.37 seconds to read this raw data. However, R took much less time to import the same file, only 102.49 seconds.

Stata importing output:

R importing output:

Therefore, in this simple but critical task, R outperformed Stata by loading a pure text file 24% faster than Stata did. Reading raw text into memory can be tricky, since each package may have a different strategy for loading different sorts of data.

But how about testing for differences in loading their native formats? Once again, I tested how quickly they load the same data, this time converted to each package's own file format (.Rdata for R and Stata's native format). Overall, R did quite well, loading its file in 19.23 seconds, while Stata did so in 89.66 seconds. This means that R loaded the dataset 78% faster than Stata, or to put it differently, my Stata/MP 13 took 4.6 times longer than R to load the "same data".

How about exporting data already in memory to disk? When it comes to exporting data from memory back to disk as delimited text, Stata finally outperformed R. While Stata took 67.25 seconds to write a file of 458MB of raw text, R needed 5.7 seconds more to do the same (72.93 seconds). Therefore, Stata exported the data 8% faster than R did.

Finally, when exporting data from memory to disk in their native formats, R outperformed Stata again, this time by a few dozen seconds. While Stata took 118.35 seconds, R took only 42.53 seconds. That is, R took roughly 2/3 of a minute to perform its duty, while Stata did so in roughly 2 minutes. Or to put it differently, the Stata Corporation package was 2.78 times slower than the free one. This difference is huge when we think in percentages: R was about 64% faster than Stata.

R and Stata are comparable for the tests I performed because both packages have to load all the data in at once before performing any analysis.
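
As a cross-check on the benchmark figures, the percentages can be recomputed from the quoted timings. This is only a sketch under one assumption: that "X% faster" means the time saved relative to the slower package. The post does not state its convention, and the symbols $t_{\text{slow}}$ and $t_{\text{fast}}$ are notation introduced here, but this reading reproduces the 24%, 78% and 8% figures.

```latex
\[
\text{saving} = \frac{t_{\text{slow}} - t_{\text{fast}}}{t_{\text{slow}}},
\qquad
\text{ratio} = \frac{t_{\text{slow}}}{t_{\text{fast}}}
\]
\begin{align*}
\text{text import (R faster):}     &\quad \frac{134.37 - 102.49}{134.37} = \frac{31.88}{134.37} \approx 0.237 \\
\text{native load (R faster):}     &\quad \frac{89.66 - 19.23}{89.66} = \frac{70.43}{89.66} \approx 0.786,
                                    \qquad \frac{89.66}{19.23} \approx 4.66 \\
\text{text export (Stata faster):} &\quad \frac{72.93 - 67.25}{72.93} = \frac{5.68}{72.93} \approx 0.078 \\
\text{native export (R faster):}   &\quad \frac{118.35 - 42.53}{118.35} = \frac{75.82}{118.35} \approx 0.641,
                                    \qquad \frac{118.35}{42.53} \approx 2.78
\end{align*}
```

Under the same convention, the native-format export timings correspond to a saving of about 64%, alongside the 2.78-fold ratio.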