New in Stata 18 are powerful statistical analyses, customizable visualizations, simple data manipulation, and repeatable automated reporting—all in a complete package.
Take your research further with the latest features in Stata 18.
Main updates
Bayesian model averaging (BMA) for linear regression
Traditionally, we select a single model and make inferences and predictions conditional on that model. Our results generally do not account for uncertainty in the choice of model and may therefore be overly optimistic. They may even be incorrect if the selected model differs substantially from the true data-generating model (DGM). In some applications, we may have strong theoretical or empirical evidence about the DGM. In other applications, often of a complex and unstable nature, such as those in economics, psychology, and epidemiology, such definitive knowledge is unavailable.
Rather than relying on a single model, model averaging combines results across multiple plausible models, weighting each according to how well it is supported by the observed data. In BMA, the “plausibility” of a model is described by its posterior model probability (PMP), which is computed using the fundamental Bayesian principle, Bayes’s theorem, applied to model choice just as it is to any other data-analysis question.
BMA can be used to account for model uncertainty when estimating model parameters and predicting new observations to avoid overly optimistic conclusions. The new command bmaregress performs a BMA of linear regression and can be used for inference, prediction, and even model selection if needed. For example:
. bmaregress y x1 x2
This considers all four possible models for the outcome y formed by including or excluding the predictors x1 and x2 and combines these models according to how likely each is given the observed data. You can choose from a variety of prior distributions to explore how assumptions about the models and the importance of the predictors affect the results.
Postestimation commands allow you to estimate model probabilities, identify important predictors, explore model complexity, obtain predictive means, evaluate predictive performance, and make inferences about regression coefficients.
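For example, after the bmaregress command above, a follow-up analysis might look like this (a sketch; pmean is a hypothetical variable name for the predictions, and the bmastats subcommands should be checked against the BMA documentation):
. bmastats models // posterior model probabilities of the visited models
. bmastats msize // posterior distribution of model size
. bmapredict pmean, mean // posterior mean predictions of y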
Causal mediation analysis
The purpose of causal inference is to identify and quantify the causal effect of a treatment on an outcome. In a causal mediation analysis, we aim to go further and explore how that effect arises. Perhaps exercise increases levels of a hormone that, in turn, improves well-being. Perhaps import quotas increase the market power of local firms, which in turn increases the price of goods. We often use causal diagrams to show such relationships, for example:

Using the new mediate command, we can estimate the total effect of a treatment on an outcome and decompose it into direct and indirect effects (those operating through mediators such as hormone levels). In fact, several types of decompositions can be computed, depending on the assumptions of interest. Additionally, estat proportion reports the proportion of the total effect that occurs through the mediator. mediate is very flexible: outcomes can be continuous, binary, or count; mediators can be continuous, binary, or count; and treatments can be binary, multivalued, or continuous. With 24 supported combinations of outcome and mediator models, it can be applied to many situations that arise in real research.
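For the exercise example above, a minimal specification might look like the following (a sketch with hypothetical variables wellbeing, hormone, and exercise, relying on mediate's default models and a binary treatment):
. mediate (wellbeing) (hormone) (exercise) // total, direct, and indirect effects
. estat proportion // share of the total effect operating through hormone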
Heterogeneous difference in differences (DID)
DID models are used to estimate the average treatment effect on the treated (ATET) from repeated-measures data.
The treatment effect could be the effect of a medication on blood pressure or the effect of a training program on employment. Unlike the standard cross-sectional analysis provided by the existing teffects command, a DID analysis controls for group and time effects when estimating the ATET, where groups are measured repeatedly.
Heterogeneous DID models also take into account changes in treatment effects due to groups receiving treatment at different time points, as well as changes in effects over time within groups.
Suppose several schools introduce an exercise and nutrition program to improve students’ health. Is it reasonable to assume that the program’s impact on student health outcomes does not vary over time and is the same regardless of when the program was adopted? Maybe not. We can use a heterogeneous DID model to account for these potential differences in effects.
New commands hdidregress and xthdidregress are available for heterogeneous DID models. hdidregress is suitable for repeated cross-sectional data and xthdidregress is suitable for longitudinal/panel data.
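For the school program example above, a call might look like this (a sketch with hypothetical variable names, using the AIPW estimator for repeated cross-sections):
. hdidregress aipw (health parentedu) (hprogram), group(school) time(year)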
Tables of descriptive statistics
When you publish your work, you will usually include a descriptive statistics table, often called “Table 1”; this provides your readers with some information about your sample. For example, you might want to display some demographic data, such as average age and average income. You can also compare these characteristics across different groups, such as regions or occupational fields.
In Stata 18 you can use dtable to create these and many other variations of “Table 1” and export them to a variety of formats. For example, we can create a table and export it to Excel:
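With hypothetical demographic variables, the command might look like this (a sketch; dtable's export() option writes the table directly to the file):
. dtable age income i.sex, by(region) export(table1.xlsx, replace)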

Table 1 – Excel
Or we can create a table and export it to HTML:

Table 1-HTML
or Word

Table 1-Word
or PDF

Table 1 – PDF
Additionally, because dtable is built on the collect suite, which is designed for customizing any type of table, you can further customize a table’s appearance with collect after creating it with dtable.
Group sequential designs (GSDs) for clinical trials
GSDs are adaptive designs that allow researchers to stop trials early if they find strong evidence that a treatment is or is not effective.
Suppose we want to design a study to test whether a certain type of chemotherapy is effective in treating tumors, and we plan to collect data over the course of several months. GSDs allow us to conduct interim analyses while the data are being collected, rather than a single analysis after all data have been collected. Each interim analysis provides an opportunity to stop the trial or to continue collecting data. A trial can be stopped early if there is strong evidence of efficacy. It can also be stopped early if there is strong evidence of futility; this avoids exposing more participants to an ineffective treatment.
Stata 18 provides a suite of commands for GSDs. The new gsbounds command calculates efficacy and futility bounds based on the number of analyses (also called looks), the desired overall Type I error, and the desired power. The new gsdesign command calculates efficacy and futility bounds and provides sample sizes for the interim and final analyses for tests of means, proportions, and survival functions. Graphs make it easy to visualize the bounds for all interim and final analyses.
These tools are very easy to use. The syntax for gsdesign follows that of the existing power command, and results are also available via a point-and-click interface. Sample-size calculations can be extended beyond tests of means, proportions, and survival functions because gsdesign also accepts user-defined methods.
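For example, a three-look design with O'Brien–Fleming efficacy bounds might be obtained as follows (a sketch; verify the option names against the gsbounds documentation):
. gsbounds, nlooks(3) alpha(0.05) power(0.9) efficacy(obfleming)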
This feature will be of interest to anyone designing clinical trials, including clinical psychologists and other medical researchers.
Robust inference for linear models
Reliable standard errors are critical to drawing appropriate inferences in research. Stata 18 provides new methods for obtaining standard errors and confidence intervals after fitting linear models with regress, areg, and xtreg, fe. The purpose of the new methods is to provide better inference when large-sample approximations do not work well. Perhaps you have clustered data with only a few clusters, or an uneven number of observations per cluster. You can now add the vce(hc2 clustvar) option to obtain HC2 cluster-robust standard errors. Or perhaps you have multiple variables that identify clusters in your data. You can now add the vce(cluster clustvar1 clustvar2 ...) option to obtain multiway cluster-robust standard errors.
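For example, with hypothetical outcome y, covariates x1 and x2, and cluster identifiers firm and year:
. regress y x1 x2, vce(hc2 firm) // HC2 cluster-robust standard errors
. regress y x1 x2, vce(cluster firm year) // two-way cluster-robust standard errors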
Wild cluster bootstrap (WCB)
The WCB, proposed by Cameron, Gelbach, and Miller (2008), provides an alternative to the cluster-robust variance estimator: it yields more reliable inference when the number of clusters is small or the number of observations is uneven across clusters.
When we fit models with clustered observations, we often use cluster-robust variance estimators, which relax the assumption of independence of observations within each cluster. This estimator works well if we have many clusters and the number of observations for these clusters does not differ much. However, if this is not the case, we might get a better estimate using WCB.
Stata’s new wildbootstrap command estimates WCB p-values and confidence intervals (CIs) for tests of simple and composite linear hypotheses about the parameters of a linear regression model. These statistics can be obtained when fitting linear regression models (e.g., models fitted with regress), models with large sets of indicator variables (e.g., models fitted with areg), and fixed-effects models (e.g., models fitted with xtreg, fe).
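For example, with hypothetical variables and industry as the cluster identifier (a sketch; rseed() sets the random seed for reproducibility):
. wildbootstrap regress wage tenure experience, cluster(industry) rseed(123)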
Flexible demand systems
Researchers often need to estimate the demand for a basket of goods. The new demandsys command provides tools to estimate demand and to measure how sensitive the demand for a good is to changes in prices and expenditure by computing the corresponding elasticities. We can use demandsys to fit eight different demand-system models:
• Cobb–Douglas
• Linear expenditure system
• Basic translog
• Generalized translog
• Almost ideal demand
• Generalized almost ideal
• Quadratic almost ideal
• Generalized quadratic almost ideal
Using the estat elasticities command, we can estimate various elasticities—expenditure elasticities, uncompensated own-price and cross-price elasticities, and compensated own-price and cross-price elasticities—to explore how sensitive demand is to changes in price and expenditure. With eight demand systems to choose from, the demandsys command provides researchers with great flexibility in selecting demand system techniques that are consistent with their empirical assumptions.
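For example, a quadratic almost ideal (QUAIDS) model for four goods might be fitted like this (a sketch with hypothetical expenditure-share, price, and total-expenditure variables; verify the option spellings in the demandsys documentation):
. demandsys quaids w1 w2 w3 w4, prices(p1 p2 p3 p4) expenditures(totexp)
. estat elasticities, expenditure atmeans // expenditure elasticities at the means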
TVCs with the interval-censored Cox model
In event-time data, interval censoring occurs when the time of an event of interest (such as cancer recurrence) is not directly observed but is known to lie within an interval. The existing stintcox command fits semiparametric interval-censored Cox proportional hazards models. In Stata 18, stintcox allows time-varying covariates.
stintcox now supports multiple-record-per-subject interval-censored data, in which each subject has a record at each examination time. This format accommodates time-varying covariates naturally: the data record the values of the covariates at each examination time. Multiple records per subject also provide a convenient way to specify current-status data.
stintcox also provides the new options tvc(varlist_t) and texp(exp), which offer a convenient way to include time-varying covariates formed by interacting the variables specified in tvc() with the deterministic function of time specified in texp().
After fitting a model, standard and specialized postestimation features are available, appropriately accounting for the time-varying covariates. You can use the new estat gofplot command to produce a goodness-of-fit plot. You can predict hazard ratios. You can use stcurve to plot the survivor and related functions. And when you have multiple-record data, you can use the new stcurve option attmeans to evaluate a function at the time-specific means of the covariates, or the new option atframe(framename) to evaluate a function at the variable values stored in the frame framename.
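For example, with hypothetical covariates and interval endpoints ltime and rtime, a model with a time-varying treatment effect might look like this (a sketch; the texp() expression is purely illustrative):
. stintcox age ecog, interval(ltime rtime) tvc(treat) texp(ln(_t)) // treat interacted with log time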
Until the recent methodological advances implemented in the stintcox command, truly semiparametric modeling of interval-censored event-time data was not possible. Those advances are also reflected in the extension to time-varying covariates now available in this command.
GOF plots for survival models
Stata 18 provides the new estat gofplot command to produce goodness-of-fit (GOF) plots for survival models. You can use it after four survival models: right-censored Cox (stcox), interval-censored Cox (stintcox), right-censored parametric (streg), and interval-censored parametric (stintreg). You can check model fit after stratified modeling or separately for each group.
GOF plots provide a visual check of how well a model fits the data. In survival analysis, these checks are based on Cox–Snell residuals and the fact that, if the model is correct, these residuals should follow a standard exponential distribution. Visually, this assumption is assessed by plotting the residuals against their estimated cumulative hazard; the closer the plotted values are to the 45° line, the better the fit (Cox and Snell 1968). estat gofplot supports:
• Parametric and semi-parametric survival models
• Right-censored and interval-censored data
• Three estimators of the cumulative hazard function
• By group and hierarchical models
Lasso for the Cox proportional hazards model
We use the lasso for prediction and model selection when we have many potential covariates, and by many, we mean hundreds, thousands, or more. The existing lasso command performs lasso for linear, logit, probit, and Poisson models. New in Stata 18 is the lasso for Cox proportional hazards models. lasso cox selects covariates using the lasso and fits a Cox model to survival-time data. elasticnet cox similarly selects covariates and fits a Cox model using the elastic net.
You can use predict to predict hazard ratios after lasso cox and elasticnet cox; use stcurve to plot the survivor, hazard, or cumulative hazard functions; or use any of the other postestimation tools available after lasso and elasticnet to examine the lasso results.
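A typical workflow might look like this (a sketch with hypothetical survival-time variables):
. stset time, failure(died) // declare the survival-time data
. lasso cox x1-x100 // select covariates and fit the Cox model
. lassocoef // list the covariates selected by the lasso
. stcurve, survival // plot the survivor function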
Multilevel meta-analysis
When researchers want to analyze results from multiple studies, they use meta-analysis to combine the results and estimate an overall effect size. The existing meta suite performs standard and multivariate meta-analysis.
Sometimes reported effect sizes are nested within higher-level groupings, such as geographic location (state or country) or administrative unit (school district). Effect sizes within the same group (e.g., zone) are likely to be similar and therefore dependent. In this case, you can use multilevel meta-analysis. The goal of multilevel meta-analyses is not only to synthesize the overall effect size, but also to take this dependence into account and assess changes between effect sizes at different levels. New estimation commands meta meregress and meta multilevel are used to perform multilevel meta-analyses.
Suppose we have studies reporting the effect of two teaching methods on mathematics test scores, with effect sizes y (mean differences) and their sampling standard errors se. Effect sizes are nested within schools, and schools are nested within districts.
We can type
. meta meregress y || district: || school:, essevariable(se)
or
. meta multilevel y, relevels(district school) essevariable(se)
If we have covariates and want to include random slopes, we can use meta-regression:
. meta meregress y x1 x2 || district: x1 x2 || school:, essevariable(se)
After fitting a model, postestimation commands can be used to compute multilevel heterogeneity statistics, display the estimated random-effects covariance matrices, and more. The syntax is by far the simplest of any package offering these models, and meta meregress is also the most flexible in terms of the constraints that can be placed on the random effects.
Meta-analysis for prevalence
The meta esize command performs meta-analysis of two-sample binary or continuous data. It now also performs meta-analysis of one-sample binary data, also known as meta-analysis of proportions or of prevalence. These data typically arise when each study contributes an estimate of a single proportion to be pooled. For example, you might have studies reporting the prevalence of a specific disease or the proportion of high school dropouts. In this setting, effect sizes such as Freeman–Tukey-transformed proportions or logit-transformed proportions are commonly used.
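For example, if each study contributes an event count and a sample size, the declaration might look like this (a sketch with hypothetical variables ncases and ntotal; check the documentation for the exact variables meta esize expects with one-sample binary data):
. meta esize ncases ntotal, esize(ftukeyprop)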
After meta esize, use other commands in the meta suite for further analysis. For example, use meta forestplot to create a forest plot, perform subgroup analysis by adding the subgroup() option to meta forestplot, use meta summarize to summarize meta-analysis data, or use meta funnelplot to build a funnel plot.
Meta-analysis of prevalence has been one of the features most often requested by users of the meta suite.
Local projections for IRFs
The new lpirf command estimates impulse-response functions (IRFs) by local projections. In time-series analysis, local-projection methods are used to estimate the impact of shocks on outcome variables. For example, we can assess the impact of unexpected changes in interest rates on a country’s output and inflation rates.
You can enter:
. lpirf y1 y2
to obtain local-projection estimates of the IRFs for y1 and y2. You can add the exog() option to estimate dynamic multipliers, which are the responses of the endogenous variables to shocks to the exogenous variables.
The new lpirf command works seamlessly with the existing irf command, allowing you to create graphs and tables of IRFs, orthogonalized IRFs, and dynamic multipliers.
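A typical workflow mirrors the one used after var (a sketch with hypothetical variables):
. lpirf y1 y2
. irf create lp, set(lpirf) // save the local-projection IRF results
. irf graph irf // graph the IRFs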
As with the linear models mentioned above, robust standard errors tend to be important in IRF estimation. Robust standard errors and Newey-West standard errors are available.
Local projections of IRFs provide an alternative to IRFs based on vector autoregressive (VAR) models. Local projections do not constrain the IRF coefficients to follow from a single estimated model, so they are more flexible, and they also make hypothesis testing easier.
Model selection for ARIMA and ARFIMA
Use the new arimasoc and arfimasoc commands to select the optimal number of autoregressive and moving average terms.
Researchers using autoregressive moving-average (ARMA) models must decide how many autoregressive and moving-average lags to include in their models. Information criteria, which balance model fit against model parsimony, usually guide the choice of the maximum number of lags.
arimasoc and arfimasoc assist in model selection by fitting a set of autoregressive integrated moving average (ARIMA) or autoregressive fractional integrated moving average (ARFIMA) models and calculating information criteria for each model. arimasoc and arfimasoc compute the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Hannan–Quinn Information Criterion (HQIC). The selected model is the one with the lowest information criterion value.
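For example, with a hypothetical series y (a sketch; options controlling the maximum AR and MA orders are described in the documentation):
. arimasoc y // fits a grid of ARIMA models and reports AIC, BIC, and HQIC for each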
Relative excess risk due to interaction (RERI)
Epidemiologists often need to determine how two exposures interact to place a subject at a higher risk of experiencing the outcome of interest. For example, you may want to investigate how smoking and asbestos interact to increase the risk of cancer. Using the new reri command, you can measure two-way interactions in an additive model of relative risk, taking into account other risk factors.
Researchers can choose from a variety of supported models: logistic, binomial generalized linear, Poisson, negative binomial, Cox, parametric survival, interval-censored parametric survival, and interval-censored Cox. They can evaluate additive two-way interactions, such as between smoking and asbestos exposure, using three related statistics: the RERI, the attributable proportion, and the synergy index.
New spline functions
Often we don’t want to make functional-form assumptions about the data we analyze. We may wish to regress an outcome on a set of regressors while remaining agnostic about their functional form. Spline basis functions provide flexible approximations to the functional form of a regressor. We may also want to visualize the relationship between an outcome and a regressor, or between any two variables, without imposing a linear or other functional form; splines let us do that too.
In Stata 18, you can use the new makespline command to generate B-spline, piecewise-polynomial spline, and restricted cubic-spline basis functions from an existing list of variables. For example, we can type
. makespline bspline x1 x2 x3 x4 … x100
to form third-order B-spline basis functions for each of the variables x1 through x100. We can then fit a model using the basis functions while remaining agnostic about the relationship between the covariates and the outcome of interest. Or we can visualize the relationship between the outcome of interest and any of the basis-function components generated by makespline.
Corrected and consistent AICs
By popular request, the existing estat ic and estimates stats commands now support two new model-selection criteria: the corrected Akaike information criterion (AICc) and the consistent AIC (CAIC). The new option all displays all available information criteria, and the new option df() specifies the degrees of freedom used to compute them.
Model selection is foundational to statistical analysis, and information criteria have been, and remain, among the most common techniques for it. In Stata, after any estimation command that reports a log likelihood (which includes most estimation commands), just type
. estat ic, aiccorrected
or
. estat ic, consistent
to calculate the AICc or CAIC, respectively.
To report all four information criteria (AIC, BIC, AICc, and CAIC), type
. estat ic, all
Sometimes, in analyses such as linear mixed models, we need to manually specify the number of observations or degrees of freedom used in the calculations. We can do so by specifying the n() and df() options:
. estat ic, n(500) df(10) all
These same new criteria and options are also available for the estimates stats command.
IV fractional probit model
Fractional outcomes are common. You might be modeling 401(k) pension plan participation rates, standardized-test pass rates, expenditure shares, and so on.
Fractional response models are a flexible and intuitive way to model outcomes that lie between 0 and 1. They avoid the problem of linear models producing predictions outside [0, 1] and the problem of log-odds models being undefined at exactly 0 and 1. Fractional response models can be fitted with the existing fracreg command.
What if you are concerned that one or more model covariates are endogenous? Using the new ivfprobit command, you can fit a model for a fractional dependent variable and account for the endogeneity of one or more covariates.
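For example, to model 401(k) participation rates with an endogenous employer matching rate, the call might look like this (a sketch with hypothetical variables, following the two-part syntax of Stata's other IV estimators):
. ivfprobit prate age sole (mrate = ltotemp)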
IV quantile regression
We use quantile regression when we want to study the effect of a covariate on different quantiles of the outcome, not just its mean. For example, we may want to model the distribution of students’ grades and how it shifts with changes in covariates. The existing qreg command fits quantile regression models, but what if we suspect that one of our covariates is endogenous? Endogeneity may arise from self-selection of study participants, omission of relevant variables from the model, or measurement error. The new ivqregress command allows us to model quantiles of the outcome while using instrumental variables (IVs) to control for endogeneity.
After fitting an IV quantile-regression model, you can plot the coefficients across quantiles with estat coefplot, test for endogeneity with estat endogeffects, and estimate dual confidence intervals that are robust to weak instruments with estat dualci.
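For example, with union membership treated as endogenous and a hypothetical instrument munion (a sketch; the smooth estimator and the quantile() values shown are assumptions to verify against the documentation):
. ivqregress smooth wage educ exper (union = munion), quantile(0.25 0.5 0.75)
. estat coefplot // plot coefficients across quantiles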
All-new graph style
Stata 18 includes the new graphics schemes stcolor, stcolor_alt, stgcolor, and stgcolor_alt.
stcolor is Stata’s new default scheme. It is based on the old default scheme s2color, but with the following modifications:
• These charts are 7.5 inches wide and 4.5 inches high.
• Color palette updated with brighter colors.
• The background color is white.
• The legend consists of one column and is located to the right of the chart.
• The y-axis labels are horizontal.
• Major grid lines are dashed lines.
• Marker size is small.
• Reference lines and lines added with the xline() and yline() options are black.
• Histogram fill color is stc1, intensity is 90%, outline color is stc1, intensity is 70%.
Scheme stcolor_alt is based on scheme stcolor but with the following modifications:
• Width and height are set to six inches and four inches.
• The legend consists of two columns and is located below the plot area.
Some graphics commands that use scheme stcolor_alt include tsline, tsrline, fcast graph, and estat acplot. If you prefer to use scheme stcolor or any other scheme for these commands, you can specify the scheme() option.
Graphics for Stata 18

Graphics drawn in previous versions

Graph colors by variable
In Stata 18, the new colorvar() option allows many twoway plots to vary the color of markers, bars, and more according to the values of a variable.
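For example, using the auto dataset shipped with Stata:
. sysuse auto
. twoway scatter mpg weight, colorvar(price) // marker colors vary with price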
Alias variables across frames
Since Stata 16, Stata has supported multiple datasets in memory, each in its own frame. When datasets are related, you can use the frlink command to link their frames and identify the variables that match observations in the current frame with observations in the related frame. In Stata 18, you can use the new fralias add command to create alias variables from a linked frame and perform analyses using variables stored in separate frames.
Alias variables behave as if they were copied from one frame to another, but since they are stored in the original frame, they take up very little memory. To see how easy it is to use aliased variables, assume that y is a variable in the current frame and x is a variable in the linked frame named frame2. To create an alias for x in the current frame, enter
. fralias add x, from(frame2)
Then you can fit a regression model by typing
. regress y x
just as if x were stored in the current frame.
Alias variables can be used throughout Stata, and this new feature will be of interest to anyone working with multiple datasets in memory.
Framesets
Stata 18 adds a natural evolution of the frame concept: you can now save a group of frames, called a frameset, with frames save and restore them to memory later with frames use. Framesets are automatically compressed when saved to disk, and linked frames can be saved automatically as well.
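For example, with hypothetical frames persons and households in memory (a sketch):
. frames save mystudy, frames(persons households) replace // save the two frames as a frameset
. frames use mystudy // restore them later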
Boost-based regular expressions
Regular expressions are used for
• Data validation, for example, checking that a phone number is in the correct format;
• Data extraction, for example, extracting phone numbers from strings; and
• Data transformation, for example, normalizing different phone number inputs.
Stata provides two sets of regular expression functions: regexm(), regexr(), and regexs() based on byte streams; and ustrregexm(), ustrregexrf(), ustrregexra(), and ustrregexs() based on Unicode. Unicode-based regular expression functions are built on top of the ICU library.
In Stata 18, the byte-stream-based functions have been updated to use the Boost library as their engine.
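For example, the Boost engine handles patterns such as bounded repetition, which we understand the old engine did not support (a quick sketch):
. display regexm("call 555-0123", "[0-9]{3}-[0-9]{4}") // 1 if a phone-like pattern is found
. display regexs(0) // the matched substring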
Vectorized numerical integration
Numerical integration is used when analytical solutions to integrals are unavailable or difficult to compute. Vectorized numerical integration approximates a vector of univariate numerical integrals simultaneously.
Mata’s new class QuadratureVec() is functionally the same as Quadrature(), except that it handles vectors for integration problems more conveniently. More precisely, QuadratureVec() numerically approximates a univariate integral vector via the adaptive Gauss–Kronrod method (the adaptive Simpson method is also provided for comparison).
QuadratureVec() is used in the same way as Quadrature(), requiring only four steps: create an instance of the class, specify the evaluator function, set the limits of integration, and perform the calculation.
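A minimal Mata sketch, assuming the QuadratureVec() interface mirrors Quadrature(), with a vector-valued evaluator and one row of limits per integral (verify the details in the Mata documentation):
. mata:
: // evaluator returns one element per integral to be approximated
: real colvector f(real scalar x) return((exp(-x) \ exp(-x^2)))
: q = QuadratureVec() // 1. create an instance of the class
: q.setEvaluator(&f()) // 2. specify the evaluator function
: q.setLimits((0,1) \ (0,1)) // 3. set the limits of integration (form assumed)
: q.integrate() // 4. perform the calculation
: end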
New reporting features
Reproducible reporting lets us streamline the process of presenting our findings as our analysis changes. Whether because our work changes direction or because of feedback from our peers, creating a report of our findings is rarely a one-time task. Stata’s reproducible-reporting features let us easily modify and adjust our reports as our analysis evolves.
In Stata 18, we added functionality to putdocx and putexcel that allows you to further customize reproducible reports. Now you can add headers, footers and page breaks using putexcel. You can also freeze a row or column in a worksheet; this allows you to keep the information in that row or column in view while scrolling through the rest of the worksheet. Additionally, you can create named cell ranges to simplify using formulas. We’ve also added support for bookmarks using putdocx; simply format your text as a bookmark and link to it as needed. Additionally, when adding images to a .docx file, you can now specify alt text for the image to be read by speech software.
• New features in putdocx
• Include bookmarks in paragraphs and tables
• Include alternative text for images that can be read by screen-reading software
• Include scalable vector graphics (.svg) images
• New features in putexcel
• Freeze a worksheet at a specific row or column
• Insert page breaks at specific rows or columns
• Insert headers and footers in the worksheet
• Include hyperlinks in cells
• Create a named range of cells
Do-file Editor enhancements
Stata’s Do-file editor now provides automatic backup and syntax highlighting of user-defined keywords.
Automatic backup. Documents open in the Do-file Editor are periodically saved to a backup file on disk, including new documents that have not yet been saved. If your computer loses power or crashes before you have a chance to save changes to a document, you can still recover them: simply open the document in the Do-file Editor again. If a backup file is found in the same location as the document, you will be prompted to restore the backup or open the version last saved to disk. Restoring a backup only loads it into the Do-file Editor; it does not overwrite the document on disk unless you choose to save it.
Syntax highlighting for user-defined keywords. The Do-file Editor now includes syntax highlighting for user-defined keywords, letting you highlight your favorite community-contributed commands. Simply create a specially named keyword-definition file containing a list of keywords, and Stata will highlight those keywords using configurable colors and font styles (such as bold or italic). You can even create a global keyword-definition file shared by all users on the same computer; each user can still create local keyword-definition files, and keywords from both the global and local files will be loaded into the Do-file Editor.
Data Editor enhancements
Pin rows and columns. The Data Editor can now pin rows and columns. Pinned rows and columns do not scroll with the rest of the data, so they remain in view as you scroll. This is useful for visually comparing them with data that come into view only as you scroll; pinning an ID variable is a natural example.
Resizable cell editor for string data. When editing a string variable, you can resize the cell editor so that more of the string is visible while you edit, without it scrolling out of view.
Tooltips for truncated text. Any cell value too wide to fit within its column is truncated for display. Hovering the mouse pointer over such a cell displays a tooltip with the full, untruncated value.
Support for proportional-width fonts. The Data Editor now supports proportional-width fonts. This improves the readability of the data and allows more variables to be displayed at once without scrolling. You can still use a fixed-width font if you prefer.
Display variable labels in column headers. Variable labels can now appear directly below variable names in the column headers. This is useful when viewing datasets whose variable names are short and nondescriptive but whose labels are informative.