The authors discuss recommendations for improving methodological consistency and obtaining reliable results.

**Reference**

- Ialongo C, Bernardini S. Preanalytical investigations of phlebotomy: methodological aspects, pitfalls and recommendations. Biochem Med (Zagreb) 2017;27:177-191.

]]>

Faber et al. (1) searched MEDLINE for meta-analyses that had been published during 2013 and that included non-randomized studies. Two reviewers then assessed the characteristics and key methodological components in these publications.

Of the initially selected 188 papers, 119 included both randomized and non-randomized intervention studies, and 69 included only non-randomized intervention studies. Assessment of bias was reported in 135 papers (72%), but this evaluation referred to confounding bias in only 33 papers (18%). In 130 papers (69%) the design of the non-randomized intervention studies was not clearly specified, and in 131 papers (70%) it was unclear whether crude or adjusted estimates were used.

The authors conclude that some important methodological aspects of the systematic review process are not adequately reported in meta-analyses that include non-randomized intervention studies.

**Reference**

- Faber T, Ravaud P, Riveros C, Perrodeau C, Dechartres A. Meta-analyses including non-randomized studies of therapeutic interventions: a methodological review. BMC Medical Research Methodology 2016;16:35. DOI: 10.1186/s12874-016-0136-0

]]>

Niven et al. (1) investigated how many published, peer-reviewed matched case-control studies had been analysed using appropriate statistical methodology. They identified and reviewed 37 matched case-control studies. Of these, 16 (43%) were adequately analysed. Compared with the other studies, those with adequate analysis more often studied cases with cancer and heart disease, 10/16 (63%) versus 5/21 (24%), and more often used multiple controls, 14/16 (88%) versus 13/21 (62%). They were also more often published in a high-impact journal.

The authors conclude that their study raises concern that a majority of matched case-control studies present findings based on inadequate statistical analyses.

**Reference**

- Niven DJ, Berthiaume LR, Fick GH, Laupland KB. Matched case-control studies: a review of reported statistical methodology. Clinical Epidemiology 2012;4:99–110.

]]>

The largest share of these models were developed in Europe (n=167, 46%) and predicted the risk of coronary heart disease (n=118, 33%), and most used a 10-year prediction horizon (n=209, 58%). Common predictors were smoking (n=325, 90%) and age (n=321, 88%). Most of the models were sex-specific (n=250, 69%).

The authors found substantial heterogeneity in predictor and outcome definitions, and important information was often missing. For 49 models (13%) the prediction horizon was not specified, and for 92 (25%) crucial information for enabling the model to be used for individual risk prediction was missing.

No more than 132 models (36%) were externally validated, and only 70 (19%) by independent investigators. Model performance was heterogeneous, and discrimination and calibration were reported for only 65% and 58% of the external validations, respectively.
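To make the two performance measures mentioned above concrete, the following is a minimal sketch, not taken from the review, of how discrimination and calibration could be computed in an external validation. Discrimination is shown as the c-statistic (the probability that a randomly chosen person with an event received a higher predicted risk than a randomly chosen person without one), and calibration as a simple observed-to-expected ratio. The function names and the toy data are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Concordance (c) statistic for predicted risks and binary outcomes (1 = event)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Observed number of events divided by the number the model expects."""
    return sum(outcomes) / sum(risks)


# Hypothetical validation data: predicted 10-year risks and observed outcomes.
risks = [0.05, 0.10, 0.20, 0.40, 0.60]
outcomes = [0, 0, 1, 0, 1]

print("c-statistic:", round(c_statistic(risks, outcomes), 3))
print("O/E ratio:", round(observed_expected_ratio(risks, outcomes), 3))
```

A c-statistic of 0.5 corresponds to chance-level discrimination, and an observed-to-expected ratio above 1 means the model underestimates risk on average in the validation sample.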

The authors conclude that there is an excess of models predicting cardiovascular disease in the general population, and that the usefulness of most of these is unclear because of methodological shortcomings, incomplete presentation, lack of external validation, and lack of model impact studies. Future work should primarily focus on external validation and comparisons of already existing models.

**Reference**

- Damen JAAG, Hooft L, Schuit E, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ 2016;353:i2416.

]]>

Of the reviewed papers, 16,700 included tests of statistical significance in a format that the statcheck software could check. Half of these reported at least one erroneous p-value. One in eight contained at least one erroneous p-value that may have affected the conclusion of the paper. The prevalence of erroneous p-values was stable over time or declined, but was higher in p-values reported as significant than in p-values reported as non-significant.

The authors suggest that data sharing, letting co-authors check results, and checking manuscripts using statcheck could reduce the number of such reporting errors.
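The kind of check described above can be illustrated with a small sketch. This is not statcheck itself (which is an R package and handles several test types); it is a simplified Python illustration for a z statistic only, assuming a two-sided test and a p-value reported to two decimals. The function names, labels and tolerance rule are assumptions made for this example.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_p(z: float, reported_p: float,
                     decimals: int = 2, alpha: float = 0.05) -> str:
    """Compare a reported p-value with the one recomputed from z.

    Returns 'consistent', 'error', or 'decision error' (hypothetical labels).
    """
    recomputed = two_sided_p_from_z(z)
    # Accept differences that can be explained by rounding the reported p-value.
    if abs(recomputed - reported_p) <= 0.5 * 10 ** (-decimals):
        return "consistent"
    # The discrepancy flips the significance decision at level alpha.
    if (recomputed < alpha) != (reported_p < alpha):
        return "decision error"
    return "error"


# Hypothetical reported results: (z statistic, reported p-value).
for z, p in [(1.96, 0.05), (2.50, 0.03), (1.50, 0.04)]:
    print(z, p, check_reported_p(z, p))
```

The third case shows why such errors matter: the reported p-value suggests significance at the 5% level, whereas the recomputed one does not.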

**Reference**

- Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. The prevalence of statistical reporting errors in psychology (1985–2013). Behav Res 2015 DOI: 10.3758/s13428-015-0664-2.
- Epskamp S, Nuijten MB. statcheck: Extract statistics from articles and recompute p values. R package version 1.2.2; 2016. http://CRAN.R-project.org/package=statcheck

]]>

Kahan and Morris (1) reviewed randomised trials published in 2010 in the British Medical Journal, the Journal of the American Medical Association, the Lancet, and the New England Journal of Medicine. The purpose was to see whether the method of randomisation was adequately reported, how often balancing was used, and whether the balancing factors were adjusted for in the statistical analysis.

The review showed that balanced randomisation was common. While the randomisation method was unclear in 37% of the 258 published reports, 63% included balancing on at least one factor. A majority of the trials with balanced randomisation were inadequately analysed: in only 26% of them did the statistical analysis include adjustment for all balancing factors. The trials in which the statistical analysis did not include adjustment for balancing factors were less likely to show a statistically significant result, 57% versus 78%.

Kahan and Morris conclude that balancing is common but often poorly understood.

**Reference**

- Kahan BC, Morris TP. Reporting and analysis of trials using stratified randomisation in leading medical journals: review and reanalysis. BMJ 2012;345:e5840.

]]>

Lombardi and Hurlbert (1) reviewed the frequency of one-tailed testing in the 1989 and 2005 volumes of Animal Behaviour and Oecologia. They found one-tailed testing in 24% of the relevant articles in Animal Behaviour and in 13% of Oecologia articles. One-tailed testing was used more often with non-parametric hypotheses than with parametric ones, and twice as often in 1989 as in 2005.

The authors refer to the criterion that one-tailed tests should only be used when a societal (not individual) interest results in a null hypothesis having just one direction, and they claim that according to this criterion all the uses of one-tailed tests in the two reviewed journals were invalid.

The conclusion of the investigation is that “One-tailed tests rarely should be used for basic or applied research in ecology, animal behaviour or any other science.”
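Why the choice between one-tailed and two-tailed testing matters can be seen in a short numerical sketch. It assumes a z test with a standard normal reference distribution; the test statistic of 1.8 is a hypothetical value chosen for illustration and does not come from the reviewed articles.

```python
import math


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_two_tailed(z: float) -> float:
    """Two-tailed p-value: a deviation in either direction counts as evidence."""
    return 2 * normal_sf(abs(z))


def p_one_tailed_upper(z: float) -> float:
    """One-tailed p-value when only positive deviations count as evidence."""
    return normal_sf(z)


z = 1.8  # hypothetical test statistic
print("two-tailed p:", round(p_two_tailed(z), 4))
print("one-tailed p:", round(p_one_tailed_upper(z), 4))
# An equally large effect in the unpredicted direction is ignored by the one-tailed test.
print("one-tailed p for z = -1.8:", round(p_one_tailed_upper(-z), 4))
```

When the effect lies in the predicted direction the one-tailed p-value is half the two-tailed one, so the same data can be significant at the 5% level under one choice and not under the other, while an equally strong effect in the opposite direction can never be detected.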

**Reference**

- Lombardi CM, Hurlbert SH. Misprescription and misuse of one-tailed tests. Austral Ecology 2009;34:447–468.

]]>

Valle et al. performed a systematic review. PubMed was searched using different combinations of the search terms ‘malaria’, ‘logistic’, ‘models’, ‘regression’, ‘diagnosis’, and ‘diagnostic’. Of 36 studies that satisfied the criteria, 70% did not address the issue of imperfect detection of the malaria outcome.

The authors interpret their results as suggesting that malaria epidemiologists are generally unaware of the consequences that imperfect detection can have on parameter estimates from logistic regression. The authors also recommend using Bayesian models instead of logistic regression.
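The direction of the bias can be illustrated with simple arithmetic. The sketch below assumes a diagnostic test with the same sensitivity and specificity in both comparison groups (non-differential misclassification); the prevalences, sensitivity and specificity are hypothetical values chosen for illustration and are not taken from the paper.

```python
def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Proportion testing positive: true positives plus false positives."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Odds ratio comparing two outcome proportions."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


# Hypothetical true infection prevalences in two groups.
true_exposed, true_unexposed = 0.25, 0.10
# Hypothetical test characteristics, identical in both groups.
sensitivity, specificity = 0.80, 0.95

obs_exposed = apparent_prevalence(true_exposed, sensitivity, specificity)
obs_unexposed = apparent_prevalence(true_unexposed, sensitivity, specificity)

true_or = odds_ratio(true_exposed, true_unexposed)
observed_or = odds_ratio(obs_exposed, obs_unexposed)

print("true odds ratio:", round(true_or, 2))
print("odds ratio based on imperfect test results:", round(observed_or, 2))
```

Under these assumptions the odds ratio computed from the test results is pulled towards 1, which is the kind of distortion that a logistic regression ignoring imperfect detection would inherit.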

**Reference**

- Valle D, Tucker Lima JM, Millar J, Amratia P, Haque U. Bias in logistic regression due to imperfect diagnostic test results and practical correction approaches. Malaria Journal 2015;14:434.

]]>

In 68 of the 70 papers the statistical analysis involved homogeneity tests across all groups, using an ANOVA or a similar distribution-free test. Comparisons between specific groups were not performed when the null hypothesis of homogeneity across all groups could not be rejected, but invariably followed when this hypothesis was rejected. The comparisons were then done using several different tests. The authors stress the importance of distinguishing between planned and unplanned comparisons, and of considering what constitutes a biologically interesting effect. They recommend avoiding test procedures that include all pairwise comparisons when only a small subset of them is of interest.

The authors conclude from their survey that statistical testing of comparisons among more than two groups remains common in behavioral science, but that current practice is variable and almost always suboptimal.
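The recommendation to avoid all pairwise comparisons when only a few are of interest can be motivated with a small calculation. The sketch below counts the pairwise comparisons among k groups and computes the probability of at least one false positive under the simplifying assumption that the tests are independent and each is run at the 5% level; the numbers of groups and of planned comparisons are hypothetical.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairwise comparisons among k groups."""
    return comb(k, 2)


def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


k = 5            # hypothetical number of groups
planned = 2      # hypothetical number of comparisons of real interest

all_pairs = n_pairwise(k)
print("all pairwise comparisons:", all_pairs)
print("error rate, all pairs:", round(familywise_error(all_pairs), 3))
print("error rate, planned only:", round(familywise_error(planned), 3))
```

Restricting the analysis to the planned comparisons keeps the overall risk of a false positive much lower, so any correction for multiplicity costs less power.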

**Reference**

- Ruxton GD, Beauchamp G. Time for some a priori thinking about post hoc testing. Behavioral Ecology 2008;19:690-693.

]]>

In 83 (28%) articles, the authors selected covariates for multivariable models based on prior knowledge. Stepwise selection procedures, extensively criticized in modern literature, were used in 59 (20%) articles. Not a single article presented use of shrinkage methods such as LASSO, and as many as 105 (35%) publications did not describe the method, which the authors see as an indication of low quality of the information presented in the methods sections.

The authors conclude that variable selection methods which have been formally criticized as flawed still prevail in the scientific literature.

**Reference**

- Walter S, Tiemeier H. Variable selection: current practice in epidemiological studies. Eur J Epidemiol 2009;24:733–736.

]]>