Lately, I’ve seen these terms a lot: “big data ecosystem”, “Hadoop”, “Spark”, “MapReduce”, “Hive”, “Pig”, etc.
[Read More]
Have some frequently used code templates?
Snippets can help!
I’ve had this question for a long time: is there a way to save my own code templates?
[Read More]
How to measure the value of an added independent variable in binary logistic regression models?
About AUC, pseudo-R2, and the likelihood ratio test
We often use logistic regression to predict the probability of a binary outcome, e.g. case or control.
[Read More]
Notes on linear model improvement
special focus on Ridge and Lasso
Although there are many “fancy” nonlinear models, linear models still have advantages in inference and work well in the real world.
[Read More]
How to merge many files into a big dataframe in R?
save lots of typing
At work, we sometimes need to combine and analyze information from many output files.
[Read More]