Wednesday, June 6, 2012

Website for regression analysis

This website was extremely helpful to me while doing my regression analysis. Hope it is helpful to you!

http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf

Recoding Variables

This post is a combined effort of Megan H, Denise M, and Steven B. 

In case you are in the middle of recoding your data, here are some tips and an example from our paper and syntax. First, when you recode the data you need to find a way to put your independent variables on the same scale. As seen below, we decided to take a number of different survey questions and code them 0-2. This let us sort respondents into comparable categories even though the questions asked different things. In our example there are three different types of questions being asked, but we were able to re-code them so we could measure and compare the variables with one another. How you do this is at the discretion of the researcher, but you should explain why you coded it as you did. This is the example from our paper.


In our study we decided to code all of our independent variables on a 0-2 scale: 0 codes as a non-gamer, 1 as a moderate gamer, and 2 as an extreme gamer. We decided these measurements were best for getting measurable and meaningful results for our research. We have five independent variables for our study that all have to do with playing computer or internet games.

First, Xbox Live was measured. In the original survey each respondent was asked their status on Xbox Live with the following options: Never Used; Previous User; Currently Active. We decided to code Never Used as a 0 (non-gamer), Previous User as a 1 (moderate gamer), and Currently Active as a 2 (extreme gamer).

Students were asked the same question about World of Warcraft, which was measured and coded in exactly the same format as Xbox Live.

Students were also asked on the survey whether they played computer games in general. We coded those who do not as a 0 and those who do as a 2.

Students who completed the survey were asked how often they played Facebook games. The possible responses were Hourly; Several times a day; Once a day; Several times a week; Once a week; Rarely; Never. We coded this on the 0-2 scale as well: Never and non-applicable were coded as 0; Rarely, Once a week, and Several times a week were coded as 1; Once a day, Several times a day, and Hourly were coded as 2.

Students were also asked how frequently they played Internet games in general. This question had the same possible responses as the Facebook games question, and we coded it the same way.

This was our syntax for recoding (recode() comes from the car package). Check your codebook or earlier syntax for the numbers the responses were originally coded as before you re-code. If your codebook is not clear, you can run summaries and histograms of the variables to try to find out what the codes are.

S1.SNS.XboxLive<-recode(S1.SNS.XboxLive, "1=0; 2=1; 3=2")
S1.SNS.WoW<-recode(S1.SNS.WoW, "1=0; 2=1; 3=2")
S1.OUT.GameCon<-recode(S1.OUT.GameCon, "NA=0; 7=2")
S1.CU.Games<-recode(S1.CU.Games, "NA=0; 2=2")
S1.FBU.Game<-recode(S1.FBU.Game, "7=0; NA=0; 6=1; 5=1; 4=1; 3=2; 2=2; 1=2")
S1.IU.Games<-recode(S1.IU.Games, "7=0; NA=0; 6=1; 5=1; 4=1; 3=2; 2=2; 1=2")

When you are finished, check that your re-codes are accurate by running histograms of the recoded variables. This helped us spot multiple mistakes before our re-codes were finally correct.
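As a rough sketch of that checking step, a cross-tabulation of the old codes against the new ones shows at a glance whether every original value landed where you intended. The data below are made up, the names xbox_raw, lookup, and xbox_new are hypothetical, and the recode itself is done with a base-R lookup vector as a stand-in for the car call above so that the example runs without extra packages.

```r
# Made-up raw responses: 1 = Never Used, 2 = Previous User, 3 = Currently Active
xbox_raw <- c(1, 3, 2, 2, NA, 1, 3)

# See which codes are present before recoding
print(table(xbox_raw, useNA = "ifany"))

# Base-R lookup recode (stand-in for recode(xbox_raw, "1=0; 2=1; 3=2"))
lookup <- c(`1` = 0, `2` = 1, `3` = 2)
xbox_new <- unname(lookup[as.character(xbox_raw)])

# Cross-tabulate old against new codes; each row should have
# counts in exactly one column if the recode did what was intended
check <- table(raw = xbox_raw, recoded = xbox_new, useNA = "ifany")
print(check)
```

Any row with counts spread over more than one column, or a raw code that ends up as NA unexpectedly, points to a mistake in the recode string.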

Correlation Tables in R flagged with significance level stars (*, **, and ***)

If you want to create a lower-triangle correlation matrix flagged with stars (*, **, and ***) according to levels of statistical significance, this syntax may be helpful (found it here). All you have to do is cut and paste it into R and pass in your data frame. You will need the Hmisc and xtable packages.

corstarsl <- function(x){ 
require(Hmisc) 
x <- as.matrix(x) 
R <- rcorr(x)$r 
p <- rcorr(x)$P 

## define notation for significance levels; spacing is important.
mystars <- ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "* ", " ")))

## truncate the matrix that holds the correlations to two decimals
R <- format(round(cbind(rep(-1.11, ncol(x)), R), 2))[,-1] 

## build a new matrix that includes the correlations with their appropriate stars
Rnew <- matrix(paste(R, mystars, sep=""), ncol=ncol(x)) 
diag(Rnew) <- paste(diag(R), " ", sep="") 
rownames(Rnew) <- colnames(x) 
colnames(Rnew) <- paste(colnames(x), "", sep="") 

## remove upper triangle
Rnew <- as.matrix(Rnew)
Rnew[upper.tri(Rnew, diag = TRUE)] <- ""
Rnew <- as.data.frame(Rnew) 

## remove last column and return the matrix (which is now a data frame)
## parentheses make the intended range 1..(n-1) explicit
Rnew <- cbind(Rnew[1:(length(Rnew)-1)])
return(Rnew) 
}

## Create table: insert your data frame below
New_table <- corstarsl(yourdataframe)

## exporting tables to either html or .tex (I prefer .tex but you will have to install TeX)
## the table must be converted with xtable() before printing
library(xtable)
print(xtable(New_table), type="latex", file="filename.tex")
print(xtable(New_table), type="html", file="filename.html") ## see here for formatting tips
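To see what the star flagging in the function above is doing, here is a minimal base-R sketch for a single pair of variables. It uses the built-in mtcars data and cor.test() rather than Hmisc, and the helper name star_for is made up for illustration, so treat it as an explanation of the idea and not as a replacement for corstarsl.

```r
# Map a p-value to the usual significance stars
# (same cutoffs as the nested ifelse in corstarsl)
star_for <- function(p) {
  ifelse(p < .001, "***", ifelse(p < .01, "**", ifelse(p < .05, "*", "")))
}

# Correlation and p-value for one pair of variables from mtcars
ct <- cor.test(mtcars$mpg, mtcars$wt)

# Round the correlation to two decimals and attach the stars
flagged <- paste0(format(round(unname(ct$estimate), 2), nsmall = 2),
                  star_for(ct$p.value))
print(flagged)
```

corstarsl does the same thing for every pair of columns at once and then blanks out the upper triangle.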

Monday, June 4, 2012

Comparison of Data Analysis Packages

This is a link to an interesting page I found that compares different statistical packages...

http://brenocon.com/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/

Reading this made me grateful for being exposed to R, but also for learning other programs as well. To me, which program is better depends largely on what you're trying to do.

Matched Sets of Graphs in R

Sometimes you may want to see graphs side by side in R. To accomplish this you can use the function

par(mfcol=c(2,4))

You can change the numbers within this function depending on how many graphs you want to appear together and which ones you want next to each other. The first number specifies the number of rows of graphs that will appear, in this case 2. The second number specifies the number of graphs that will appear in each row, in this case 4. With mfcol the panels are filled column by column; use mfrow instead to fill them row by row.
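A small sketch of the 2-by-4 layout described above, using histograms of eight columns from the built-in mtcars data. The choice of variables is arbitrary and only there to fill the eight panels.

```r
# Set a 2-row by 4-column layout and keep the previous settings
old_par <- par(mfcol = c(2, 4))
n_panels <- prod(par("mfcol"))  # number of graphs that fit on one page

# Eight numeric columns from mtcars, one histogram per panel
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec", "carb", "gear")
for (v in vars) {
  hist(mtcars[[v]], main = v, xlab = v)
}

# Restore the original single-panel layout
par(old_par)
```

Saving the old settings and restoring them at the end keeps later plots from being squeezed into the same grid.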
Hi guys,

Just some quick input on how to do Poisson regression, which is a form of regression for when your outcome is a count variable. It is very similar to using binomial regression, except you use the code

summary(regress <- glm(Yvariable ~ Xvariable1 + Xvariable2 + ... + XvariableLAST, family=poisson))

glm() is part of base R, so this should work even without the "car" package loaded.


This should give you your quartiles, your coefficients, standard errors, and significance, along with your null and residual deviance to calculate the pseudo R^2.
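Here is a minimal runnable sketch of the same idea on the built-in warpbreaks data, where breaks is a count outcome. The post does not say which pseudo R^2 is meant; the version below, 1 minus residual deviance over null deviance, is one common deviance-based definition and is an assumption on my part.

```r
# Poisson regression of a count outcome on two predictors
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Coefficients, standard errors, and significance,
# plus the null and residual deviance
print(summary(fit))

# Deviance-based pseudo R^2 (one of several possible definitions)
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance
print(pseudo_r2)
```

The two deviances are stored on the fitted object, so there is no need to copy them by hand from the summary output.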

hope that helps!

Almost done!



Spring quarter of 450/550 is almost done! 

Things to do:


  1. Make sure that you have your participation all squared away.   This means: 
    1. Double-check that you watched at least a couple of Khan Academy videos on relevant topics (while logged in to the account where you selected me as a coach). 
      1. Right now Shanique and Nathan have plenty of Khan Academy views on their accounts.
      2. But other folks who   
    2. Make sure that you created your two videos.
    3. Make sure that you made at least one helpful blog post. 
    4. I have records of class attendance and my subjective sense of in-class participation. 
    5. Make sure too that you have tagged your contributions to
      1. helpful links page
      2. crash course in statistics
      3. anywhere else?
  2. Take a look at the updated turn in form for the HW.
    1. Just turn in one project for your group. 
    2. Any questions?
  3. I will be in the lab on Mon (starting at 2:00) and Tues (starting at 10:00).
    1. You are encouraged to show me your progress.