Sunday, October 1, 2017

Simulation on Statistical Significance and Power

"Are the effects of A and B different? They are always different---for some decimal place." Tukey, 1991, quoted in (Cohen 1994)

"The continued very extensive use of significance tests is alarming." (Cox 1986)

"Small wonder that students have trouble [with statistical hypothesis testing]. They may be trying to think." (Deming 1975)

I put this together this morning as an aid for understanding significance and power via simulation and visualization. This first iteration covers a two-sample t-test.

P-values CAN be useful, but use with caution.
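
As a rough illustration of the idea, here is a minimal sketch (not the code posted on RPubs) that estimates how often a two-sample t-test rejects the null hypothesis. The function name, sample size, effect size, and number of simulations are all assumptions picked for the example. It uses only base R.

```r
# Estimate the rejection rate of a two-sample t-test by simulation.
# All names and default values here are illustrative assumptions.
simulate_rejection_rate <- function(n_per_group, effect_size,
                                    alpha = 0.05, n_sims = 2000) {
  p_values <- replicate(n_sims, {
    x <- rnorm(n_per_group, mean = 0, sd = 1)
    y <- rnorm(n_per_group, mean = effect_size, sd = 1)
    # Pooled-variance test, to match the assumptions of power.t.test()
    t.test(x, y, var.equal = TRUE)$p.value
  })
  # Share of simulated experiments that reach significance
  mean(p_values < alpha)
}

set.seed(2017)

# No true difference: the rejection rate should land near alpha.
type1 <- simulate_rejection_rate(n_per_group = 30, effect_size = 0)

# True difference of half a standard deviation: this rate is the power.
power_sim <- simulate_rejection_rate(n_per_group = 30, effect_size = 0.5)

# Analytic power from base R, for comparison with the simulation.
power_exact <- power.t.test(n = 30, delta = 0.5, sd = 1,
                            sig.level = 0.05)$power

print(c(type1 = type1, simulated = power_sim, analytic = power_exact))
```

Changing n_per_group or effect_size and rerunning shows how quickly power moves with sample size and effect size, while the rejection rate under no difference stays near alpha.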

I would appreciate any additional ideas on how to improve the code and what it is trying to convey.

The full code is on RPubs:

Sunday, July 23, 2017

Exploration of Text-Mining Packages Using HST's 'Hey Rube' Columns

Well, time for another post on text-mining.  I figured it was time to see how the quanteda and tidytext packages could supplement what I've been doing with tm and qdap. Here is the link to the Markdown document on RPubs.

Saturday, May 13, 2017

Mastering Machine Learning with R, 2nd Edition

The second edition of my book on machine learning with R is now available on

In this edition, I've added new data sets and methods such as XGBoost, Sequential Analysis, and Multivariate Adaptive Regression Splines (MARS). MARS is quickly becoming my favorite technique, for a number of reasons I discuss in the book.


Monday, March 13, 2017

Plotting Vietnam Airstrikes with Leaflet and R

"No event in American history is more misunderstood than the Vietnam War. It was misreported then, and it is mis-remembered now." Richard M. Nixon

"The war in Vietnam was not lost in the field, nor was it lost on the front pages of the New York Times or the college campuses. It was lost in Washington, D.C." H.R. McMaster


On 17 February, I attended the very moving and emotional Memorial Service at the National Infantry Museum for LTG(R) Harold G. Moore, co-author of "We Were Soldiers Once... and Young", which chronicles the experience of fighting North Vietnamese Regulars on Landing Zone (LZ) X-Ray. The book has become almost required reading for any Army Officer, and one cannot overestimate how important it has become to professional development. The movie is fine, but doesn't do the real story justice, even though Mel Gibson portrayed the intrepid Hal Moore. An overview of the desperate struggle is available here:

Several days later, I was informed of the data on airstrikes available from World War 1 through the Vietnam War on  The data is also available on  It is a treasure trove of data and was made available to support open-source analysis.

Since airpower played an integral part in saving the 1st Bn, 7th Cavalry from being overrun, I decided to explore those strikes using the Leaflet package in R. I created a subset of the data that includes only the days 1/7 CAV was on the ground, 14-16 November 1965. The full code and interactive map are on RPubs:

Here is the full code as well:


library(dplyr)    # distinct()
library(leaflet)  # leaflet(), awesomeIcons(), addAwesomeMarkers()

df <- read.csv("vietnam_nov_65.csv")


# If the same mission conducts multiple attacks on the same
# lat/long, each attack generates a separate observation. Therefore,
# I have chosen to dedupe on mission id number (MISSIONID),
# keeping the first row for each mission.
df <- distinct(df, MISSIONID, .keep_all = TRUE)

# US Air Force = blue
# US Marine Corps = dark red
# US Navy (and any other service) = light green
df$color <- ifelse(df$MILSERVICE == "USAF", "blue", ifelse(
  df$MILSERVICE == "USMC", "darkred", "lightgreen"
))

icons <- awesomeIcons(
  icon = "air",
  iconColor = "black",
  library = "glyphicon",
  markerColor = df$color # Branch of Service
  #text = df$MISSIONID
)

leaflet(df) %>%
  addProviderTiles("Esri.WorldImagery", group = "Image") %>%
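  addProviderTiles("Stamen.Terrain", group = "Terrain") %>%
  # Layer control: switch base maps and toggle markers by mission date
  addLayersControl(
    baseGroups = c("Image", "Terrain"),
    overlayGroups = unique(as.character(df$MSNDATE)),
    options = layersControlOptions(collapsed = FALSE)) %>%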

  # Assumes df has a popup column and lat/long columns leaflet can detect
  addAwesomeMarkers(icon = icons,
                    label = ~popup,
                    popup = ~popup,
                    group = ~as.character(MSNDATE))
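
The code above reads a file that has already been cut down to 14-16 November and refers to a popup column. Below is a minimal sketch of how such a subset and popup text might be prepared. It is not the original preparation code. The toy data frame, the "YYYY-MM-DD" date format for MSNDATE, and the popup wording are all assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for the full airstrike data. The values and the
# date format are hypothetical; only the column names MISSIONID,
# MILSERVICE, and MSNDATE come from the code above.
strikes <- data.frame(
  MISSIONID  = c(1, 2, 3, 4),
  MILSERVICE = c("USAF", "USN", "USMC", "USAF"),
  MSNDATE    = c("1965-11-13", "1965-11-14", "1965-11-16", "1965-11-17"),
  stringsAsFactors = FALSE
)

nov_65 <- strikes %>%
  # Parse the date text so the range comparison is on real dates
  mutate(MSNDATE = as.Date(MSNDATE)) %>%
  # Keep only the days 1/7 CAV was on the ground
  filter(MSNDATE >= as.Date("1965-11-14"),
         MSNDATE <= as.Date("1965-11-16")) %>%
  # Build the text shown in each marker's label and popup
  mutate(popup = paste0("Mission ", MISSIONID,
                        " (", MILSERVICE, "), ", MSNDATE))

print(nov_65)

# To save the subset for the mapping code:
# write.csv(nov_65, "vietnam_nov_65.csv", row.names = FALSE)
```

Filtering on parsed dates rather than on text avoids surprises if the date strings are not zero-padded or use a different separator.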