Sunday, July 23, 2017

Exploration of Text-Mining Packages Using HST's 'Hey Rube' Columns

Well, time for another post on text-mining.  I figured it was time to see how the quanteda and tidytext packages could supplement what I've been doing with tm and qdap. Here is the link to the Markdown document on RPubs.

Saturday, May 13, 2017

Mastering Machine Learning with R, 2nd Edition

The second edition of my book on machine learning with R is now available on

In this edition, I've added new data sets and methods such as XGBoost, Sequential Analysis, and Multivariate Adaptive Regression Splines, which is quickly becoming my favorite technique for a number of reasons I discuss in the book.


Monday, March 13, 2017

Plotting Vietnam Airstrikes with Leaflet and R

"No event in American history is more misunderstood than the Vietnam War. It was misreported then, and it is misremembered now." - Richard M. Nixon

"The war in Vietnam was not lost in the field, nor was it lost on the front pages of the New York Times or the college campuses. It was lost in Washington, D.C." - H.R. McMaster


On 17 February, I attended the very moving and emotional Memorial Service at the National Infantry Museum for LTG(R) Harold G. Moore, the co-author of "We Were Soldiers Once... and Young", which chronicles their experience of fighting North Vietnamese regulars at Landing Zone (LZ) X-Ray.  The book has become almost required reading for any Army officer, and one cannot overestimate how important it has become to professional development.  The movie is fine, but doesn't do the real story justice, even though Mel Gibson portrayed the intrepid Hal Moore.  An overview of the desperate struggle is available here:

Several days later, I was informed of the data on airstrikes available from World War 1 through the Vietnam War on  The data is also available on  It is a treasure trove of data and was made available to support open-source analysis.

Since airpower played an integral part in saving the 1st Bn, 7th Cavalry from being overrun, I decided to explore those strikes using the Leaflet package in R. I created a subset of the data that includes only the days 1/7 CAV was on the ground, 14-16 November 1965.  The full code and interactive map are on RPubs:

Here is the full code as well:


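library(dplyr)    # distinct()
library(leaflet)  # leaflet(), awesomeIcons(), addAwesomeMarkers()

# Read dates and service names as character strings rather than factors
df <- read.csv("vietnam_nov_65.csv", stringsAsFactors = FALSE)

# If the same mission conducts multiple attacks on the same
# lat/long, it generates a separate observation. Therefore, I have
# chosen to dedupe on mission id number (MISSIONID)
df <- distinct(df, MISSIONID, .keep_all = TRUE)

# The markers below use df$popup, but its construction is not shown in
# this post. ASSUMPTION: if the column is absent, fall back to a plain
# summary built only from columns that appear elsewhere in this code.
if (!"popup" %in% names(df)) {
  df$popup <- paste0("Mission ", df$MISSIONID, " | ", df$MSNDATE,
                     " | ", df$MILSERVICE)
}

# US Air Force = blue
# US Marine Corps = dark red
# US Navy = light green
df$color <- ifelse(df$MILSERVICE == "USAF", "blue",
                   ifelse(df$MILSERVICE == "USMC", "darkred", "lightgreen"))

icons <- awesomeIcons(
  icon = "air",
  iconColor = "black",
  library = "glyphicon",
  markerColor = df$color # Branch of Service
  #text = df$MISSIONID
)

leaflet(df) %>%
  addProviderTiles("Esri.WorldImagery", group = "Image") %>%
  addProviderTiles("Stamen.Terrain", group = "Terrain") %>%
  # Toggle between base maps and switch each mission date on or off
  addLayersControl(
    baseGroups = c("Image", "Terrain"),
    overlayGroups = unique(as.character(df$MSNDATE)),
    options = layersControlOptions(collapsed = FALSE)) %>%
  # One marker per mission, grouped by date so the control above works
  addAwesomeMarkers(icon = icons,
                    label = ~popup,
                    popup = ~popup,
                    group = as.character(df$MSNDATE))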
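
The code above starts from a file that already holds only the 14-16 November strikes, so the subsetting step itself is not shown. As a rough sketch of how that subset and the mission-level dedupe might be done, here is a base R example on a made-up data frame. The column names MISSIONID, MSNDATE and MILSERVICE come from the code above; the values, the ISO date format, and the object names `raw` and `nov_65` are assumptions for illustration only.

```r
# Toy stand-in for the raw strike data. The values are invented; only
# the column names are taken from the post.
raw <- data.frame(
  MISSIONID  = c(101, 101, 102, 103, 104),
  MSNDATE    = c("1965-11-14", "1965-11-14", "1965-11-15",
                 "1965-11-17", "1965-11-13"),
  MILSERVICE = c("USAF", "USAF", "USN", "USMC", "USAF"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: dates are stored as "YYYY-MM-DD" strings.
raw$MSNDATE <- as.Date(raw$MSNDATE, format = "%Y-%m-%d")

# Keep only the days 1/7 CAV was on the ground.
start_day <- as.Date("1965-11-14")
end_day   <- as.Date("1965-11-16")
nov_65 <- raw[raw$MSNDATE >= start_day & raw$MSNDATE <= end_day, ]

# Keep the first row per mission, mirroring the distinct() call above.
nov_65 <- nov_65[!duplicated(nov_65$MISSIONID), ]

print(nov_65)
```

With this toy input, the two rows outside the date window are dropped and the repeated mission collapses to a single row, leaving one observation each for missions 101 and 102.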