Analyzing US Veteran Gravesite Data with R

For the final project in Prof. Jake Porway’s Data Without Borders: Data Science in the Service of Humanity class at NYU-ITP, I chose to work with the US veterans and beneficiaries gravesite datasets published at data.gov. The data exists because the Department of Veterans Affairs started building a nationwide gravesite locator in 2004.

Gathering the Datasets

Unfortunately, the data is split into one file per state, but at least it is updated regularly. I ended up pulling 51 .csv (comma-separated values) files from the site, all with a publish date of October 2012. Each file has these columns:

d_first_name, d_mid_name, d_last_name, d_suffix, d_birth_date, d_death_date, section_id, row_num, site_num, cem_name, cem_addr_one, cem_addr_two, city, state, zip, cem_url, cem_phone, relationship, v_first_name, v_mid_name, v_last_name, v_suffix, branch, rank, war

Since the project required applying what we’d learned about the R language to our own dataset, I hoped to use the gravesite data, which goes back to the 1800s or even earlier, to see whether where veterans are buried tracks national population trends over time. In other words, if many Americans are buried in California, does that mean more veterans are buried there too?

Since there were columns for branch, rank, and war, I figured I’d find some logical correlations: in past wars, many privates and junior-enlisted sergeants would have died, while fewer senior-enlisted soldiers and officers would have. I also figured the year in d_death_date might line up with the dates of the US’s many wars, with deaths elevated during those periods.

I guessed that with 51 datasets this would begin to fill up my system RAM (4GB on this MacBook Air), and my memory usage did climb.
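To check how much memory a loaded file actually takes, base R’s object.size() and gc() are enough. A sketch with a made-up data frame standing in for one state file (the column values are placeholders):

```r
# Hypothetical stand-in for one state's file: 100,000 identical rows
n <- 1e5
df <- data.frame(
  d_last_name  = rep("SMITH", n),
  d_death_date = rep("9/3/98", n),
  branch       = rep("US ARMY", n),
  stringsAsFactors = FALSE
)

# object.size() estimates the memory a single R object occupies
sz <- object.size(df)
print(sz, units = "Mb")

# gc() runs garbage collection and returns a summary of memory in use
mem <- gc()
print(mem)
```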

I skipped the files for US territories and “foreign addresses”, since I wouldn’t be able to find normalized population data for those. I also filtered the dataset so it would return only the veterans themselves, not their beneficiaries, who may also be listed in the data as deceased.
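A minimal sketch of the veterans-only filter. I don’t know the exact labels used in the relationship column, so the “Veteran (Self)” and “Wife” values below are hypothetical placeholders; on the real data, unique(gs$relationship) is the first thing to check.

```r
# Toy rows; the relationship labels are hypothetical placeholders
gs <- data.frame(
  d_last_name  = c("DOE", "DOE", "ROE"),
  relationship = c("Veteran (Self)", "Wife", "Veteran (Self)"),
  stringsAsFactors = FALSE
)
print(unique(gs$relationship))

# Keep rows whose relationship starts with "VETERAN" (case-insensitive);
# this prefix rule is an assumption about how the column is coded
is_vet <- grepl("^VETERAN", toupper(trimws(gs$relationship)))
vets <- gs[is_vet, , drop = FALSE]
print(nrow(vets))  # 2
```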

Loading the Data into R

Since I’m not very comfortable with R, I started by loading only the Washington DC dataset, which has just 986 entries. Problem: the download didn’t work, because “ngl_washington%20dc.csv” was not found. %20 is the URL encoding of a space, and removing it revealed the proper filename, “ngl_washingtondc.csv”.

The .csv files also would not import into RStudio directly. I got an error along the lines of rows having uneven numbers of fields. My workaround was to open each state’s .csv file in Excel and re-save it from Excel, which reformatted the files so RStudio could import them.
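An uneven-rows error usually means some lines have a different number of fields than the header, often because of unquoted commas or stray quote characters; my guess is that re-saving from Excel fixed the quoting. A sketch of finding the offending lines directly in R, using a made-up three-line file, plus the filename fix from above:

```r
# The broken download name vs. the actual file name
broken <- "ngl_washington%20dc.csv"
print(URLdecode(broken))              # "ngl_washington dc.csv"
fixed <- gsub("%20", "", broken, fixed = TRUE)
print(fixed)                          # "ngl_washingtondc.csv"

# Simulate a CSV whose last line has one field too many
path <- tempfile(fileext = ".csv")
writeLines(c("d_last_name,city,state",
             "DOE,WASHINGTON,DC",
             "ROE,WASHINGTON,DC,EXTRA"), path)

# count.fields() reports how many fields R sees on each line
n_fields <- count.fields(path, sep = ",", quote = "\"")
print(n_fields)                       # 3 3 4
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)                      # 3
```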

I tried to write my R code so that functions could handle all 50 states + DC, but I ended up with 51 separate calls, one per state. I wish the datasets were integrated into one national file. I also wished I knew how to make a “variable variable” in R: if I wanted to pass “Texas” as a column name, I could do t <- "Texas" and then something like state_data${t} to get state_data$Texas on the fly. PHP has variable variables ($$t) and lets you index an array with a variable key ($stateData[$t]); in JavaScript or Python you could reach for eval(), though bracket indexing (stateData[t]) is the cleaner route. R has the same idiom: state_data[[t]] does exactly this, and get(t) looks up a variable by its name. I didn’t know that at the time, so my code was not pretty.
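With [[ ]] indexing, the 51 calls can collapse into one lapply() over the file list. A minimal sketch, assuming all files sit in one folder and follow the ngl_&lt;state&gt;.csv naming seen above; the temporary folder and the two toy files are made up:

```r
# Hypothetical per-state files in a temporary folder
dir <- tempfile("ngl_")
dir.create(dir)
write.csv(data.frame(d_last_name = "DOE", state = "TX"),
          file.path(dir, "ngl_texas.csv"), row.names = FALSE)
write.csv(data.frame(d_last_name = c("ROE", "POE"), state = "DC"),
          file.path(dir, "ngl_washingtondc.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "^ngl_.*\\.csv$", full.names = TRUE)

# One read.csv() per file, collected into a list named by state
state_data <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(state_data) <- sub("^ngl_(.*)\\.csv$", "\\1", basename(files))

# "Variable variable" lookup: [[ ]] accepts a name stored in a variable
st <- "texas"
print(nrow(state_data[[st]]))  # 1

# Stack every state into one national data frame
bs <- do.call(rbind, state_data)
print(nrow(bs))  # 3
```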

The next step was to parse d_death_date, which came in a variety of formats such as “1993”, “9/3/98”, and “07/11/1864”, so that I could extract the year. I checked the number of characters in the year part of the string to tell whether the year had 2 digits or 4. With 4 digits (e.g. “2008”), the year was unambiguous. With 2 digits, I assumed that anything below “15” meant 2000 or later (lazy data entry) and that anything else meant the 20th century (19xx). Finally, I converted the result from a string to a number so I could do math on it.
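A minimal sketch of this year extraction, vectorized over the whole column. It takes the part after the last “/” and checks it with regular expressions instead of counting characters, but applies the same rule: four digits are kept as-is, two digits below 15 become 20xx, and other two-digit years become 19xx. The function name and the pivot parameter are my own, and it assumes the year always comes last in the date string:

```r
# Hypothetical helper: pull a numeric year out of mixed-format dates
extract_death_year <- function(d, pivot = 15) {
  d <- trimws(as.character(d))
  # Keep the part after the last "/" (the whole string if there is no "/")
  yr <- sub(".*/", "", d)
  out <- rep(NA_real_, length(yr))
  four <- grepl("^[0-9]{4}$", yr)
  two  <- grepl("^[0-9]{2}$", yr)
  out[four] <- as.numeric(yr[four])
  # Two-digit years: below the pivot -> 20xx, otherwise -> 19xx
  y2 <- as.numeric(yr[two])
  out[two] <- ifelse(y2 < pivot, 2000 + y2, 1900 + y2)
  out  # numeric, so arithmetic works; unparseable dates become NA
}

dates <- c("1993", "9/3/98", "07/11/1864", "1/2/05", "unknown")
print(extract_death_year(dates))  # 1993 1998 1864 2005 NA
```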


With that done, I next needed state-by-state population data going back to the 1800s. I figured this would be easy, but it was not: I needed something like US census data reaching back to the first census in 1790, in a tabular format I could import into R easily. The only thing I could find was a fixed-width text file of census data from 1790 to 1990 on the New Jersey Department of Labor and Workforce Development website. How crazy is that? For 2000 (and 2010, which I also found), I pulled in data for all the states from other sources.

To use this data, I pasted it into Excel using the fixed-width option instead of comma-separated, and manually set the column widths so that the data would land in the right columns. Not very clean, but it worked!
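R can also read fixed-width text directly with read.fwf() from the utils package, which skips the Excel step. The column widths, state names, and numbers below are invented for illustration; the real widths would have to be measured from the NJ file:

```r
# Illustrative values only, not real census figures
lines <- sprintf("%-20s%10d%10d", c("Alpha State", "Beta State"),
                 c(1000L, 250L), c(5000L, 900L))
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# widths: 20 characters for the name, then 10 for each year column
pop <- read.fwf(path, widths = c(20, 10, 10),
                col.names = c("state", "pop_1900", "pop_2000"),
                strip.white = TRUE, stringsAsFactors = FALSE)
print(pop)
```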

Data Analysis

Okay, now I could begin comparing data. It had been very frustrating up to this point, and I wondered if I had bitten off more than I could chew in the time given (a few weeks).

I wrote a stats() function to run on each state, compiling statistics for each 50-year block. I calculated the number of deaths per period, but raw counts would not be very useful unless I normalized them per capita (dividing by that year’s state population), so I did that too. I also counted certain ranks per period. This was error-prone too, because grepping for “CPL” (Corporal) also counted “LCPL” (Lance Corporal, a Marine rank). But I figured the numbers didn’t need much precision unless I did deeper work on these datasets or found interesting conclusions in the results.
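A minimal sketch of the 50-year blocks and per-branch tallies, using base R’s cut() and table(). This is not the original stats() function; the death years and branch labels are toy values:

```r
# Toy data; branch labels are hypothetical
bs <- data.frame(
  death.year = c(1864, 1899, 1918, 1944, 1952, 1998, 2005),
  branch = c("US ARMY", "US NAVY", "US ARMY", "US MARINE CORPS",
             "US AIR FORCE", "US ARMY", "US NAVY"),
  stringsAsFactors = FALSE
)

# Half-open 50-year blocks: [1850,1900), [1900,1950), [1950,2000), [2000,2050)
bs$block <- cut(bs$death.year, breaks = seq(1850, 2050, by = 50),
                right = FALSE, dig.lab = 4)

# Deaths per block, then deaths per block and branch
print(table(bs$block))
tab <- table(bs$block, bs$branch)
print(tab)
```

right = FALSE makes each block include its start year and exclude its end year, and dig.lab = 4 keeps labels like [1850,1900) from printing in scientific notation.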

I also calculated the number of Army, Air Force, Marine Corps, and Navy deaths in 50-year blocks, as well as deaths for the periods of war.  The results of all this are below:

Total entries in the combined dataset (all 51 files): 6,427,126

States sorted by most deaths in 2000 (using: rev(sort(table(bs$state[bs$death.year==2000]))) ):

Top 5: TX (18560), CA (17318), NY (15771), OH (13208), FL (11581)

TX 18560, CA 17318, NY 15771, OH 13208, FL 11581, PA 11447, IL 10202, MA 9422, MI 9271, NC 8670, MO 7996, TN 7854, MN 6592, GA 6556, VA 6366, AL 5984
NJ 5842, KY 5776, IN 5619, WI 5426, OK 5390, LA 4730, MD 4635, SC 4453, AR 4377, MS 4060, CT 4057, OR 4021, WA 3831, WV 3795, IA 3472, AZ 3377
CO 3206, KS 2977, ME 2238, NM 1975, NE 1686, NV 1669, NH 1576, RI 1355, SD 1270, ND 1217, MT 1217, HI 1134, ID 1086, UT 1039, DE 909, VT 806
WY 416, AK 269, DC 30

States sorted by most deaths in 1950 (using: rev(sort(table(bs$state[bs$death.year==1950]))) ):

Top 5: CA, NY, MN, MO, NJ

CA 3495, NY 2846, MN 710, MO 679, NJ 674, TX 642, MD 583, TN 564, KY 331, KS 289, HI 267, IL 253, VA 245, WI 226, OH 223, FL 195, AR 188, LA 158, GA 152
IN 130, NC 110, AL 92, OK 90, MS 90, OR 85, NM 72, SD 69, PA 65, WV 60, CO 60, WA 44, MA 42, SC 38, IA 38, MT 37, AZ 36, MI 29, NE 24
ME 10, AK 10, CT 7, VT 5, UT 5, NH 4, WY 2, RI 2, ND 1, ID 1, DE 1, DC 1

States sorted by most deaths in 1900 (using: rev(sort(table(bs$state[bs$death.year==1900]))) ):

Top 5: CA, OH, NY, VA, KS

CA 1123, OH 320, NY 303, VA 237, KS 198, WI 152, GA 138, IN 132, IL 104, AL 100, TN 94, TX 93, ME 89, KY 81, PA 73, MO 72, MD 67, NC 66, AR 62
LA 51, SC 46, MS 46, NM 41, FL 29, MI 25, WV 21, IA 17, CO 12, NJ 11, OK 10, MN 10, OR 9, WA 8, RI 8, MT 5, SD 4, AK 4, UT 3
NE 3, MA 3, DC 2, VT 1

States sorted by most deaths in 1850 (using: rev(sort(table(bs$state[bs$death.year==1850]))) ):

Top 5: NY, PA, GA, OH, TN

NY 26, PA 14, GA 12, OH 11, TN 6, DC 6, IN 5, WV 4, VA 4, KY 4, AL 4, WI 3, MO 3, IL 3, NJ 2, AR 2, TX 1, NC 1, MS 1, MN 1, MI 1, LA 1, KS 1, FL 1

States sorted by most deaths in 1800 (using: rev(sort(table(bs$state[bs$death.year==1800]))) ):

Top 5: NY, PA, NC, VA, OH

NY 5, PA 3, NC 3, VA 2, OH 2, NJ 2, MA 2, TX 1, SC 1, NH 1, KY 1, GA 1, DC 1, CT 1

Veteran deaths in the dataset for a given year, divided by the US population that year:

  • 2000: 265,734 / 281,421,906 = 0.0009443
  • 1950: 13,980 / 150,697,361 = 0.00009277
  • 1900: 3,873 / 76,212,168 = 0.00005082
  • 1850: 117 / 23,191,876 = 0.0000050449
  • 1800: 26 / 5,308,483 = 0.000004898
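These ratios are plain element-wise division, which R does in one step once deaths and populations are stored as vectors named by year. A sketch using the figures from the list:

```r
# Figures copied from the list above
deaths <- c(`2000` = 265734, `1950` = 13980, `1900` = 3873,
            `1850` = 117, `1800` = 26)
us_pop <- c(`2000` = 281421906, `1950` = 150697361, `1900` = 76212168,
            `1850` = 23191876, `1800` = 5308483)

# Divide year by year, matching the two vectors by name
per_capita <- deaths / us_pop[names(deaths)]
print(signif(per_capita, 4))
```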

Total Deaths by Rank:

  • SGT: 957556
  • PFC: 813700 (using: length(grep("PFC", bs$rank)) )
  • PVT: 783822
  • CPL: 519318
  • SSG: 153950
  • MSG: 97652
  • SFC: 66966
  • COL: 60584
  • MAJ: 33219
  • LTC: 19106
  • LCPL: 16882
  • 1LT: 15591
  • PV2: 14454
  • CPT: 8071
  • SPC: 7998
  • 1SG: 7898
  • 2LT: 7013
  • GYSGT: 5354
  • SGM: 4875
  • CSM: 3231
  • MGYSGT: 1012
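The rank totals above come from substring matching with grep(), so, as noted earlier, “CPL” also counts “LCPL”, and if SGT was counted the same way it also picks up GYSGT and MGYSGT. A sketch of stricter alternatives on made-up rank values:

```r
# Toy rank values (hypothetical)
ranks <- c("CPL", "LCPL", "SGT", "SSG", "GYSGT", "CPL", NA)

# Substring matching: "CPL" also hits "LCPL"
n_grep <- length(grep("CPL", ranks))
# Anchored pattern: the whole value must be exactly "CPL"
n_anchored <- length(grep("^CPL$", ranks))
# Plain equality does the same; na.rm drops the missing rank
n_equal <- sum(ranks == "CPL", na.rm = TRUE)
print(c(grep = n_grep, anchored = n_anchored, equal = n_equal))  # 3 2 2

# Or tally every distinct rank in one pass
print(sort(table(ranks), decreasing = TRUE))
```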

Deaths during periods of war (but not necessarily as a result of the war), each counted with nrow(bs[bs$death.year >= start & bs$death.year <= end,]) over the years shown:

  • Civil War (1861-1865): 143480
  • World War I (1917-1920): 4023
  • World War II (1940-1945): 97366
  • Korean War (1950-1954): 76204
  • Vietnam War (1964-1975): 375148
  • Persian Gulf War (1990-1991): 104683
  • Operation Iraqi Freedom, OIF (2003-2011): 2412103
  • Operation Enduring Freedom, OEF (2001-2013): 3104646
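One caution about the nrow(bs[condition,]) pattern: when death.year is NA (for example, a date that could not be parsed), the condition is NA, and R keeps an all-NA row for it, so nrow() counts it in every range. I don’t know whether the real data had unparsed years; this toy example just shows the effect and a sum()-based alternative (count_between is a hypothetical helper):

```r
# Toy death years, including one missing value
bs <- data.frame(death.year = c(1862, 1864, NA, 1918, 1943))

# An NA in the condition keeps an all-NA row, so nrow() over-counts
civil <- bs[bs$death.year >= 1861 & bs$death.year <= 1865, , drop = FALSE]
print(nrow(civil))  # 3

# Summing the logical vector with na.rm = TRUE counts only real matches
count_between <- function(years, from, to) {
  sum(years >= from & years <= to, na.rm = TRUE)
}
wars <- list(`Civil War` = c(1861, 1865),
             `World War I` = c(1917, 1920),
             `World War II` = c(1940, 1945))
print(sapply(wars, function(r) count_between(bs$death.year, r[1], r[2])))
```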

Deaths per Capita by State, 2000

Top: WV, ND, ME, SD, AR

West.Virginia 0.002040526, North.Dakota 0.001800549, Maine 0.001678827, South.Dakota 0.001549232, Arkansas 0.001495782
Massachusetts 0.001436358, Oklahoma 0.001431652, Mississippi 0.001363221, Missouri 0.001330122, Kentucky 0.001327631
Rhode.Island 0.00128406, Vermont 0.001278681, Alabama 0.001245893, Minnesota 0.001240292, Tennessee 0.001231917
Montana 0.001223834, New.Hampshire 0.001192634, Ohio 0.001141722, Iowa 0.001136949, Connecticut 0.001132725
Kansas 0.001039523, Louisiana 0.001038656, Delaware 0.001009017, South.Carolina 0.000958464, New.Mexico 0.0009553649
Wisconsin 0.0009522255, Michigan 0.0009353662, Nebraska 0.0009203936, North.Carolina 0.0009063557, Pennsylvania 0.0008988681
Indiana 0.0008642512, Hawaii 0.0008296375, New.York 0.0008120568, Maryland 0.000800528, Illinois 0.0007930425
Virginia 0.0007920141, Texas 0.0007345137, Wyoming 0.0007320077, Idaho 0.0006901816, Georgia 0.000673961
New.Jersey 0.0006632982, Colorado 0.0006354895, Nevada 0.0006159963, Florida 0.0006127263, Washington 0.0005672724
Arizona 0.0005266113, California 0.0004637675, Utah 0.0003749867, Alaska 0.0003728225
District.of.Columbia 4.985683e-05
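The state names in these tables (West.Virginia, District.of.Columbia) look like what read.csv() produces by default, since check.names = TRUE runs the header through make.names(); the stray X1.36 column in the branch tables below is probably a header cell “1.36” that make.names() prefixed with X. Since the death counts are keyed by abbreviation (TX, WV), here is a sketch of joining the two using R’s built-in state.abb and state.name vectors; all numbers are illustrative:

```r
# Illustrative numbers only, not the real counts or populations
deaths_2000 <- c(TX = 300, WV = 40, DC = 1)
pop_2000 <- c(Texas = 1e6, West.Virginia = 2e4, District.of.Columbia = 5e3)

# state.abb and state.name ship with R (datasets package) but omit DC
abb_to_name <- c(setNames(state.name, state.abb), DC = "District of Columbia")

# make.names() turns "West Virginia" into "West.Virginia", the same way
# read.csv() rewrites header names by default (check.names = TRUE)
keys <- make.names(abb_to_name[names(deaths_2000)])
per_capita <- setNames(deaths_2000 / pop_2000[keys], keys)
print(sort(per_capita, decreasing = TRUE))

# A header cell "1.36" gets an "X" prepended
print(make.names("1.36"))  # "X1.36"
```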

Deaths per Capita by State, 1950

Top: HI, NY, MN, MO, MD

Hawaii 0.0002203812, New.York 0.0001499753, Minnesota 0.0001443242, Missouri 0.0001213538, Maryland 0.000110073
Kansas 0.0001074982, California 0.0001031836, Tennessee 9.913376e-05, South.Dakota 9.140962e-05, Kentucky 8.189483e-05
New.Jersey 8.010126e-05, Arkansas 7.032244e-05, Wisconsin 4.213529e-05, Montana 4.101109e-05, New.Mexico 3.958119e-05
Louisiana 3.535486e-05, Virginia 3.461178e-05, West.Virginia 3.317953e-05, Mississippi 3.163825e-05, Texas 3.078868e-05
Oklahoma 2.608201e-05, Indiana 2.137987e-05, Alabama 2.068764e-05, Illinois 2.037153e-05, Ohio 1.964214e-05
Georgia 1.856726e-05, Alaska 1.595069e-05, Nebraska 1.402473e-05, Colorado 1.39494e-05, North.Carolina 1.366576e-05
Iowa 1.298558e-05, Florida 1.220094e-05, South.Carolina 9.471557e-06, Vermont 8.212514e-06, Maine 7.843611e-06
Washington 7.465066e-06, Arizona 7.016679e-06, Massachusetts 6.615114e-06, Pennsylvania 5.292705e-06, Wyoming 4.05037e-06
New.Hampshire 3.236806e-06, Michigan 2.917962e-06, Utah 2.238971e-06, Connecticut 2.055459e-06, Rhode.Island 1.907816e-06
District.of.Columbia 1.748071e-06, North.Dakota 1.557147e-06, Delaware 1.276161e-06, Idaho 7.728256e-07, Nevada 0

Deaths per Capita by State, 1900

Top: KS, ME, VA, CA, WI

Kansas 7.991689e-05, Maine 7.247982e-05, Virginia 3.830391e-05, California 3.773519e-05, Wisconsin 3.10726e-05
Ohio 2.950093e-05, New.Mexico 2.706147e-05, Arkansas 2.637484e-05, Alabama 2.474888e-05, Indiana 2.380884e-05
Kentucky 2.197924e-05, Georgia 2.130216e-05, Tennessee 1.927341e-05, Mississippi 1.787646e-05, New.York 1.684226e-05
Missouri 1.407054e-05, Maryland 1.401243e-05, South.Carolina 1.319298e-05, Louisiana 1.208539e-05, West.Virginia 1.17091e-05
North.Carolina 9.956798e-06, Illinois 9.098383e-06, Rhode.Island 7.972384e-06, Alaska 7.272159e-06, Montana 6.257313e-06
Pennsylvania 6.143931e-06, Iowa 6.122254e-06, South.Dakota 5.747093e-06, Texas 5.474933e-06, Colorado 3.642552e-06
District.of.Columbia 3.295436e-06, Oklahoma 3.179059e-06, Michigan 2.689532e-06, Minnesota 2.285663e-06
Florida 2.241472e-06, Nebraska 1.900677e-06, Vermont 1.776963e-06, Utah 1.741301e-06, Washington 1.643827e-06
New.Jersey 1.422993e-06, Massachusetts 4.98635e-07, Wyoming 0, North.Dakota 0, New.Hampshire 0, Nevada 0

Deaths per Capita by State, 1850

Top: DC, GA, WV, TN, NY

District.of.Columbia 7.479637e-06, Georgia 3.483736e-06, West.Virginia 1.994463e-06, Tennessee 1.822756e-06
New.York 1.75318e-06, Ohio 1.384235e-06, Kentucky 1.358324e-06, Pennsylvania 1.333586e-06, Alabama 1.306445e-06
Indiana 1.270899e-06, Virginia 1.205298e-06, Arkansas 1.047389e-06, Wisconsin 8.734705e-07, Missouri 7.586001e-07
Kansas 5.24852e-07, Mississippi 4.589442e-07, New.Jersey 4.136223e-07, Louisiana 3.726454e-07, Florida 3.608408e-07
Illinois 3.443457e-07, Minnesota 3.352911e-07, North.Carolina 2.461884e-07, Michigan 1.569424e-07, Texas 1.296816e-07

Deaths per Capita by State, 1800

Top: DC, NH, NC, CT, VA

District.of.Columbia 3.587856e-06, New.Hampshire 2.429614e-06, North.Carolina 1.584108e-06, Connecticut 1.100812e-06
Virginia 1.078642e-06, New.Jersey 1.061758e-06, South.Carolina 7.460927e-07, Massachusetts 7.129245e-07, New.York 6.878626e-07
Ohio 4.810531e-07, Pennsylvania 4.760307e-07, Kentucky 4.657284e-07, Georgia 4.511961e-07, Texas 3.280076e-07

Army Deaths by State, 2000

Top: TX, NY, CA, OH, PA

Texas 11537, New.York 10412, California 9296, Ohio 8756, Pennsylvania 7655, Illinois 6844, Florida 6575, Michigan 6239
North.Carolina 5720, Massachusetts 5330, Missouri 5231, Tennessee 5218, Georgia 4276, Minnesota 4189, Kentucky 4132
Virginia 4070, Alabama 4017, New.Jersey 3743, Indiana 3674, Wisconsin 3615, Oklahoma 3485, Maryland 3072, Louisiana 3011
South.Carolina 2878, Arkansas 2822, Mississippi 2631, Connecticut 2597, West.Virginia 2569, Iowa 2237, Washington 2064
Kansas 1918, Arizona 1852, Colorado 1809, Maine 1371, New.Mexico 1224, Nebraska 1055, New.Hampshire 901, North.Dakota 860
South.Dakota 858, Nevada 856, Hawaii 793, Rhode.Island 756, Montana 739, Idaho 650, Utah 590, Delaware 553, Vermont 526
Wyoming 256, Alaska 166, District.of.Columbia 19, X1.36 15

Army Deaths by State, 1950

Top: CA, NY, NJ, MO, TX

California 2395, New.York 2185, New.Jersey 568, Missouri 557, Texas 533, Minnesota 480, Tennessee 444, Kentucky 267
Illinois 218, Kansas 214, Ohio 197, Wisconsin 196, Hawaii 184, Arkansas 150, Maryland 147, Virginia 145, Louisiana 140
Florida 139, Indiana 112, Georgia 83, North.Carolina 79, Alabama 71, Oklahoma 60, New.Mexico 53, Mississippi 44
Colorado 43, South.Dakota 41, Iowa 34, Pennsylvania 27, Massachusetts 26, South.Carolina 25
West.Virginia 24, Michigan 24, Nebraska 17, Arizona 17, X1.36 16, Washington 12, Maine 9, Montana 7
Connecticut 6, Vermont 5, Alaska 4, Utah 3, New.Hampshire 3, Wyoming 1, Rhode.Island 1, Idaho 1
Delaware 1, District.of.Columbia 0, North.Dakota 0, Nevada 0

Army Deaths by State, 1900

Top: CA, VA, NY, GA, WI

California 971, Virginia 187, New.York 148, Georgia 138, Wisconsin 130, Kansas 115, Indiana 100, Alabama 96, Texas 90
Maine 79, Kentucky 71, Missouri 64, North.Carolina 62, Tennessee 61, Arkansas 56, Pennsylvania 54
Illinois 45, South.Carolina 39, New.Mexico 38, Ohio 34, Mississippi 32, Florida 29, Louisiana 27, Michigan 23
West.Virginia 19, X1.36 17, Iowa 16, Colorado 12, New.Jersey 10, Minnesota 10, Rhode.Island 8, Oklahoma 7
Washington 4, Massachusetts 3, Maryland 3, Alaska 3, Nebraska 2, Vermont 1, Utah 1, South.Dakota 1

Army Deaths by State, 1850

Top: OH, GA, NY, WV, TN

Ohio 11, Georgia 9, New.York 8, West.Virginia 4, Tennessee 4, Kentucky 4, Alabama 4, Wisconsin 3
Missouri 3, Illinois 3, Virginia 2, Pennsylvania 2, Indiana 2, Arkansas 2, Texas 1, North.Carolina 1
New.Jersey 1, Mississippi 1, Louisiana 1, Wyoming 0, District.of.Columbia 0, Washington 0

Army Deaths by State, 1800

Top: NY, NJ, MA, TX, SC

New.York 2, New.Jersey 2, Massachusetts 2, Texas 1, South.Carolina 1, North.Carolina 1
Kentucky 1, Georgia 1, Connecticut 1, Wyoming 0, Wisconsin 0, West.Virginia 0

Air Force Deaths by State, 2000

Top: TX, CA, FL, NY, OH

Texas 3557, California 2899, Florida 2104, New.York 2098, Ohio 1749, Pennsylvania 1502, Illinois 1288, Massachusetts 1215
Missouri 1143, Michigan 1122, Tennessee 1028, North.Carolina 993, Georgia 967, Minnesota 934, Oklahoma 896
Alabama 842, Wisconsin 774, Virginia 759, New.Jersey 740, Colorado 706, Arizona 694, Louisiana 692, Indiana 668
South.Carolina 624, Kentucky 624, Washington 606, Arkansas 576, Maryland 561, Mississippi 546
Connecticut 474, Kansas 460, Iowa 432, West.Virginia 406, Nevada 355, New.Mexico 346, Maine 292, Nebraska 262
New.Hampshire 243, Idaho 187, Montana 180, Rhode.Island 174, Utah 172, South.Dakota 167, Hawaii 152, Delaware 148
North.Dakota 137, Vermont 110, Wyoming 73, Alaska 46, X1.36 20, District.of.Columbia 4

Air Force Deaths by State, 1950

Top: CA, TX, MO, NY, MN

California 151, Texas 58, Missouri 58, New.York 41, Minnesota 26, X1.36 21, Tennessee 20, Kentucky 19
New.Jersey 17, Illinois 9, Virginia 8, New.Mexico 7, Kansas 7, North.Carolina 6, Indiana 6
Florida 6, Alabama 6, South.Dakota 5, Oklahoma 5, Louisiana 5, Wisconsin 4, Pennsylvania 4
Georgia 4, Arkansas 4, Michigan 3, Hawaii 3, Ohio 2, Mississippi 2, Massachusetts 2, Colorado 2
West.Virginia 1, Maryland 1, Iowa 1, Wyoming 0, District.of.Columbia 0, Washington 0

Air Force Deaths by State, 1900

Texas 1, California 1

Navy Deaths by State, 2000

Top: CA, TX, NY, FL, MA

California 4798, Texas 3646, New.York 3332, Florida 2893, Massachusetts 2745, Ohio 2690, Pennsylvania 2345, Illinois 2123
Michigan 1875, North.Carolina 1740, Missouri 1660, Minnesota 1604, Tennessee 1571, Virginia 1382, New.Jersey 1299
Georgia 1256, Indiana 1206, Alabama 1168, Oklahoma 1115, Wisconsin 1114, Washington 1084, Arkansas 976, Kentucky 974
Connecticut 962, Louisiana 956, South.Carolina 920, Maryland 899, Iowa 811, Mississippi 804, West.Virginia 789
Arizona 761, Colorado 737, Kansas 639, Maine 520, New.Hampshire 416, New.Mexico 414, Rhode.Island 412, Nevada 411
Nebraska 362, Montana 291, Utah 264, Idaho 255, South.Dakota 253, North.Dakota 210, Delaware 193, Vermont 177, Hawaii 152
Wyoming 87, Alaska 52, X1.36 25, District.of.Columbia 7

Navy Deaths by State, 1950

Top: CA, NY, NJ, MN, MO

California 653, New.York 395, New.Jersey 78, Minnesota 68, Missouri 53, Tennessee 33, Florida 31, Texas 28
Kentucky 25, Maryland 20, Wisconsin 19, Hawaii 19, Kansas 16, Pennsylvania 15, Illinois 14, Virginia 12
New.Mexico 12, Ohio 11, Alabama 10, Massachusetts 9, Indiana 9, North.Carolina 8, South.Dakota 7
South.Carolina 7, Arkansas 7, Georgia 6, Oklahoma 5, Louisiana 5, Colorado 5, Arizona 5
West.Virginia 3, Iowa 3, Nebraska 2, Mississippi 2, Alaska 2, Washington 1, Rhode.Island 1
North.Dakota 1, Montana 1, Michigan 1, Maine 1, Connecticut 1, Wyoming 0, District.of.Columbia 0

Navy Deaths by State, 1900

Top: CA, NY, MD, VA, PA

California 34, New.York 31, Maryland 12, Virginia 9, Pennsylvania 9, Maine 8, Ohio 7, Kansas 6
Wisconsin 5, Indiana 5, Illinois 4, South.Carolina 3, North.Carolina 2, Michigan 2, New.Jersey 1
Missouri 1, Mississippi 1, Iowa 1, Alabama 1, Wyoming 0, West.Virginia 0, District.of.Columbia 0

Navy Deaths by State, 1850

Virginia 1, Florida 1

Marine Corps Deaths by State, 2000

Top: CA, TX, NY, OH, FL

California 1346, Texas 1109, New.York 949, Ohio 782, Florida 731, Pennsylvania 666, Illinois 606, Massachusetts 571
North.Carolina 548, Michigan 539, Missouri 476, Virginia 394, Tennessee 391, Minnesota 367, New.Jersey 364
Indiana 336, Georgia 331, Wisconsin 324, Louisiana 282, Kentucky 282, Maryland 273, Arizona 269, Oklahoma 265
Connecticut 253, Alabama 252, South.Carolina 237, Arkansas 220, Washington 218, Mississippi 215, Iowa 197
West.Virginia 193, Colorado 191, Kansas 157, Maine 132, New.Mexico 117, Nevada 114, Nebraska 105
New.Hampshire 99, Montana 82, Idaho 81, Utah 75, Rhode.Island 70, North.Dakota 66, South.Dakota 61
Hawaii 58, Delaware 48, Vermont 39, X1.36 30, Wyoming 29, Alaska 16, District.of.Columbia 2

Marine Corps Deaths by State, 1950

Top: CA, NY, HI, TX, MN

California 172, New.York 70, Hawaii 54, Texas 34, Minnesota 33, X1.36 31, Missouri 27, Tennessee 22
New.Jersey 21, Illinois 14, Florida 10, Virginia 6, Ohio 6, Louisiana 6, Kentucky 6, South.Carolina 5
Pennsylvania 5, Maryland 5, Colorado 5, South.Dakota 4, Kansas 4, Arkansas 4, Alabama 4
Wisconsin 3, Oklahoma 3, Indiana 3, Massachusetts 2, Georgia 2, West.Virginia 1, New.Mexico 1
New.Hampshire 1, Nebraska 1, Mississippi 1, Wyoming 0, District.of.Columbia 0, Washington 0

Marine Corps Deaths by State, 1900

Top: CA, NY, WI, DC, VA

California 17, New.York 4, Wisconsin 3, District.of.Columbia 2, Virginia 2
South.Dakota 1, South.Carolina 1, Ohio 1, Alaska 1, Wyoming 0, West.Virginia 0, Washington 0

Wars served in by deceased veterans:

  • WORLD WAR II: 2873227
  • KOREA: 768669
  • VIETNAM: 634830
  • WORLD WAR I: 350223
  • WORLD WAR II, KOREA: 175399
  • CIVIL WAR: 163817
  • KOREA, VIETNAM: 94051
  • WORLD WAR II, KOREA, VIETNAM: 88833
  • SPANISH AMERICAN WAR: 28906
  • PERSIAN GULF: 25501
  • WORLD WAR II, WORLD WAR I: 9497
  • PERSIAN GULF, VIETNAM: 8755
  • CONFEDERATE STATES: 5192
  • IRAQ: 4826
  • WORLD WAR II, VIETNAM: 3630
  • REVOLUTIONARY WAR: 3205
  • WAR OF 1812: 1723
  • IRAQ, PERSIAN GULF: 1687
  • MEXICAN BORDER, WORLD WAR I: 1488
  • AFGHANISTAN: 1144
  • WORLD WAR II, KOREA, WORLD WAR I: 1023
  • INDIAN WARS: 882
  • MEXICAN WAR: 844
  • MEXICAN BORDER: 706
  • SPANISH AMERICAN WAR, WORLD WAR I: 550
  • AFGHANISTAN, IRAQ: 497
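Because the war column combines several wars in one string (“WORLD WAR II, KOREA”), counting each war on its own means splitting on “, ” first. A sketch on toy values; note that a veteran who served in two wars is then counted under both:

```r
# Toy values in the same "WAR A, WAR B" style as the list above
war <- c("WORLD WAR II", "WORLD WAR II, KOREA", "KOREA, VIETNAM",
         "VIETNAM", NA)

# Split each combined entry into individual wars; drop missing values first
wars_each <- unlist(strsplit(war[!is.na(war)], ", ", fixed = TRUE))
per_war <- table(wars_each)
print(sort(per_war, decreasing = TRUE))
```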

Whew!  Okay, so that’s most of the results I gathered.

Conclusions

I was not happy with my results. First, they are confusing because burial location is a poor proxy for where veterans actually lived: are these veterans buried in certain places because they were stationed in that state (in which case Marines and Navy would more likely be buried in coastal states; according to the stats, California does have the most deaths for both the Marines and Navy), because they grew old and died there, or for some other reason?

Counting deaths by time period wasn’t a reliable way to track the effects of war; after all, some veterans lived much longer than others. Besides, the basic data suggested that record-keeping was sparse until as late as the 1990s. So if you plotted deceased veterans per year from 1800 to 2010, you would see low bars that spike sharply in the 1990s, rather than a smooth curve or even a normal population growth curve. (I assume record-keeping improved greatly around World War II, so that generation’s deaths in the late 20th century were captured on record.)

I thought it was interesting that New York was in the top 5 states for deaths per capita in 1950. Per-capita rankings are usually dominated by states with small populations, yet New York is one of the most populous states today and has been throughout US history. Was this a reflection of New York’s economic success after WWII, or maybe its dedication to the American cause?

Deaths by rank would need more analysis. Naturally, PFCs and SGTs (my rank) appear more than any other rank, and enlisted ranks tend to have higher numbers overall. But colonels and majors appear far more often than 1st and 2nd Lieutenants. I suspect this is because officers have a higher life expectancy, are (probably) less likely to die in war, and rarely die young before being promoted to at least Captain or Major. We junior enlisted had a very short life expectancy.

To be honest, I was most interested in the state population numbers over time. They remind me how young the US is! It’s crazy that California went from about 93,000 people in 1850 to nearly 34 million by 2000! North Dakota’s population, by contrast, has barely changed since the 1920s.

But without further and deeper analysis, much of what I wrote above is not statistically proven. The data is messy and my handling of it was sloppy (grepping for CPL also matched LCPL, “ARMY” also captured “ARMY AIR CORPS” since the Air Force used to be part of the Army, etc.). No numbers really stood out to me in my analysis. But this was excellent practice at writing code in R, so at least something productive came out of it! I put all my R code, collected data, and other scraps in my GitHub repo for this final, available here.