Example Course: OMSBA 5210 - Data Visualization
Week 3: Assignment

  • Due Jan 22, 2023 by 11:59pm
  • Points 10
  • Submitting a text entry box or a file upload


    Instructions

    Download: West Point data

    Follow the instructions below.

    Submit: An RMarkdown knitted HTML file with your work (and set echo = TRUE so your code is visible).
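
    One common way to make code visible in every chunk is to set the option once, in a setup chunk near the top of the .Rmd file. This is a minimal sketch, assuming the knitr package that RMarkdown uses to knit the document; setting echo = TRUE on each chunk individually works as well.

```r
# Run once in a setup chunk at the top of the .Rmd file.
# Sets echo = TRUE as the default for every later chunk,
# so the code is printed alongside its output in the knitted HTML.
knitr::opts_chunk$set(echo = TRUE)
```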

    Assignment Details

    See the attached Excel workbook, which has two sheets in it. One describes the data documentation, and the other is the data from the United States Military Academy at West Point (USMA). The data is already fairly clean and tidy, but still requires some more work (as would be typical!).

    At USMA, cadets are basically randomly assigned to "companies", like military units, with whom they spend all their time and take all their classes. The Excel sheet contains information on the year, the standing (fresh/soph/etc.) the student has in that year, the student/cadet's gender, the gender of the other cadets in their company, and whether they progressed to the next year (1) or dropped out (0). Note that when the documentation refers to "cohort", you can think of this as referring to "class".

    Complete the following data cleaning/manipulation tasks (or analysis tasks which require cleaning and manipulation) in an RMarkdown document. Set echo = TRUE for all your code so it's visible.

    1. EITHER save the data sheet as its own CSV file to load in, OR use the read_excel function in the readxl package to read the sheet in directly from the Excel workbook.

    2. Recreate the femalespeers, malespeers, and totpeople columns based on the documentation for those columns, and check whether your calculations match what's in the original data. In other words, look at the Excel sheet, read the variable descriptions, and create new variables that fit those descriptions. "Recreate" means "create from scratch": do not use the femalespeers, malespeers, and totpeople columns already in the data to create your new ones. That wouldn't be "recreating"; that would be "copying." (NOTE 1: you won't get an exact match with the old columns. NOTE 2: keep in mind these variables count "peers", i.e. not including yourself.)

    3. Investigate the rows for which your recreation *doesn't* line up exactly with the original columns. Any ideas what the issue might be? Do you trust the original or your recreation more?

    4. Create two new columns from company_n: company, and division. If it's A-1, for example, A is the company, and 1 is the division.

    5. This data follows a certain number of cohorts, which means that in the first year of the data we only see a small portion of all students, then more the next year, and so on. Limit the data to just the years in which all four classes are present in full (i.e. not just a few stragglers; all four entire classes appear to be there). This will entail finding which years those are.

    6. Make the following tables:

    a. Top four companies (A, B, C, etc., not A-1, A-2) with the highest continue_or_grad rates

    b. continue_or_grad rates by class

    c. continue_or_grad rates of women by class

    Note you can make a table by just creating the appropriate data set and showing it, or by sending it to the knitr::kable() function to get it formatted a little more nicely.

    7. Bonus task (ungraded, tricky): notice anything strange about the "random assignment" of women?
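
    The general pattern behind tasks 4 and 6 can be sketched on toy data. This is an illustration under stated assumptions, not a solution: only the column names company_n and continue_or_grad come from the assignment, the rows are made up, and the real sheet may contain company_n values that need extra handling. It assumes the dplyr, tidyr, and knitr packages are installed.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the data sheet (values are invented for illustration).
toy <- tibble(
  company_n        = c("A-1", "A-1", "A-2", "B-1", "B-1", "B-2"),
  continue_or_grad = c(1, 0, 1, 1, 1, 1)
)

# Task 4 pattern: split "A-1" into company "A" and division "1".
# remove = FALSE keeps the original company_n column for checking.
toy <- toy %>%
  separate(company_n, into = c("company", "division"),
           sep = "-", remove = FALSE)

# Task 6 pattern: the mean of a 0/1 column is a rate.
# Group, summarize, sort from highest to lowest, and keep the top rows.
rates <- toy %>%
  group_by(company) %>%
  summarize(rate = mean(continue_or_grad), .groups = "drop") %>%
  arrange(desc(rate)) %>%
  slice_head(n = 4)

# Optional: nicer formatting in the knitted HTML.
knitr::kable(rates, digits = 2)
```

    Swapping the grouping variable, or filtering before grouping, gives the by-class versions of the same table.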
