Project

Introduction

As part of the coursework, you need to complete a project. To do so, you should first find an appropriate dataset in one of the following data repositories:

Then you should define a problem with specific goals. The first milestone of the project is to submit information about the dataset and your problem.

The second milestone of the project is to read technical papers related to your problem (or to your dataset) and submit a summary of them.

You have about a month to work on your project. During this time you should apply appropriate machine learning algorithms to your data to solve the problem you defined in the first milestone. While it is important to get meaningful and acceptable results, you will be graded mainly on your ability, first, to select an appropriate model and, second, to use that model properly to solve the problem.

Project Milestones

  • First milestone: find the dataset and define the problem by the end of the second week (by midnight on Saturday of the second week).
  • Second milestone: complete your literature review on the subject by the end of the 5th week (by midnight on Saturday of the 5th week).
  • Third milestone: finish most of your work and submit your recorded oral presentation by the end of the 8th week (by midnight on Saturday of the 8th week).
  • Fourth milestone: submit your short paper, revised to address your classmates' reviews, three days after the Saturday of the 10th week.

You also have to review your classmates' work during the 9th week and submit your review feedback by midnight on Saturday of the 9th week.

What to deliver?

This project has two deliverables:

  • A presentation: give the presentation at your convenience, record the video, and submit it on Canvas.
  • A short (three-page) paper: it contains your problem statement and motivation, literature review, summary of your work, and results.

You have to review at least five other projects, using their recorded presentations.

What is an appropriate problem?

An appropriate problem for the course project is one that satisfies the following conditions:

  • The problem does not have a trivial solution
  • The problem has an impact in at least one business-related domain.
  • You use at least two machine learning algorithms to solve the problem. These can be two algorithms that you apply sequentially (such as clustering and regression, or PCA and classification), or two algorithms whose performance you compare (such as KNN and Naive Bayes, or SVM and neural networks).
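The two ways of combining algorithms can be sketched with scikit-learn as follows. This is only an illustration, not a required template: it assumes scikit-learn is installed, uses its bundled breast cancer dataset as a stand-in for your own data, and assumes F1 as the performance metric and 5-fold cross-validation. The component count and number of neighbours are arbitrary choices you should tune for your own problem.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset bundled with scikit-learn (569 records, 30 features);
# replace X and y with your own feature matrix and labels.
X, y = load_breast_cancer(return_X_y=True)

candidates = {
    # Sequential use: PCA reduces the features, then KNN classifies.
    "PCA + KNN": make_pipeline(StandardScaler(), PCA(n_components=5),
                               KNeighborsClassifier(n_neighbors=5)),
    # Comparison: KNN versus Gaussian Naive Bayes on the same data.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
}

def compare(models, X, y, metric="f1", folds=5):
    """Return the mean cross-validated score of each model under one metric."""
    return {name: cross_val_score(model, X, y, cv=folds, scoring=metric).mean()
            for name, model in models.items()}

scores = compare(candidates, X, y)
for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
```

Wrapping the scaler and PCA inside a pipeline means they are refit on each training fold. This keeps information from the test folds out of the preprocessing, so the comparison between models stays fair.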

The dataset that is used should satisfy the following conditions:

  • Is a non-trivial dataset: examples of trivial datasets are datasets with a constant value for all the records, or with a single obvious pattern.
  • Has more than two features
  • Has more than 100 records (i.e., data points)
  • Is sufficiently large and contains enough features to solve your problem
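The countable requirements above can be checked quickly before you submit the first milestone. The sketch below uses only the Python standard library and assumes your data is in a CSV file with a header row. The `target` argument (the name of the label column, excluded from the feature count) is a hypothetical parameter. Whether a dataset has "a single obvious pattern" cannot be checked this way and still needs your own judgment.

```python
import csv
import io

def check_dataset(csv_file, target=None):
    """Return a list of violated requirements; an empty list means all checks passed."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    # Every column except the (hypothetical) target column counts as a feature.
    features = [c for c in (reader.fieldnames or []) if c != target]
    problems = []
    if len(features) <= 2:
        problems.append(f"{len(features)} features found; more than two are required")
    if len(rows) <= 100:
        problems.append(f"{len(rows)} records found; more than 100 are required")
    # If every feature holds a single value across all records, the data is trivial.
    if rows and all(len({row[c] for row in rows}) == 1 for c in features):
        problems.append("every feature is constant across all records")
    return problems

# Synthetic example: 150 records, three features plus a label column.
lines = ["x1,x2,x3,label"] + [f"{i},{i % 7},{i * 0.5},{i % 2}" for i in range(150)]
print(check_dataset(io.StringIO("\n".join(lines)), target="label"))  # prints []
```

For a real file, open it with `newline=""` (for example `open("my_dataset.csv", newline="")`, where the file name is only a placeholder) and pass the file object to `check_dataset`.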

Project Rubric

The whole project is worth 200 points and contributes 20% of the final grade. The points are broken down as follows:

  • 1st milestone: 10 points
    • Dataset satisfies the requirements: 3 points
    • Project definition is clear: 4 points
    • Project definition satisfies the requirements: 3 points

  • 2nd milestone: 50 points
    • 10 points for each previous work reviewed (five are required):

      • 5 points for the summary
      • 3 points for the critique
      • 2 points for the writing and presentation

  • 3rd milestone: 50 points
    • Articulates clearly: 5 points
    • Speaks confidently: 5 points
    • Speed of delivery is appropriate: 5 points
    • Speaks at an appropriate volume: 5 points
    • Speaks with good expression: 5 points
    • Covers motivation and literature review of the project: 5 points
    • Covers method and implementation: 10 points
    • Covers results: 5 points
    • Has a clear conclusion and at least one non-trivial suggestion for future work: 5 points

  • 4th milestone: 75 points
    • Introduction: 5 points
    • Literature review: 10 points
    • Explanation of method: 10 points
    • Explanation of implementation: 10 points
    • Explanation of experiment setup: 10 points
    • Results presentation: 5 points
    • Results discussion: 10 points
    • Appropriate performance metric selection: 5 points
    • Conclusion: 5 points
    • Writing: 5 points

  • Overall evaluation (project evaluation is primarily based on the following criteria): 40 points
    • The student was able to select an appropriate method to solve the problem: 10 points
    • The student was able to select an appropriate performance metric: 5 points
    • The dataset is appropriate for the defined problem: 5 points
    • The student was able to appropriately implement the method: 10 points
    • The student has reported appropriate results in the oral presentation and the paper: 10 points

Programming Language

There is no constraint on the programming language used for the implementation, but Python is recommended.