Data Translation Challenge
- Due Dec 5, 2022 by 11:59pm
- Points 40
Business Problem
For the data translation challenge, you will complete a quarter-long project. Select a public dataset, then define a business-related problem that (1) is related to the dataset you selected and (2) can be solved with machine learning models.
An appropriate problem for the course project is one that satisfies the following conditions:
- The problem does not have a trivial solution.
- The problem has an impact in at least one business-related domain.
- You use at least two machine learning algorithms to solve the problem. These can be two algorithms that you apply sequentially (such as clustering and regression, or PCA and classification), or two algorithms whose performance you want to compare (such as KNN and Naive Bayes, or SVM and neural networks).
The dataset that is used should satisfy the following conditions:
- Is a non-trivial dataset: examples of trivial datasets are datasets with a constant value for all the records, or with a single obvious pattern.
- Has more than two features
- Has more than 100 records (i.e., data points)
- Is sufficiently large and contains enough features to solve your problem
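The countable conditions above can be checked before committing to a dataset. The following is a minimal sketch, assuming the data is already loaded as a list of dictionaries (one per record); the function name `check_dataset` and the row layout are hypothetical and not part of the assignment. Whether a dataset has "a single obvious pattern" or is large enough for your specific problem still requires your own judgment.

```python
def check_dataset(rows, min_features=3, min_records=101):
    """Return a list of problems; an empty list means the basic checks pass.

    rows: list of dicts mapping feature name -> value (assumed layout).
    The defaults encode "more than two features" and "more than 100 records".
    """
    problems = []
    if len(rows) < min_records:
        problems.append(f"only {len(rows)} records; need more than 100")

    features = list(rows[0].keys()) if rows else []
    if len(features) < min_features:
        problems.append(f"only {len(features)} features; need more than two")

    # A feature is constant if it takes a single value across all records.
    constant = [f for f in features if len({row[f] for row in rows}) <= 1]
    if features and len(constant) == len(features):
        problems.append("every feature is constant across all records")
    return problems


# Toy examples (made-up values, for illustration only).
good = [{"price": i * 1.5, "units": i % 7, "region": i % 3} for i in range(150)]
bad = [{"price": 1.0, "units": 2} for _ in range(50)]

print(check_dataset(good))
print(check_dataset(bad))
```

Passing these checks is necessary but not sufficient: they only rule out datasets that clearly fail the stated size and triviality conditions.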
Analysis
You must apply at least two machine learning models to the problem and evaluate the results. It is highly recommended that you perform exploratory data analysis on the dataset first to understand what the data looks like. Your results should be in line with the business problem you have defined.
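As one illustration of applying two models and evaluating the results, the sketch below compares KNN and Gaussian Naive Bayes using 5-fold cross-validated accuracy. It is written from scratch with only the Python standard library so that it runs anywhere; in practice you would more likely use a library such as scikit-learn. The synthetic data, the function names, and the choices of k = 5 neighbors and 5 folds are all assumptions made for the example, not requirements of the assignment.

```python
import math
import random
from collections import Counter, defaultdict


def make_toy_data(n=200, seed=0):
    """Synthetic two-class data: two features drawn from shifted Gaussians."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    rng.shuffle(rows)
    return rows


def knn_factory(train, k=5):
    """Return a predictor that takes a majority vote among the k nearest points."""
    def predict(x):
        nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def gnb_factory(train):
    """Fit per-class priors, means, and variances; return a predictor."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small constant avoids division by zero for constant features.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)

    def log_posterior(label, x):
        prior, stats = model[label]
        total = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return total

    return lambda x: max(model, key=lambda label: log_posterior(label, x))


def cross_val_accuracy(data, factory, folds=5):
    """Mean accuracy over the folds; factory(train) must return a predictor."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        predict = factory(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / len(scores)


data = make_toy_data()
knn_acc = cross_val_accuracy(data, knn_factory)
nb_acc = cross_val_accuracy(data, gnb_factory)
print(f"KNN accuracy: {knn_acc:.3f}")
print(f"Naive Bayes accuracy: {nb_acc:.3f}")
```

Evaluating both models on the same folds keeps the comparison fair. For your own project, choose the metric that matches your business problem; accuracy is used here only because the toy classes are balanced.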
Communication
This project has two deliverables:
- A presentation: give the presentation at your convenience, record it as a video, and submit the video on Canvas.
- A short (three-page) paper: it should contain your problem statement and motivation, a literature review, a summary of your work, and your results.
Resources
Details about this challenge may be found in the project page.