Executive problem statement (one page)
Aim
To clearly articulate your understanding of the business problem to management.
Clearly state who is interested in solving the problem and why.
Ensure that this section is improved and included in the final submission.
Simple Data Set
Overall satisfaction is defined in simple terms (≥ 4).
Cross-reference your problem statement with tables or charts from the following section.
Answers to business questions (A) and (B) are given and justified (highlight them).
Complex Data Set
The concept of overall satisfaction is extended with justification to be relative to other attributes, e.g. the number of reviews, price, room type or neighbourhood.
Cross-reference your problem statement with tables or charts from the following section.
Answers to business questions (A) and (B) are given and justified (highlight them).
Hints
Make sure your executive summary is very clear.
You can restate or rephrase the problem statement as you gain better understanding.
Do not invent your own problem – it has been given to you but may not be achievable in its current form.
Ensure that whatever problem you describe can be solved using the provided data.
Make sure the statement describes the problem from the business perspective and not a technical perspective.
Use business language and not computer / mathematical / statistical / data science language.
The problem statement should describe the high-level aims, not the methods used to achieve them.
Consider and state the likely benefit of this project for the company and its management.
Identify the company's clients and state the likely benefits of this project for them.
Do not include any charts or tables in the problem statement section.
However, cross-reference your problem statement with tables or charts from the following section, e.g. you can refer to them as “… (see Figure 1)” or “As shown in Table 4…”.
If you need to support your statements / analysis / argument with references to any published materials, use Harvard citation style as described in: http://www.deakin.edu.au/students/studying/study-support/referencing/harvard. As the executive summary should take less than one page, we suggest including your bibliographic references at the bottom of this page, immediately below the executive summary (or problem description).
All comments, such as this, which are not part of your submission should be deleted to save space.
Data exploration (one page)
Aim
To demonstrate your understanding of data and report any insights emerging from data analysis.
Ensure that this section is improved and included in the final submission.
Simple Data Set
Data obtained.
RM project prepared.
Attributes selected and analysed.
Their characteristics are tabulated and visualised, with brief annotations (using text and arrows).
Complex Data Set
In addition, all tables and charts are analysed for the relevant and important business insights, which are explicitly reported. All visualisations are included selectively to support further work.
Hints
Take screenshots of the relevant parts of the screen, not the whole window or the whole screen.
On Windows 10 use the Snipping Tool; on Mac or Linux use the Screenshot app, or install Spectacle.
Include here the text of your analysis with visual evidence to support the analysis.
If you include any charts or tables you must describe them (e.g. by using arrows / boxes).
Make sure that any included chart is readable (so do not shrink it to a microscopic size).
If you scale the included screenshots, keep their proportions (do not distort images).
Most importantly, describe what those data features mean, how important they are, and why.
Do not include here any parts of the RM process – it has its own section further in the report.
If your analysis or results could only be determined by inspecting the process and running it, the marks will be reduced – if it is not in the report, it does not exist for the marker!
Your analysis and description could include:
Avoid indiscriminate “dumping” of tables, charts or code into this section – all content must have its purpose.
All included charts, tables or RM processes (or their parts) have to be described or used in the discussion.
Make sure that all charts, tables and important results are labelled for cross-referencing, e.g. “Figure 1 – Histogram of Overall Rating” or “Table 4 – Comparison of model performance”.
All comments, such as this, which are not part of your submission should be deleted to save space.
Executive solution statement (one page)
Aim
To clearly articulate your understanding of the business solution to management.
Simple Data Set
The business solution is succinctly described for executives and justified.
Cross-references with the technical sections of the report are provided for support, e.g. to tables, charts and plots.
Business answer to question (C) is given and justified (highlight it).
Complex Data Set
In addition, business decisions and actions enabled by the solution are explained.
Cross-references with the technical sections support the executive summary.
Business answers to questions (C) and (D), and optionally (E), are given and justified (highlight them).
Hints
Ensure that whatever problem you describe can be solved using the provided data.
Make sure the executive summary describes the solution from the business perspective and not a technical perspective.
Use business language and not computer / mathematical / statistical / data science language.
The solution statement should describe the high-level benefits, not the methods of delivering them.
Identify the company's clients and state the likely benefits of this project for them.
Ensure that your solution clearly matches the problem statement.
Ensure that the solution is formulated in terms of achieving the high-level business aim.
Do not include any charts or tables in the solution statement section.
However, cross-reference your solution statement with tables or charts from the following sections, e.g. you can refer to them as “… (see Figure 1)” or “As shown in Table 4…”.
If you need to support your statements / analysis / argument with references to any published materials, use Harvard citation style as described in: http://www.deakin.edu.au/students/studying/study-support/referencing/harvard. As the executive summary should take less than one page, we suggest including your bibliographic references at the bottom of this page, immediately below the executive summary (or solution description).
All comments, such as this, which are not part of your submission should be deleted to save space.
Data Preparation (one page)
Aim
To demonstrate your understanding of data by describing complex relationships between attributes.
Depending on the selected model some attributes may need to be transformed or new attributes created.
Simple Data Set
Relationships between attributes are explored and visualised.
Labels and predictors are selected and justified.
New attributes are generated and old ones transformed as needed.
All charts annotated (with text and arrows) to highlight important insights.
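As an illustration of generating a new label attribute, the simple definition of overall satisfaction (≥ 4) can be sketched outside RapidMiner. This is a minimal Python sketch under stated assumptions: the function name, the threshold constant and the sample ratings are hypothetical and do not come from the provided data set.

```python
# Hypothetical illustration: derive a binary "satisfied" label from an
# overall rating using the simple rule rating >= 4.
SATISFIED_THRESHOLD = 4  # assumed to match the simple definition (>= 4)


def is_satisfied(overall_rating):
    """Return True when the rating meets the simple satisfaction rule."""
    return overall_rating >= SATISFIED_THRESHOLD


# Made-up ratings, used only to show the rule in action.
sample_ratings = [5, 3.5, 4, 4.5, 2]
labels = [is_satisfied(rating) for rating in sample_ratings]

# Share of listings labelled as satisfied; useful for spotting class imbalance.
satisfied_share = sum(labels) / len(labels)
```

In RapidMiner the same rule would typically be expressed with an attribute-generation step; the sketch only shows the logic that the new label encodes.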
Complex Data Set
In addition, attribute weights are used to select the most useful attributes.
All missing values, duplicates and data errors handled adequately.
Hints
Many hints are identical to those in the section on “Data Exploration” so read them!
Some preliminary data exploration has already been conducted in the previous sections.
Focus on depicting attribute relationships and not their individual characteristics.
Include here the text of your analysis with tables and charts.
Your analysis and description could include:
Avoid indiscriminate “dumping” of tables, charts or code into this section – all content must have some purpose.
All included charts, tables or RM processes (or their parts) have to be described or used in the discussion.
Make sure that all charts, tables and important results are labelled for cross-referencing, e.g. “Figure 1 – Histogram of Overall Rating” or “Table 4 – Comparison of model performance”.
All comments, such as this, which are not part of your submission should be deleted to save space.
Model Development (one page limit)
Aim
To explain details of developed classification models and selected methods for data preparation and reporting.
Simple Data Set
k-NN classification model developed.
The process, its operators and their parameters described and annotated (with text and arrows).
The values of the model parameters are justified.
Operators annotated (with text and arrows) to highlight important insights.
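To make the role of the k parameter concrete, the idea behind k-NN classification can be sketched in a few lines. This is a simplified Python sketch, not the RapidMiner operator: the function name, the two made-up attributes and the class names are hypothetical, and the attributes are assumed to be already normalised to a common scale.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    # Keep the k closest neighbours (sorted by distance only).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up rows with two normalised attributes (hypothetical values).
train_rows = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.3), (0.15, 0.7)]
train_labels = ["satisfied", "satisfied", "not satisfied",
                "not satisfied", "satisfied"]

prediction = knn_predict(train_rows, train_labels, query=(0.2, 0.75), k=3)
```

An odd k avoids tied votes when there are two classes, which is one possible justification for a chosen parameter value.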
Complex Data Set
In addition, a Decision Tree (or forest) is included as the second classifier.
Class imbalance is investigated, dealt with and justified.
Hints
Your textbook will be extremely helpful in this task.
Include here screenshots of all or parts of the RM process.
If your process is very large, consider splitting it into sub-processes or separate processes.
If your process does not fit into this page, include only the most important parts.
By including arrows and text boxes (e.g. with numbers to refer to) annotate each operator and its properties.
Note that some of your justifications may utilise cross-referencing with tables or charts from other sections.
Avoid indiscriminate “dumping” of RM processes/models into this section – all content must have some purpose.
You may include a brief description of the operators and what they did but this is NOT the aim of this section.
Do not include definitions of terms or a “textbook” description of operations – we already know this!
All comments, such as this, which are not part of your submission should be deleted to save space.
Model Evaluation (one page)
Aim
To report and explain the performance of developed classification models.
Simple Data Set
The model is hold-out validated using accuracy and kappa.
Validation results are analysed, interpreted and reported.
A justified statement is included on the degree to which the model's advice can actually be trusted (based on the performance measurements).
Technical answer to question (C) is given and justified (highlight it).
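When interpreting accuracy and kappa, it can help to see how both are derived from the same confusion table. The following is a minimal Python sketch under stated assumptions: the function name and the counts are hypothetical, and it computes Cohen's kappa for a two-class problem only.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a two-class confusion table.

    tp/fn are actual positives predicted positive/negative,
    fp/tn are actual negatives predicted positive/negative.
    Assumes the chance agreement is below 1 (otherwise kappa is undefined).
    """
    total = tp + fp + fn + tn
    observed = (tp + tn) / total  # accuracy: share of correct predictions

    # Agreement expected by chance, from the row and column totals.
    actual_pos, actual_neg = tp + fn, fp + tn
    pred_pos, pred_neg = tp + fp, fn + tn
    expected = (actual_pos * pred_pos + actual_neg * pred_neg) / total ** 2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up counts for a reasonably balanced test set.
balanced = accuracy_and_kappa(tp=40, fp=10, fn=5, tn=45)

# Made-up counts for a model that always predicts the majority class.
majority_only = accuracy_and_kappa(tp=90, fp=10, fn=0, tn=0)
```

The second example shows why accuracy alone can mislead: a model that always predicts the majority class reaches high accuracy while its kappa stays at zero.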
Complex Data Set
In addition, all models are cross-validated and “honestly” tested.
Parameters of all models are experimented with and their selection is justified.
The performance of all models is tabulated and compared – the best model is identified.
In addition to accuracy and kappa, measures and charts such as AUC and ROC are also used.
Also, technical answers to questions (D) and optionally (E) are given and justified (highlight them).
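As a reminder of what AUC measures, it can be read as the share of (positive, negative) pairs that the model ranks in the right order. The following is a minimal Python sketch of that pairwise reading, assuming labels coded as 1 and 0; the function name, labels and confidence scores are hypothetical and are not RapidMiner output.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = satisfied) and model confidence scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_from_scores(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a business-friendly way to explain an ROC chart.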
Hints
Your textbook will be extremely helpful in this task.
If you have few results to report, include here screenshots of your results, e.g. confusion table or ROC charts.
If you have many results to report, include here a table of all results.
You need to describe and explain your results.
Most importantly, include here a detailed analysis of your results – explain the impact of the obtained results on the future use of the model to support decision making.
Avoid indiscriminate “dumping” of performance results – all content must have some purpose.
All comments, such as this, which are not part of your submission should be deleted to save space.
Any materials, analysis or reports that do not fit into seven pages in total (including the front page) will not be assessed or marked. The only exception is the inclusion of your response to the challenge question.
Challenge Just for Fun (a single page on page 8)
Aim
To undertake a challenging task requiring independent research.
We will definitely look at your work reported here but we will not mark it.
Simple Data Set or Complex Data Set
You can use either of the two data sets.
Include your descriptive analysis of the problem.
Include the screenshot of your RapidMiner process (with annotations).
Include results generated by the process.
Provide some assessment and reflection on the insights generated.
Hints
Your textbook, RapidMiner built-in help and web resources will be extremely helpful in this task.
All comments, such as this, which are not part of your submission should be deleted to save space.