Would you like to take on an exciting opportunity and work with the sharpest minds in AI?
Rather than giving you a standard job description, here is a 4-hour challenge. If you complete it successfully, we would love to schedule an interview with you!
Now, to the challenge:
Your task is to develop a Python script that effectively handles an imbalanced classification problem with the "Credit Card Fraud Detection" dataset from Kaggle. The dataset is highly unbalanced, making it a perfect case for evaluating how well you can handle such scenarios.
- Load the provided "Credit Card Fraud Detection" dataset from Kaggle, available at this link.
- Conduct necessary preprocessing on the dataset using PySpark, Pandas or a similar library. This could include:
  - Handling missing values, if any
  - Encoding categorical variables, if present
  - Scaling numerical features
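As a rough illustration of the scaling step, the sketch below standardizes a single numeric column to zero mean and unit variance using only the Python standard library. The function names and the sample values are hypothetical; in practice a library scaler from PySpark or scikit-learn would likely be used instead.

```python
from statistics import mean, pstdev


def fit_standardizer(column):
    """Return (mean, std) for a numeric column.

    A constant column has std 0, so fall back to 1.0 to avoid division by zero.
    """
    mu = mean(column)
    sigma = pstdev(column)
    return mu, (sigma if sigma > 0 else 1.0)


def standardize(column, mu, sigma):
    """Apply previously fitted statistics to a column."""
    return [(x - mu) / sigma for x in column]


# Hypothetical values standing in for one numeric feature.
amounts = [10.0, 20.0, 30.0, 40.0]
mu, sigma = fit_standardizer(amounts)
scaled = standardize(amounts, mu, sigma)
```

Keeping the fit and apply steps separate makes it possible to compute the statistics on the training rows only and reuse them on the test rows.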
- Apply one or more suitable techniques to handle the class imbalance problem. You could consider:
  - Undersampling the majority class
  - Oversampling the minority class
  - Using synthetic data generation methods like SMOTE or ADASYN
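To make the first option concrete, here is a minimal sketch of random undersampling of the majority class. It assumes label 1 marks the minority class and label 0 the majority class; the function name and the toy data are hypothetical. SMOTE and ADASYN are not shown here, since they are normally taken from a dedicated library.

```python
import random


def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: label 1 is the minority class and label 0 the majority class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == 1]
    majority_idx = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(minority_idx), len(majority_idx))
    kept_majority = rng.sample(majority_idx, k=n_keep)
    keep = sorted(minority_idx + kept_majority)  # preserve original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data: 95 majority rows and 5 minority rows.
rows = list(range(100))
labels = [0] * 95 + [1] * 5
balanced_rows, balanced_labels = undersample_majority(rows, labels)
```

A common design choice is to resample only the training portion, so that the test set keeps the original class ratio.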
- Split the processed dataset into a training set (80%) and a testing set (20%).
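One possible way to perform the 80/20 split is sketched below. It stratifies by class, which the task does not require but which is a common choice when one class is rare; the function name and toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), keeping class ratios similar in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 90 majority rows followed by 10 minority rows.
toy_labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(toy_labels)
```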
- Train a classification model on the processed training data. You can choose from models like:
  - Logistic regression
  - Support vector machines
  - Decision trees
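For readers who want to see what the first of these models does internally, the following is a bare-bones logistic regression trained with batch gradient descent. It is a teaching sketch under simplifying assumptions (no regularization, fixed learning rate, hypothetical function names and toy data), not a substitute for a library implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by minimizing the mean log loss with gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Return the predicted probability of class 1 for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy one-feature data: negative values are class 0, positive values class 1.
X_toy = [[-2.0], [-1.0], [1.0], [2.0]]
y_toy = [0, 0, 1, 1]
w, b = train_logistic_regression(X_toy, y_toy)
probs = predict_proba(X_toy, w, b)
```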
- Evaluate the model's performance on the test data. Consider using metrics such as:
  - Area under the ROC curve
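The area under the ROC curve can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name and sample scores are hypothetical, and the pairwise loop is quadratic, so a library routine would be preferable on the full dataset.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly -> 0.75
```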
- Write a brief explanation of your choices at each stage, including preprocessing steps, techniques to handle class imbalance, choice of classification model, and evaluation metrics.
- Package your final model for production use, for example, using TensorFlow.
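Packaging can take many forms. As a deliberately simple stand-in for a TensorFlow export, the sketch below saves the parameters of a linear model to a JSON file and loads them back with a basic consistency check. The file name, artifact layout, feature names and parameter values are all hypothetical.

```python
import json
import os
import tempfile


def save_model(path, weights, bias, feature_names):
    """Write model parameters and the expected feature order to a JSON file."""
    artifact = {
        "format_version": 1,
        "feature_names": list(feature_names),
        "weights": list(weights),
        "bias": bias,
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(artifact, fh)


def load_model(path):
    """Read the artifact back and check that weights match the feature list."""
    with open(path, encoding="utf-8") as fh:
        artifact = json.load(fh)
    if len(artifact["weights"]) != len(artifact["feature_names"]):
        raise ValueError("weights and feature_names have different lengths")
    return artifact


# Hypothetical parameters for a two-feature model.
model_path = os.path.join(tempfile.gettempdir(), "fraud_model.json")
save_model(model_path, [0.5, -1.25], 0.1, ["feature_a", "feature_b"])
restored = load_model(model_path)
```

Storing the feature order alongside the weights helps catch mismatches between training-time and serving-time inputs.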
Note: The 'Class' column in the dataset is the target variable, and the rest of the columns are to be used as the feature set for the classification problem.
You are expected to provide the following deliverables:
- A Python script implementing the above steps.
- A brief report (in the form of a README file in your GitHub repository) explaining your choices at each stage, including preprocessing steps, techniques to handle class imbalance, choice of classification model, and evaluation metrics.
- A GitHub repository link containing all your code and the report.
- A packaged production-ready version of your final model.
Your solution will be evaluated on the following criteria:
Data Preprocessing: Effective handling of missing data, if any, encoding of categorical variables, if present, and scaling of numerical features.
Handling of Imbalance: Effective use of techniques to deal with imbalanced classes, as evidenced by improved model performance.
Model Training: Appropriate choice and effective use of a classification algorithm, demonstrated through the model's performance on the test data.
Model Evaluation: Correct use of evaluation metrics and interpretation of their results to assess the model’s performance.
Code Quality: Cleanliness, efficiency, and clarity of your Python code, following best practices.
Model Packaging: Ability to create a production-ready package of the final model.
Explanation and Justification: Clarity and cogency of the explanations provided for your choices at each step of the process. Your ability to justify these decisions and their impact on the final model performance will be of particular interest.
Creativity and Innovation: While you are expected to follow the described steps, innovative ideas and strategies to improve the model performance are welcome and will be viewed positively.
We would like to clarify that the focus of the evaluation lies not solely on the final performance of the model, but also on the appropriateness of your methodologies, the clarity and efficiency of your code, and your ability to justify your decisions.
Apply with either your completed challenge and your CV in English, or with your CV and any questions you would like to ask. The role is based in Malta, EU. We would like to employ someone who can start within 4 weeks of being offered the role.
Salary: €55-70K, potentially negotiable
*JobMatchingPartner Limited is a recruitment agency licensed in Malta, EU. We act on behalf of numerous clients based in Malta and elsewhere. JobMatchingPartner does not share your personal details with any third party without your written consent.