I have extensive experience as a Data Scientist, working with big data, trend analysis and predictive algorithms.
Skill set:
- SQL for data collection, data cleaning, and ensuring data integrity. Working in an Agile environment to deliver model-based campaigns.
- Built propensity models for demand generation and for predicting new broadband customers.
- Building, testing, and debugging Python code. Model/code review and refresh. Presenting reports to stakeholders. AWS.
- Built a clustering model for customer segmentation and for understanding the customer journey. The objective was to identify distinct segments in the existing customer base and to understand each segment's path to purchasing a product.
Visualization, clustering, and K-Means.
- Applying regression algorithms to predict crop yield. Data wrangling to pre-process and prepare data for model development. Exploratory data analysis for data visualization and presentation. Developing a regression model to predict the best outcome given different variables. Data cleaning, PCA, Random Forest regression, and EDA.
- NLP – SENTIMENT ANALYSIS/CLASSIFICATION
Skills - Web scraping; text pre-processing (HTML tag removal, tokenization, removal of numbers, special characters, punctuation, and stopwords, conversion to lowercase, lemmatization or stemming); model building (BoW, TF-IDF, word embeddings); model evaluation and improvement.
- Built a classifier that determines the class of X-ray images fed into the model
(deep neural network, image pre-processing, image augmentation techniques, CNN).
- EMPLOYEE PROMOTION PREDICTION
Predicting whether a person is eligible for promotion using ensemble models (Bagging, AdaBoost, Gradient Boosting, XGBoost) and hyperparameter tuning.
- Applied ML techniques in Python to predict fraudulent transactions in a bank dataset, using neural networks with hyperparameter tuning via grid search.
- Built a model that helps the marketing department identify potential customers with a higher probability of purchasing a loan, using EDA, data pre-processing, logistic regression, optimal threshold selection from the ROC curve, decision trees, pruning, cross-validation, regularization, pipelines, hyperparameter tuning, and up- and down-sampling.
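
As an illustration of the threshold-selection step mentioned in the loan-purchase project, here is a minimal Python sketch. It assumes Youden's J statistic (true positive rate minus false positive rate) as the selection criterion, which the project description does not specify. The function names `roc_points` and `best_threshold` and the toy labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (threshold, tpr, fpr) for each distinct score used as a cutoff."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    # Sweep cutoffs from the highest score down; predict 1 when score >= cutoff.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def best_threshold(y_true, scores):
    """Pick the cutoff maximizing Youden's J = TPR - FPR (an assumed criterion)."""
    return max(roc_points(y_true, scores), key=lambda p: p[1] - p[2])[0]


# Hypothetical toy data: 1 = purchased the loan, 0 = did not.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.65, 0.70, 0.20, 0.90]

threshold = best_threshold(y_true, scores)
predictions = [int(s >= threshold) for s in scores]
```

When several cutoffs tie on J, this sketch keeps the highest one, which favors fewer false positives. A campaign that prioritizes reach over precision might prefer the lowest tied cutoff instead.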