About me

My name is Débora Craveiro.

I currently work as a Data Scientist and Machine Learning Engineer at Xpand IT, on a project for one of the biggest Portuguese retailers.

I am a self-motivated and highly adaptable person. I first came across Data Science during my academic journey, then decided to pursue it seriously and change careers. My original background is in Mechanical Engineering.

In my previous role as a Data Scientist at Hidromod, I worked with satellite imagery to solve crop classification problems and to forecast crop growth using time series.

Skills

Programming Languages and Databases

  • Python focused on data analysis
  • SQL for data extraction
  • PySpark
  • Web scraping using Python
  • PostgreSQL and SQLite databases
  • JavaScript for the Google Earth Engine platform

Statistics and Machine Learning

  • Descriptive Statistics (measures of center, measures of spread, skew, kurtosis)
  • Regression, Classification, and Clustering algorithms
  • Methods for balancing data, feature selection, and dimensionality reduction
  • Algorithm performance metrics (RMSE, MAE, MAPE, Confusion Matrix, Precision, Recall, Silhouette Score)
  • Machine Learning packages: scikit-learn, SciPy, and Keras

Data Visualization

  • Matplotlib, Seaborn, and Plotly
  • Metabase (early stage)

Software Engineering

  • Azure DevOps, Azure Databricks, Heroku Cloud
  • Git, Linux, Cookiecutter, virtual environments, and Docker
  • Streamlit, Flask, Python APIs

Professional Experience

1+ years as Data Scientist

Starting a new project with the Customer Intelligence & Analytics department of one of the biggest Portuguese retailers.

End-to-end data science solutions for the agricultural industry, using time-series data from satellite imagery: crop type classification, detection of wrongly declared agricultural classes through data analysis, crop growth forecasting, and water level monitoring in reservoirs.

3+ Data Science Projects

Data-driven business solutions modeled on real market challenges, built with public data provided for Data Science competitions. I take each problem from the definition of the business problem through to the deployment of the trained model in a production environment using cloud computing tools.

4+ years as Junior Researcher

Since the first year of my master's degree, I have been using programming languages to solve different kinds of problems. Initially I worked with MATLAB and Maple, and then I migrated to Python. As a researcher, I had the opportunity to implement many types of routines, from solving complex algebraic problems to advanced optimization techniques (Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization, Conjugate Gradient, Augmented Lagrangian, etc.).

Data Science Projects

Loyalty Program using Client Clustering

I used Python, statistics, and unsupervised learning to segment clients based on their purchase characteristics. The goal is to select a group of clients to participate in a loyalty program and thereby increase the company's total income.
This is an ongoing project.

Tools

  • Git, Linux
  • Python, Pandas, Matplotlib, and Seaborn
  • Jupyter Notebook
  • KMeans, Gaussian Mixture Model, Hierarchical Clustering, and DBSCAN
  • Metabase Visualization (early stage)
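
The snippet below is only an illustrative sketch of this kind of segmentation, not the project's actual code. It assumes scikit-learn is installed; the feature values, the choice of two clusters, the "highest average spend" selection rule, and all variable names are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical purchase features per client: [total spend, number of purchases]
clients = np.array([
    [120.0, 3], [150.0, 4], [130.0, 2],         # low-spend clients
    [2400.0, 35], [2600.0, 40], [2500.0, 38],   # high-spend clients
])

# Scale the features so that spend does not dominate the distance computation.
scaled = StandardScaler().fit_transform(clients)

# Fit KMeans with an assumed number of clusters.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(scaled)

# The silhouette score measures how well separated the clusters are.
score = silhouette_score(scaled, labels)

# Assumed selection rule: the cluster with the highest mean spend
# becomes the candidate group for the loyalty program.
mean_spend = [clients[labels == k, 0].mean() for k in range(2)]
loyalty_cluster = int(np.argmax(mean_spend))
loyalty_clients = np.where(labels == loyalty_cluster)[0]
```

In practice the number of clusters and the selection rule would come from comparing several models and from the business context, rather than being fixed up front as they are here.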

Web Scraping for Jeans Price Prediction

This project aims to price men's jeans for a new store entering the market. It involves web scraping, database manipulation, and ETL design. The price prediction uses the data available on the H&M and Macy's websites.
This project is on standby.

Tools

  • Git, Linux
  • Python, SQLite
  • BeautifulSoup, Pandas

Sales Prediction for a Drugstore Chain

The objective of this project was to predict the next six weeks of sales for Rossmann drugstores. A machine learning regression model was used, and the results show an approximate accuracy of 88%, a significant improvement over the established baseline. On the validation data alone, the new model improved on the baseline by an average of US$2857 per store.
The per-store sales predictions can be accessed from anywhere via Telegram.

Tools

  • Git
  • Python, Pandas, NumPy, and Seaborn
  • Anaconda, PyCharm, and Jupyter Notebook
  • XGBoost Regressor, Random Forest Regressor, Linear Regression, Lasso
  • Flask and Python API
  • Heroku Cloud
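
As a rough illustration of how a regression model can be compared against a baseline, here is a minimal sketch using MAE and MAPE. The project description does not state which metric the 88% figure comes from, so this is not a reproduction of that result. The sales figures, the "average sales" baseline, and the function names are all assumptions made up for the example.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same unit as the sales values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.1 means 10%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical validation data: actual sales for four stores.
actual = [5000.0, 6200.0, 4800.0, 7100.0]

# Assumed baseline: predict the average of the actual sales for every store.
average_sales = sum(actual) / len(actual)
baseline_pred = [average_sales] * len(actual)

# Hypothetical predictions from a trained regression model.
model_pred = [5200.0, 6000.0, 5000.0, 6900.0]

baseline_mae = mae(actual, baseline_pred)
model_mae = mae(actual, model_pred)

# Average improvement per store, in currency units.
improvement_per_store = baseline_mae - model_mae

print(f"Baseline MAE: {baseline_mae:.2f}, MAPE: {mape(actual, baseline_pred):.2%}")
print(f"Model MAE:    {model_mae:.2f}, MAPE: {mape(actual, model_pred):.2%}")
```

Reporting the error in currency units (MAE) alongside a percentage (MAPE) makes the result easier to translate into business terms such as an average gain per store.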

Real Estate Market Project to Identify Reselling Opportunities

Identification of properties selling below their average price, and definition of their ideal reselling price, based on an exploratory data analysis using Python.
(Unfinished)

Tools

  • Python, Pandas, NumPy, and Seaborn
  • Anaconda, PyCharm, and Jupyter Notebook
  • Interactive Maps with Plotly and Folium
  • Heroku Cloud
  • Streamlit Python web framework
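
The idea of flagging properties priced below an average can be sketched as follows. This is a simplified illustration and not the project's code: grouping by region, the listing data, and the field names are assumptions made up for the example, and the reselling price step is left out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listings; "region" as the grouping key is an assumption.
listings = [
    {"id": 1, "region": "A", "price": 300_000},
    {"id": 2, "region": "A", "price": 420_000},
    {"id": 3, "region": "A", "price": 500_000},
    {"id": 4, "region": "B", "price": 150_000},
    {"id": 5, "region": "B", "price": 210_000},
]

# Collect the prices of each region.
prices_by_region = defaultdict(list)
for listing in listings:
    prices_by_region[listing["region"]].append(listing["price"])

# Average price per region, used as the reference value.
average_price = {region: mean(prices) for region, prices in prices_by_region.items()}

# A property is a candidate opportunity when it sells below its region's average.
opportunities = [
    listing["id"]
    for listing in listings
    if listing["price"] < average_price[listing["region"]]
]
```

A real analysis would likely also account for property condition, size, and seasonality before treating a low price as an opportunity.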

Contacts

Feel free to get in touch with me.