AWS: I’ve navigated the vast Amazon Web Services (AWS) ecosystem, leveraging services like:
- Amazon EMR: For processing large datasets using Hadoop and Spark.
- Amazon Kinesis: Enabling real-time data streaming and analytics.
- Amazon Redshift: My go-to for fully managed data warehousing.
- AWS Glue: Serverless ETL for transforming and moving data across sources.
GCP (Google Cloud Platform): I’ve explored GCP’s powerful tools tailored for data engineering:
- Cloud Dataproc: Managing Hadoop and Spark clusters efficiently.
- Cloud Dataflow: My choice for real-time data streaming and analytics.
- BigQuery: A fully managed data warehouse that scales effortlessly.
- Cloud Composer: My trusted workflow orchestration service (managed Apache Airflow) for scheduling and coordinating ETL pipelines.
Web Scraping Wizardry:
- I’ve scraped data from diverse sources: news articles, e-commerce websites, and social media platforms. Whether it’s Python scripts or cloud-based functions, I love turning raw web data into actionable insights.
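To give a flavour of the parsing step, here is a minimal sketch using only Python's standard library. It is illustrative rather than taken from a specific project: the `HeadlineParser` class, the `extract_headlines` helper, and the sample HTML are all hypothetical, and it assumes headlines live in `<h2>` tags. A real scraper would also need to fetch pages, respect robots.txt, and handle messier markup.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text inside every <h2> tag (assumed headline markup)."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self._buffer = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags such as <em> also arrives here.
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headlines.append("".join(self._buffer).strip())
            self._in_h2 = False


def extract_headlines(html):
    """Return the list of <h2> headline strings found in an HTML document."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Hypothetical page content standing in for a fetched news page.
SAMPLE_HTML = """
<html><body>
<h2>First <em>story</em></h2>
<p>Body text</p>
<h2> Second story </h2>
</body></html>
"""

if __name__ == "__main__":
    for headline in extract_headlines(SAMPLE_HTML):
        print(headline)
```

Keeping the parser separate from the fetching code means the same function can run in a local script or inside a cloud function.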
Data Modeling Maven:
- Crafting robust data models is my forte. From conceptual design to normalisation, I’ve built databases that withstand the test of time.
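As a small illustration of what normalisation means in practice, the sketch below splits flat (name, email, item) rows into separate `customers` and `orders` tables in an in-memory SQLite database. The schema, the `build_normalised_db` helper, and the sample rows are hypothetical and chosen only to show the idea: each customer is stored once and orders refer to them by key.

```python
import sqlite3


def build_normalised_db(flat_rows):
    """Load flat (name, email, item) rows into two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT NOT NULL UNIQUE,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            item        TEXT NOT NULL
        );
        """
    )
    for name, email, item in flat_rows:
        # Store each customer once; repeated emails are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO customers (email, name) VALUES (?, ?)",
            (email, name),
        )
        (customer_id,) = conn.execute(
            "SELECT customer_id FROM customers WHERE email = ?", (email,)
        ).fetchone()
        conn.execute(
            "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
            (customer_id, item),
        )
    conn.commit()
    return conn


# Hypothetical denormalised input: customer details repeat on every order.
FLAT_ROWS = [
    ("Ada", "ada@example.com", "keyboard"),
    ("Ada", "ada@example.com", "monitor"),
    ("Bob", "bob@example.com", "mouse"),
]

if __name__ == "__main__":
    db = build_normalised_db(FLAT_ROWS)
    print(db.execute("SELECT COUNT(*) FROM customers").fetchone()[0])
    print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

Because customer details live in one row, an email change is a single update rather than an edit to every order.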