Data engineer with around 7 years of experience in PySpark, big data, and the AWS cloud, with workflow orchestration in Airflow where required.
Perform ingestion and transformation of data dumps provided by business users in Excel or flat-file formats (CSV, DAT, JSON, or XML) at a designated location on the MFT server. These files are loaded into HDFS, AWS S3, and Redshift through Glue and PySpark, with all expected checks and transformations applied across the required layers (base, access, and ABT, as defined by the architect). The data is then queried through Hive, Impala, or AWS Redshift to produce results required by the client, and visualized in Tableau reports. Airflow is used as the scheduler and Git for version control.
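The checks applied between the raw dump and the base layer can be sketched as below. This is a minimal, illustrative example in plain Python (in practice these checks run in PySpark/Glue); the field names `id`, `name`, and `amount` are hypothetical, not from any actual dataset described above.

```python
# Illustrative sketch of row-level quality checks applied before data lands
# in the base layer. Field names here ('id', 'name', 'amount') are assumed
# for demonstration only.

def validate_record(record, required_fields=("id", "name", "amount")):
    """Return a list of check failures for one ingested record (empty = clean)."""
    errors = []
    # Null/missing checks: every required field must be present and non-empty.
    for field in required_fields:
        if record.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")
    # Type/range check on a numeric column (hypothetical 'amount' field).
    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                errors.append("amount must be non-negative")
        except (TypeError, ValueError):
            errors.append("amount is not numeric")
    return errors

def split_clean_and_rejects(records):
    """Partition a batch into clean rows (base layer) and rejects (error log)."""
    clean, rejects = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            rejects.append((rec, errs))
        else:
            clean.append(rec)
    return clean, rejects
```

In the actual pipeline the equivalent logic would be expressed as PySpark DataFrame filters so it scales across the cluster; the pure-Python version above only shows the shape of the validation.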