Objective: The client had 1 million files listing the ingredients of chemicals and wanted to retrieve each ingredient name and its value(s) from those files, which were stored in an S3 bucket.
Analytical Approach: Downloaded the files from the S3 bucket to an EC2 instance and ran OCR (Optical Character Recognition) to extract their content. Developed a model to extract key-value pairs from the extracted text. For OCR we used ImageMagick, Ghostscript, and pytesseract; for S3 connectivity we used the Boto3 package. We also used multiprocessing to reduce OCR execution time.
Tools Used: Python, Jupyter Notebook, Excel.
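
The approach described above (fetch from S3, OCR, key-value extraction, multiprocessing) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the bucket and key arguments, and the worker count are hypothetical. It assumes the objects are already image files (the PDF-to-image step with ImageMagick/Ghostscript is omitted), and it uses a simple regular expression as a stand-in for the key-value model, whose details are not given here.

```python
import re
from multiprocessing import Pool

# Hypothetical line format: "<ingredient> : <number> <unit>" or "<ingredient> = <number>".
# This regex is only a stand-in for the key-value extraction model.
PAIR_RE = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9 ()\-]*?)\s*[:=]\s*(\d+(?:\.\d+)?)\s*(\S*)\s*$"
)


def extract_pairs(text):
    """Return (ingredient, value, unit) tuples, one per matching line of OCR text."""
    pairs = []
    for line in text.splitlines():
        match = PAIR_RE.match(line)
        if match:
            name, value, unit = match.groups()
            pairs.append((name.strip(), float(value), unit))
    return pairs


def ocr_s3_object(args):
    """Download one image object from S3 and return (key, OCR text)."""
    bucket, key = args
    # Third-party imports are kept local so the parsing code above
    # can be used without boto3, Pillow, or pytesseract installed.
    import io
    import boto3
    import pytesseract
    from PIL import Image

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return key, pytesseract.image_to_string(Image.open(io.BytesIO(body)))


def process_bucket(bucket, keys, workers=4):
    """OCR many S3 objects in parallel, then extract pairs from each result."""
    with Pool(processes=workers) as pool:
        results = pool.map(ocr_s3_object, [(bucket, key) for key in keys])
    return {key: extract_pairs(text) for key, text in results}
```

When run as a script, `process_bucket` should be called from inside an `if __name__ == "__main__":` guard so that worker processes can import the module safely. OCR is CPU-bound, which is why a process pool (rather than threads) is the natural fit for reducing wall-clock time.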