Objective
Migrate existing SQL Server Integration Services (SSIS) packages to Azure Synapse. This migration involves automating the generation of Data Definition Language (DDL) scripts and Data Feed configurations (in JSON format) by extracting metadata from a SQL database.
Scope and DeliverablesDDL and Data Feed Configuration Files
- Migrate and automate the generation of both DDL scripts and Data Feed scripts.
- Store and manage these scripts in SQL format at shared file location in the data lake .
- Data Sources
- Use metadata from the SQL database to produce the required scripts.
- Leverage a straightforward SQL query (provided in Input_sql_query.sql ) to retrieve table names, schema details, column information, and object IDs.
- Implementation Steps
Run the SQL Query
- Use a Script Activity in Azure Synapse to execute the provided query (Input_sql_query.sql).
- The output of this query should be captured in JSON format (e.g., tableList.json).
Process the JSON Output
- Read the JSON file within a PySpark notebook in Synapse (the sample script is given in notebook_python_code.py).
- Iterate through each table entry in the JSON to retrieve necessary metadata.
Generate DDL and Data Feed Scripts
- For each table, dynamically build the DDL script (see sample expected_output_Employee_Table.sql) and corresponding Data Feed script in JSON format.
- Store these output files (e.g., Table_Name_1.sql, Table_Name_2.sql, etc.) in the designated location within Azure Data Lake.
Manage File Creation/Updates
- Ensure that any newly generated JSON and SQL files are correctly created or updated in Azure Data Lake as tables are processed.
- Attached Files
Input_sql_query.sql
- A SQL script retrieving necessary metadata (table names, schema names, columns, object IDs).
sql_output.json (tableList.json)
- The JSON output of the Script Activity in Synapse.
sql_output.png
- A screenshot demonstrating the expected query results (approximately ten tables).
notebook_python_code.py
- Reference Python (non-PySpark) code showcasing how to load the JSON output and dynamically generate DDL scripts.
expected_output_Employee_Table.sql
- An example of a final DDL script. Similar scripts can be generated for each table automatically.
... Show more