SQL SERVER INTEGRATION SERVICES – FLAT FILE DATA LOAD, POTENTIAL ISSUES AND WORK AROUNDS
Hi,
ETL projects are usually classified on the basis of the complexity of designing the packages used for data loading, data transformation and finally storing the data in a manner which can be retrieved comfortable for end user reporting requirements.
Majority of disparate source systems in an enterprise generate flat files (text files) which can be consumed for all data processing initiatives.
Few obstacles, which an ETL developer is bound to face can be:
- Quality of data
- Complexity of data relationship
- Similarity between the source and destination data structures
The agenda here is to cover many aspects and design challenges that a designer will face while data loading from flat files and hence I present to you part one of this multi part series.
I have a TSV (Tab Separated File, which has some custom data like file name, file generate date, number of records and many like fields, which obviously need not be inserted to the database). So ideally, the package should be designed in a manner which would simply start reading the file from file header, if any or start reading from the column data, if the file has no headers. So to achieve this, we have two ways:
- Ask the source system; to stop sending the extra data which would be mean on part of the package designer.
- Use the “Header rows to skip” feature. So, if you plan to implement option 2, go ahead and read the blog.
- Now, if you see the graphic above, I would like to skip the first 6 lines and start reading the TSV file from the 7 line.
- Also, the data on the 7th line represents the file column header, so I would like to configure the connection manager to read the actual data when the first {LF} character is encountered. Since my file has {LF} as row delimiter.
- The rows are delimited by a {LF} character (Line Feed) and the columns are delimited by a {t} (Tab space)
Step 1: Create a new file connection manager pointing to the TSV file.
Step 2: Set value of “Header rows to skip” as 6. This is what is desired.
Step 3: Since this is a TSV file, the header rows are separated by {t}, so set the value in “Header row delimiter” as {t}
Step 4: Check the “Column names in the first data row” option. This is desired.
Step 5: After setting the values for columns, configure the row delimiter, for this file I have used a {LF} character.
The file is now ready for any/all transformation(s) as desired. Lastly, do not forget to see the preview opf the alterations made, before applying any transformation tasks.
That’s all. Happy Learning!
Regards
Raunak Jhawar
Like us on FaceBook | Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook
Follow me on Twitter | Follow me on FaceBook