Hadoop WebHDFS Source

The Hadoop WebHDFS Source streams large files stored in HDFS on a Hadoop server and converts them into rows of data within SSIS. Currently, the Hadoop WebHDFS Source supports only text and CSV files. See the Hadoop WebHDFS Connection Manager page to learn more about setting up the connection manager.
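
For orientation, the sketch below shows the general shape of a WebHDFS REST request that reads a file, which is the kind of call a WebHDFS-based source relies on. It is an illustration of the public WebHDFS `OPEN` operation, not the component's actual implementation. The helper name `build_open_url`, the host name, the port (9870 is the usual NameNode HTTP port in Hadoop 3), and the user name are all assumptions chosen for the example.

```python
from urllib.parse import quote, urlencode


def build_open_url(host, port, hdfs_path, user=None):
    """Build a WebHDFS OPEN URL for a path relative to the HDFS root.

    Hypothetical helper for illustration only.
    """
    # WebHDFS exposes the file system under the /webhdfs/v1 prefix.
    path = "/" + hdfs_path.lstrip("/")
    params = {"op": "OPEN"}
    if user:
        # user.name is the simple (non-Kerberos) authentication parameter.
        params["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


# Assumed host, port, and user; the path matches the File Name example format.
url = build_open_url("namenode.example.com", 9870, "FolderName/DataFile.txt", user="etl")
print(url)
```

The path portion corresponds to the File Name setting; the host and port would come from the connection manager.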

Important Note

If your connection tests successfully but you receive an "unable to connect" error at runtime, you may need to update your local hosts file. See the related help document for details.

Hadoop WebHDFS is available for SQL Server versions 2012 and higher.

• File Name - The file name (if in the root directory) or the path to the file stored within HDFS (example: FolderName/DataFile.txt).


o   Data Contains Headers? - Similar to the native Flat File Source, this selection identifies the first row as containing column headers.

o   Row Delimiter - Identifies the character or line break, such as a line feed (\n), that signifies a new row.

o   Column Delimiter - Identifies the character, such as a comma, used to separate the values of different columns.

o   Text Qualifier - Identifies the character, such as a quotation mark, used to wrap values.

• Output Columns - Users can create, remove, and configure the name, index (zero-based), data type, length, precision, and scale of the columns being extracted from the text file.
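
To make the parsing options above concrete, here is a minimal Python sketch of how a header flag, row delimiter, column delimiter, and text qualifier might combine to turn file text into rows. It uses Python's standard `csv` module and is not how the component itself is implemented. The function name `parse_rows` and the fallback column names (`Column0`, `Column1`, ...) are hypothetical, and the sketch assumes qualified values never contain the row delimiter.

```python
import csv


def parse_rows(text, has_headers=True, row_delimiter="\n",
               column_delimiter=",", text_qualifier='"'):
    """Split text into a header list and data rows (illustrative only)."""
    # Break the text into rows first, dropping empty trailing pieces.
    lines = [line for line in text.split(row_delimiter) if line]
    # csv.reader accepts any iterable of strings, one row per item.
    reader = csv.reader(lines, delimiter=column_delimiter,
                        quotechar=text_qualifier)
    rows = list(reader)
    if has_headers and rows:
        # The first row supplies the column names.
        return rows[0], rows[1:]
    # No header row: name columns by zero-based index (assumed naming).
    header = [f"Column{i}" for i in range(len(rows[0]))] if rows else []
    return header, rows


sample = 'id,name\n1,"Smith, Jane"\n2,"Lee, Sam"\n'
header, data = parse_rows(sample)
print(header)
print(data)
```

Note how the text qualifier keeps `Smith, Jane` together as one value even though it contains the column delimiter.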


Using the Hadoop WebHDFS Source - https://pragmaticworks.wistia.com/medias/tynbiu244f