Data Source

What is a Data Source?

The building blocks of data organization are datasets. Each dataset can pull and unify data from multiple data sources. Each dataset resides in a destination.

In simple terms, Qluster's connector reaches out to your data source, receives data from it, and writes it to your dataset at your destination. 

A data source answers the following basic questions:

What is the connection information? For example, the bucket name and credentials for a file object store such as AWS S3.
Which files should it pull? For example, only files in a specific folder.
How often should it look for new files in the data source?
Which validation rules apply only to this data source and not to your entire dataset?
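
As a rough illustration, the four questions above can be pictured as fields on a single configuration object. The sketch below is not Qluster's API: the class name, field names, bucket name, and credential reference are all hypothetical and only show how the pieces relate.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from fnmatch import fnmatchcase
from typing import Callable

# Hypothetical model of a data source; every name here is illustrative
# and does not come from Qluster.
@dataclass
class DataSourceConfig:
    connection: dict[str, str]      # where and how to connect
    file_pattern: str               # which files to pull
    poll_interval: timedelta        # how often to look for new files
    validation_rules: list[Callable[[dict], bool]] = field(default_factory=list)

    def matches(self, path: str) -> bool:
        # Shell-style match; note that "*" also matches "/" in fnmatch patterns.
        return fnmatchcase(path, self.file_pattern)

    def is_valid(self, row: dict) -> bool:
        # A row passes only if every source-specific rule accepts it.
        return all(rule(row) for rule in self.validation_rules)


source = DataSourceConfig(
    connection={"type": "s3", "bucket": "example-bucket", "credentials": "env:EXAMPLE_CREDS"},
    file_pattern="inventory/*.csv",
    poll_interval=timedelta(hours=1),
    validation_rules=[lambda row: bool(row.get("brand"))],  # "brand" must be non-empty
)
```

Keeping the validation rules on the data source, rather than on the dataset, mirrors the point that these rules apply to one source only.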

Data Source vs. Dataset vs. Destination

A dataset can have multiple data sources. Typically, each data source corresponds to a specific type of data from a particular data vendor.

For example, a marketplace company can have an "inventory" dataset. This dataset lives in a specific database instance. In Qluster terms, the database instance is called a destination.

The marketplace company pulls inventory data from multiple stores. Each store has its own way of providing inventory data to the marketplace company.

For example:

Vendor 1 puts their data as a CSV file once a day on an FTP server.
Vendor 2 puts a zip-compressed, tab-separated (TSV) file on an AWS S3 bucket once an hour.
Vendor 3 uses Qluster's UI to upload an Excel file of their inventory anytime they want.
In addition, each of these vendors can still log in to Qluster and upload their files manually if needed.

In this case, the inventory dataset has three data sources, one for each vendor. Each data source can have its own "rules".
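
The relationship between the dataset, its destination, and its three data sources can be sketched as a small data model. This is only an illustration under assumed names: the classes, field names, and the destination label are hypothetical and are not taken from Qluster.

```python
from dataclasses import dataclass, field

# Hypothetical classes; they only illustrate how the concepts nest.
@dataclass
class DataSource:
    name: str
    transport: str     # where the files arrive
    file_format: str   # what kind of file the vendor provides
    schedule: str      # how often the vendor delivers

@dataclass
class Dataset:
    name: str
    destination: str   # the database instance that holds the dataset
    sources: list[DataSource] = field(default_factory=list)


inventory = Dataset(
    name="inventory",
    destination="example-database-instance",  # placeholder name
    sources=[
        DataSource("vendor_1", "ftp", "csv", "daily"),
        DataSource("vendor_2", "s3", "zipped tsv", "hourly"),
        DataSource("vendor_3", "ui_upload", "excel", "on demand"),
    ],
)
```

One dataset points to exactly one destination and owns a list of data sources, one per vendor.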

For example, Vendor 1 calls a column "brand", while Vendor 2 calls the same column "manufacturer". Once the data is in the dataset, we want a single "brand" column that holds data from both vendors.
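
A per-source rename rule of this kind could look roughly like the sketch below. It assumes a simple column-rename mapping per data source; the mapping structure, the function name, and the sample rows are made up for illustration and do not describe how Qluster implements its rules.

```python
# Hypothetical per-source rules: map each source's column names
# to the column names used in the dataset.
COLUMN_RENAMES = {
    "vendor_1": {},                         # already uses "brand"
    "vendor_2": {"manufacturer": "brand"},  # rename to the dataset's name
}

def normalize_row(source_name: str, row: dict) -> dict:
    """Return a copy of the row with this source's renames applied."""
    renames = COLUMN_RENAMES.get(source_name, {})
    return {renames.get(column, column): value for column, value in row.items()}


# Made-up sample rows, one per vendor.
vendor_1_rows = [{"sku": "A-1", "brand": "Acme"}]
vendor_2_rows = [{"sku": "B-7", "manufacturer": "Globex"}]

dataset_rows = (
    [normalize_row("vendor_1", row) for row in vendor_1_rows]
    + [normalize_row("vendor_2", row) for row in vendor_2_rows]
)
```

After normalization, every row in dataset_rows has a "brand" column regardless of which vendor supplied it.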


