Data Lake Notes

three main use cases for metadata repositories:

  • Finding data assets For example, a data architect may want to know which tables in which databases contain a Customer_ID.

  • Tracking lineage (provenance) Many regulations require enterprises to document the lineage of data assets—in other words, where the data for those assets came from and how it was generated or transformed.

  • Impact analysis If developers are making changes in a complex ecosystem, there is always a dan‐ ger of breaking something. Impact analysis allows developers to see all the data assets that rely on a particular field or integration job before making a change

Data governance

tools record, document, and sometimes manage governance poli‐cies. The tools usually define who the data steward is for each data asset. Data stew‐ ards are responsible for making sure the data assets are correct, documenting their purpose and lineage, and defining access and lifecycle management policies for them