AWS Glue DataBrew is currently generally available in AWS's US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), Asia Pacific (Sydney) and Asia Pacific (Tokyo) cloud regions.
Users can access and visually explore any amount of data directly from their Amazon Simple Storage Service (S3) data lake, Amazon Redshift data warehouse, and Amazon Aurora and Amazon Relational Database Service (RDS) databases.
Its 250 built-in functions for combining and transposing data include ones for filtering anomalies, standardizing formats, generating aggregates for analysis, and correcting invalid, misclassified or duplicate data. Some of the prebuilt transformations use advanced machine learning (ML) techniques, such as natural language processing.
“Once your data is ready, you can immediately use it with AWS and third-party services to gain further insights, such as Amazon SageMaker for machine learning, Amazon Redshift and Amazon Athena for analytics, and Amazon QuickSight and Tableau for business intelligence,” Danilo Poccia, AWS “chief evangelist” for Europe, the Middle East and Africa, said in a blog post.
Users can then save these cleaning and normalization steps as a reusable workflow, called a “recipe,” and apply it automatically to future incoming data.
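In DataBrew, recipes are built visually in the console. The plain-Python sketch below is only a hypothetical model of the concept and not the DataBrew API. It assumes a recipe is an ordered list of saved cleaning steps that can be replayed unchanged on each new batch of data. The step functions, the `Recipe` class and the sample rows are illustrative assumptions.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical model: a row is a dict of column name -> string value.
Row = dict[str, str]
Step = Callable[[list[Row]], list[Row]]


def trim_whitespace(rows: list[Row]) -> list[Row]:
    """Strip leading and trailing spaces from every value."""
    return [{k: v.strip() for k, v in r.items()} for r in rows]


def standardize_country(rows: list[Row]) -> list[Row]:
    """Map assumed spelling variants of a country to one code."""
    aliases = {"usa": "US", "u.s.": "US", "united states": "US"}
    return [{**r, "country": aliases.get(r["country"].lower(), r["country"])} for r in rows]


def drop_duplicates(rows: list[Row]) -> list[Row]:
    """Keep the first occurrence of each identical row, preserving order."""
    seen: set[tuple[tuple[str, str], ...]] = set()
    out: list[Row] = []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


@dataclass
class Recipe:
    """Hypothetical stand-in for a saved recipe: named, ordered steps."""
    name: str
    steps: list[Step] = field(default_factory=list)

    def apply(self, rows: list[Row]) -> list[Row]:
        # Run each step in the saved order on the incoming batch.
        for step in self.steps:
            rows = step(rows)
        return rows


recipe = Recipe("clean-customers", [trim_whitespace, standardize_country, drop_duplicates])
batch = [
    {"name": " Ana ", "country": "usa"},
    {"name": "Ana", "country": "United States"},
    {"name": "Li", "country": "CN"},
]
cleaned = recipe.apply(batch)
print(cleaned)
```

Because the steps are stored rather than performed once by hand, the same `recipe.apply` call can be reused on every later batch, which is the property the article describes.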
“At any point in time, you can visually track and explore how datasets are linked to projects, recipes and job runs,” Poccia said. “In this way, you can understand how data flows and what are the changes. This information is called ‘data lineage’ and can help you find the root cause in case of errors in your output.”