Test-Driven Data Analysis by Nicholas J. Radcliffe (.PDF)
File Size: 31.3 MB
Test-Driven Data Analysis (Data Science Series) by Nicholas J. Radcliffe
Requirements: .PDF reader, 31.3 MB | True PDF
Overview: Test-driven data analysis applies ideas from test-driven software development to data-intensive work, including data science, data analysis, and data engineering. It is a methodology for improving the quality of data and of analytical pipelines and processes. It can be thought of as data analysis as if the answers actually matter. Test-driven data analysis can also be thought of as a sibling to reproducible research, with similar concerns, but greater emphasis on automated testing, and less requirement for a human to reproduce results. Extensive checklists are provided that can be used to improve quality before, during, and after analysis.
Test-driven data analysis (TDDA) is both a methodology for improving data quality and data pipelines, and a set of software tools (the Python tdda library) for helping to implement that methodology. The library is written in Python and presents a Python API, so it is most relevant to people using Python, but it also provides a command-line interface. This interface is the main way that the data-quality functions are used, so users of any programming language (or none) should be able to use the software through its tdda command without difficulty. This covers almost all of Part I of the book. Even in the area of testing analytical pipelines, the subject of Part II, there is command-line support for writing tests for programs and scripts in any language. Part III covers aspects of TDDA that do not have (and may not be suited to) software support. These should be relevant to users of any programming language or none.
Throughout the book, I have also tried to keep a clear separation between the methodological parts of TDDA and the details of the specific support for them in the library, so that even people who do not directly work with data should be able to benefit from much of the book. I have also tried to signpost sections of the book that can safely be skipped, particularly by non-Python readers.
This book is for anyone working with data who is interested in producing more robust and reliable results. One such group is hands-on practitioners: data scientists, machine learning researchers, data analysts, data engineers, technical quality assurance professionals, and others working with data at scale.
Genre: Non-Fiction > Tech & Devices