Description
In this course, you will learn about:
- Accessing memory and processing power
- Visualizing high-volume data
- Profiling and optimizing R code
- Compiling R functions
- Parallel processing with R
- Using R with other big data solutions
Syllabus:
Introduction
- Wrangling high-volume data with R
- Sample data set
1. Problems and Opportunities with High-Volume Data
- Perspectives on high-volume data
- Big data and available memory
- Code: Finding available memory
- Big data and CPU cycles
- Code: How fast is your computer?
2. Visualizing High-Volume Data
- High-volume data and visualizations
- Code: Graphs for high-volume data
- Code: rug() and jitter()
- Code: Applying statistics to plots
- Code: Subsampled graphs for high-volume data
- Code: Trellising data across multiple charts
3. Working within the R Programming Language
- R programming tools for high-volume data
- Downsampling
- Profile R code to find inefficiencies
- Code: Profile R code to find inefficiencies
- Avoid the copy-on-modify problem with R
- Code: Avoid copy-on-modify with data.table
- Optimization versus readability
4. Advanced High-Volume Techniques
- Compile R functions
- Parallel processing with R
- Code: Parallel R functions
- Bigmemory, LaF, and ff packages
5. Use R with External Big Data Solutions
- Store high-volume data in a database
- Code: R with databases
- Cloud computing with R
- Sparklyr with R
- Code: R with Sparklyr