Input data is partitioned so that it can be distributed across a cluster of machines and processed in parallel.
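A minimal sketch of this partition-then-process pattern follows. It assumes a single machine, with a local thread pool standing in for the cluster. The names `partition`, `process_chunk`, and `parallel_sum_of_squares` are hypothetical and chosen only for illustration, as is the per-partition work (a sum of squares).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    # Hypothetical per-partition work: sum of squares of the chunk.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Apply the same operation to every partition, then combine the partial results."""
    chunks = partition(data, n_workers)
    # Assumption: worker threads play the role of cluster machines here.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)
```

Each worker applies the same operation to its own partition, and the partial results are combined at the end. On a real cluster the pool would be replaced by whatever scheduler distributes the chunks.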
Parallel Tasks - Used when different operations need to be applied, each with its own input data.
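As a rough illustration of parallel tasks, the sketch below submits different functions, each with its own input, to a thread pool and collects the results by name. The helper `run_parallel_tasks` and the two sample operations are assumptions made for this example, not part of any particular framework.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    # Hypothetical operation 1: count whitespace-separated words.
    return len(text.split())


def maximum(numbers):
    # Hypothetical operation 2: largest value in a list.
    return max(numbers)


def run_parallel_tasks(tasks):
    """Run each (function, input) pair concurrently.

    tasks maps a task name to a (function, argument) tuple.
    Returns a dict mapping each task name to its result.
    """
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, arg) for name, (fn, arg) in tasks.items()}
        # result() blocks until the corresponding task has finished.
        return {name: fut.result() for name, fut in futures.items()}
```

For example, `run_parallel_tasks({"words": (word_count, "a b c"), "max": (maximum, [3, 9, 2])})` runs both operations concurrently, each on its own input.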
Loading time also depended on whether data were written to intermediate files or not, and on whether input files were processed in sequence or in parallel.