Hadoop works best on a large data set. True or False?

0 votes
asked Oct 20, 2018 in Hadoop by admin (21,060 points)
Is it true Hadoop works best on a large data set? Why?

1 Answer

0 votes
answered Oct 20, 2018 by admin (21,060 points)
The Hadoop Distributed File System (HDFS) is designed to handle very large files. The larger the file, the less time Hadoop spends seeking for the next data location on disk, and the more time it spends reading at the full bandwidth of your disks.

Seeks are generally expensive operations that only pay off when you need to analyze a small subset of your dataset. Since Hadoop is designed to scan over your entire dataset, it is best to minimize seeks by using large files.
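A back-of-the-envelope sketch of that trade-off (the 10 ms seek time and 100 MB/s transfer rate below are assumed illustrative figures, not HDFS defaults):

```python
def seek_overhead(block_size_mb, seek_ms=10.0, transfer_mb_per_s=100.0):
    """Fraction of total read time spent seeking when reading one block."""
    seek_s = seek_ms / 1000.0
    transfer_s = block_size_mb / transfer_mb_per_s
    return seek_s / (seek_s + transfer_s)

# Larger blocks amortize the fixed seek cost over more sequential I/O.
for size in (1, 64, 128):
    print(f"{size:4d} MB block -> {seek_overhead(size):.1%} of time seeking")
```

With these assumed numbers, a 1 MB block spends half its read time seeking, while a 128 MB block spends under 1%, which is why HDFS favors block sizes in the tens or hundreds of megabytes.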