issue with jdk files.list stream implementation

Jackie
6 min readDec 30, 2020

--

I was really confused by the output from the files list stream

16:25:29.032 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.031 [main] INFO StressTest - compute from parallel file with collection
16:25:29.032 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.032 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel file with collection
16:25:29.039 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [main] INFO StressTest - compute from parallel file with collection
16:25:29.039 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.039 [main] INFO StressTest - parallel compute for file names with collect
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel string with collection
16:25:29.042 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - parallel compute for file names
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - parallel compute for files
16:25:29.051 [main] INFO StressTest - compute from parallel file
16:25:29.051 [main] INFO StressTest - compute from parallel file
16:25:29.051 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
Process finished with exit code 0

corresponding to code

try {
Files.list(Paths.get("...."))
.parallel()
.filter(Files::isRegularFile)
.map(Path::toFile)
.filter(file -> file.getName().startsWith(Constants.AB_MODEL))
.collect(Collectors.toList())
.parallelStream()
.forEach(s -> {
log.info("compute from parallel file with collection");
});
} catch (IOException e) {
e.printStackTrace();
}
log.info("parallel compute for file names with collect");
try {
Files.list(Paths.get("...."))
.parallel()
.filter(Files::isRegularFile)
.map(Path::toFile)
.filter(file -> file.getName().startsWith(Constants.AB_MODEL))
.map(file -> file.getName())
.collect(Collectors.toList())
.parallelStream()
.forEach(s -> {
log.info("compute from parallel string with collection");
});
} catch (IOException e) {
e.printStackTrace();
}
log.info("parallel compute for file names");
try {
Files.list(Paths.get("...."))
.parallel()
.filter(Files::isRegularFile)
.map(Path::toFile)
.filter(file -> file.getName().startsWith(Constants.AB_MODEL))
.map(file -> file.getName())
.forEach(s -> {
log.info("compute from parallel string");
});
} catch (IOException e) {
e.printStackTrace();
}
log.info("parallel compute for files");
try {
Files.list(Paths.get("...."))
.parallel()
.filter(Files::isRegularFile)
.map(Path::toFile)
.filter(file -> file.getName().startsWith(Constants.AB_MODEL))
.forEach(s -> {
log.info("compute from parallel file");
});
} catch (IOException e) {
e.printStackTrace();
}

so the parallel from Files.list is resulting in a single thread to process all files (~50 files).

Unless there is a collect to do a parallel stream again, then it will split into the common pool.

After a lot of investigate and research, turns out JCP has a really not crafted implementations on parallel:

source: http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-July/034539.html

so basically when it’s doing splitIterator, it was using Long.MAX_VALUE:

IteratorSpliterator (est. MAX_VALUE elements)
| |
ArraySpliterator (est. 1024 elements) IteratorSpliterator (est. MAX_VALUE elements)
| |
/---------------/ |
| |
ArraySpliterator (est. 2048 elements) IteratorSpliterator (est. MAX_VALUE elements)
| |
/---------------/ |
| |
ArraySpliterator (est. 3072 elements) IteratorSpliterator (est. MAX_VALUE elements)
| |
/---------------/ |
| |
ArraySpliterator (est. 856 elements) IteratorSpliterator (est. MAX_VALUE elements)
|
(split returns null: refuses to split anymore)

source: https://stackoverflow.com/questions/34341656/why-is-files-list-parallel-stream-performing-so-much-slower-than-using-collect

code: https://github.com/openjdk/jdk/blob/d234388042e00f6933b4d723614ef42ec41e0c25/src/java.base/share/classes/java/util/Spliterators.java#L1784

do { a[j] = i.next(); } while (++j < n && i.hasNext());

--

--

No responses yet