How to optimize directory scanning?

Hi, I asked this on Stack Overflow without getting an answer, so I wonder whether people here have some thoughts.

I have a function that reads the contents of /proc every second. Surprisingly, its CPU usage in top is around 5%, peaking at 8%, whereas the same logic in C or Rust takes only about 1% or 2%. Can this be improved? /proc is a virtual filesystem, so this is not related to disk performance. I noticed the difference because my CPU is quite old (second-generation Core); on modern CPUs, as tested by others, the difference is barely noticeable.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception
import Control.Concurrent
import Control.Monad
import Data.Char
import Data.Maybe
import System.Directory
import System.FilePath
import System.Posix.Files
import System.Posix.Signals
import System.Posix.Types
import System.Posix.User
import qualified System.IO.Strict as Strict

watch u limit0s limit0h = do
    listDirectory "/proc/" >>= mapM_ (\fp -> do
        isMyPid' <- maybe False id <$> wrap2Maybe (isMyPid fp u)
        wrap2Maybe (Strict.readFile ("/proc/" </> fp </> "stat")))
    threadDelay 1000000
    watch u limit0s limit0h
  where
    wrap2Maybe :: IO a -> IO (Maybe a)
    wrap2Maybe f = catch ((<$>) Just $! f) (\(_ :: IOException) -> return Nothing)
    isMyPid :: FilePath -> UserID -> IO Bool
    isMyPid fp me = do
      let areDigit = fp >= "0" && fp <= "9"
      isDir <- doesDirectoryExist $ "/proc/" </> fp
      owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
      return $ areDigit && isDir && (owner == me)
```

-- How can dense bamboo stop the flowing water, or high mountains block the drifting clouds?

Pure speculation: are you paying for a lot of conversions between FilePath (String) and C strings?

On Thu, May 9, 2019, 10:02 PM Magicloud Magiclouds <magicloud.magiclouds@gmail.com> wrote:
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
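One way to test David's speculation is to keep the directory entry names as raw bytes and never turn them into `String`. The sketch below assumes the `unix` package's ByteString directory API (`System.Posix.Directory.ByteString`); `numericEntries` is a hypothetical helper name, and whether this actually closes the CPU gap has not been measured here.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit)
import System.Posix.Directory.ByteString (closeDirStream, openDirStream, readDirStream)

-- | All-digit entries of a directory such as "/proc", kept as raw ByteStrings.
-- Names come straight from readdir(3) as bytes, so no per-entry
-- String <-> CString conversion takes place.
numericEntries :: B.ByteString -> IO [B.ByteString]
numericEntries dir = bracket (openDirStream dir) closeDirStream (go [])
  where
    go acc ds = do
      name <- readDirStream ds -- an empty name signals the end of the stream
      if B.null name
        then return (reverse acc)
        else go (if B.all isDigit name then name : acc else acc) ds
```

A name from this list can then be passed to `System.Posix.Files.ByteString.getFileStatus` (for example on `B.pack "/proc/" <> name`) so that the owner check also stays on ByteStrings.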

I couldn't tell, since those are more or less "standard" Haskell
functions, right?
On Fri, May 10, 2019 at 10:11 AM David Feuer wrote:

...what?

Also, in C you'd call stat() and check for -1 (not found) or inspect the result to see whether it's what you want. In Haskell, a failing stat throws an exception instead of producing a sane Either, so you either make multiple syscalls or you have to catch an exception. Either way, this ends up with higher overhead than C or Rust.

On Thu, May 9, 2019 at 10:15 PM Magicloud Magiclouds <magicloud.magiclouds@gmail.com> wrote:
-- brandon s allbery kf8nh allbery.b@gmail.com
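Brandon's point is that `getFileStatus` reports failure by throwing. Wrapping it with `tryIOError` from `System.IO.Error` hands back an `Either`, which is closer to checking stat()'s -1 in C. Note that the exception is still raised and caught internally, so this changes the interface rather than the cost. A sketch; `statEither` and `describePath` are hypothetical names.

```haskell
import System.IO.Error (isDoesNotExistError, tryIOError)
import System.Posix.Files (FileStatus, fileOwner, getFileStatus, isDirectory)

-- | A single stat(2) whose failure is returned as a Left value instead of
-- propagating as an exception. GHC still throws and catches internally;
-- only the interface changes.
statEither :: FilePath -> IO (Either IOError FileStatus)
statEither = tryIOError . getFileStatus

-- | Inspect the result the way C code inspects stat()'s return value and errno.
describePath :: FilePath -> IO String
describePath path = do
  r <- statEither path
  return $ case r of
    Left e
      | isDoesNotExistError e -> "missing"
      | otherwise -> "error: " ++ show e
    Right st
      | isDirectory st -> "directory owned by uid " ++ show (fileOwner st)
      | otherwise -> "not a directory"
```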

Makes sense. I can see those C "tricks". But since this code isn't doing
any complex computation, I really wish it could be sped up.
For example, Rust gives you an Either-like Result on IO errors.
On Fri, May 10, 2019 at 10:17 AM Brandon Allbery wrote:

Would you happen to have the Rust/C code available? One option is simply to use the C code and bind to it.

The one thing that stands out to me in your code is that you call doesDirectoryExist as well as getFileStatus, when you could determine whether the path exists with doesPathExist and then determine whether it's a directory by checking the result of getFileStatus.

Cheers,
Vanessa McHale

On 5/9/19 9:00 PM, Magicloud Magiclouds wrote:
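Vanessa's suggestion amounts to asking the kernel only once per entry. A minimal sketch, assuming the entry name has already been checked to be a PID; `isOwnedDir` is a hypothetical name. A single `getFileStatus` result answers both the directory question and the owner question, and the path is built only once.

```haskell
import System.FilePath ((</>))
import System.IO.Error (catchIOError)
import System.Posix.Files (fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)

-- | Is /proc/<entry> a directory owned by the given user?
-- One getFileStatus call answers both questions that the original code asked
-- with doesDirectoryExist and a second getFileStatus, and the path is built
-- once. A failed stat (e.g. the process exited meanwhile) counts as False.
isOwnedDir :: UserID -> FilePath -> IO Bool
isOwnedDir me entry = catchIOError check (\_ -> return False)
  where
    path = "/proc" </> entry
    check = do
      st <- getFileStatus path
      return (isDirectory st && fileOwner st == me)
```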

Yes, I do. But binding might not be the top priority, since I'd like to
keep the logic in Haskell.
I tried the directory-check change and could not see a major difference.
On Fri, May 10, 2019 at 10:13 AM Vanessa McHale wrote:

Interesting, I can see a few potential issues. But first, have you measured how many syscalls this makes in Haskell vs. C vs. Rust? That would let you separate problems internal to Haskell (e.g. String) from a different algorithm in the Haskell version.

For example, one issue that could lead to unneeded syscalls is your "isMyPid" function. AFAIK getFileStatus does no caching, so you're stat'ing (and making a syscall on) each path twice: once to get the file type (is it a directory?) and a second time to get the owner.

You also build `"/proc/" <> fp` twice (and thus evaluate it twice).

But without understanding "how" Haskell is slower, it's not clear where the problem lies (in syscalls, in GC, or …).

regards,
iustin
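One way to follow Iustin's advice from inside the program is to ask the GHC runtime how much the scan allocates and how much CPU goes to the garbage collector versus everything else. A sketch using `GHC.Stats` from `base`; `measureRts` and `RtsDelta` are hypothetical names, and the numbers are only available when the program runs with `+RTS -T`.

```haskell
import Control.Monad (replicateM_)
import Data.Int (Int64)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | What the RTS saw while an action ran.
data RtsDelta = RtsDelta
  { bytesAllocated :: Word64 -- heap bytes allocated
  , gcCpuNs :: Int64 -- CPU time spent in the garbage collector
  , mutatorCpuNs :: Int64 -- CPU time outside the GC (includes time in syscalls)
  }
  deriving (Show)

-- | Run an action n times and report the RTS counters' change.
-- Returns Nothing unless stats collection is on (+RTS -T).
measureRts :: Int -> IO () -> IO (Maybe RtsDelta)
measureRts n act = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then replicateM_ n act >> return Nothing
    else do
      s0 <- getRTSStats
      replicateM_ n act
      s1 <- getRTSStats
      return $
        Just
          RtsDelta
            { bytesAllocated = allocated_bytes s1 - allocated_bytes s0
            , gcCpuNs = gc_cpu_ns s1 - gc_cpu_ns s0
            , mutatorCpuNs = mutator_cpu_ns s1 - mutator_cpu_ns s0
            }
```

Build with `-rtsopts` (or `-with-rtsopts=-T`) so that `+RTS -T` is accepted; `+RTS -s` prints a similar summary when the program exits. Some counters may only be refreshed at garbage collections, so measure enough iterations to span several collections.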

Good point. Let me see what strace can tell me.
On Fri, May 10, 2019 at 3:46 PM Iustin Pop wrote:

So this is what I got. It seems both make two stat calls (stat/newfstatat),
for the directory check and the uid check. But when opening the file for
reading, the Haskell version makes an ioctl call (maybe from System.IO.Strict)
that seems to fail. I want to test the case without System.IO.Strict, but I
have no idea how to make exception catching work with lazy readFile.
For the Haskell implementation,
```
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/230/stat", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 23
fstat(23, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
ioctl(23, TCGETS, 0x7ffe88c18090) = -1 ENOTTY (Inappropriate ioctl for device)
read(23, "230 (scsi_eh_5) S 2 0 0 0 -1 212"..., 8192) = 155
read(23, "", 8192) = 0
close(23)
```
For the Rust implementation,
```
newfstatat(3, "1121", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
stat("/proc/1121", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/1121/stat", O_RDONLY|O_CLOEXEC) = 4
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "1121 (ibus-engine-sim) S 1077 10", 32) = 32
read(4, "77 1077 0 -1 4194304 425 0 1 0 5", 32) = 32
read(4, "866 1689 0 0 20 0 3 0 3490 16454"..., 64) = 64
read(4, "4885264596992 94885264603013 140"..., 128) = 128
read(4, "0724521155542 140724521155575 14"..., 256) = 64
read(4, "", 192) = 0
close(4)
```
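On the question of exception catching with lazy `readFile`: `readFile` opens the file immediately but reads it on demand, so forcing the whole string inside the handler keeps any error within reach of `try`. A sketch; `readFileEither` is a hypothetical name. `Data.ByteString.readFile`, which reads strictly, is another option.

```haskell
import Control.Exception (IOException, evaluate, try)

-- | Lazy readFile with all IO errors caught in one place.
-- readFile opens the file at once but reads it on demand, so without the
-- evaluate a read error could surface later, outside `try`. Forcing the
-- length reads to EOF inside `try`, which also lets the handle close.
readFileEither :: FilePath -> IO (Either IOException String)
readFileEither path = try $ do
  s <- readFile path
  _ <- evaluate (length s)
  return s
```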
On Fri, May 10, 2019 at 3:49 PM Magicloud Magiclouds wrote:

The ioctl is standard, in C too unless you use open() directly: it checks whether the opened file is a terminal, to determine whether to set block or line buffering.

On Fri, May 10, 2019 at 11:09 AM Magicloud Magiclouds <magicloud.magiclouds@gmail.com> wrote:
-- brandon s allbery kf8nh allbery.b@gmail.com
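Brandon's explanation can be observed from Haskell: opening a Handle runs the terminal check, and the outcome shows up in the handle's buffering mode. A sketch; `handleInfo` is a hypothetical name, and `hIsTerminalDevice` most likely repeats the same ioctl rather than reusing the earlier answer.

```haskell
import System.IO

-- | Open a file as readFile would and report what GHC decided about it.
-- Creating the Handle already performed the terminal check (the TCGETS ioctl
-- in the trace); hIsTerminalDevice asks again.
handleInfo :: FilePath -> IO (Bool, BufferMode)
handleInfo path = withFile path ReadMode $ \h -> do
  tty <- hIsTerminalDevice h
  mode <- hGetBuffering h
  return (tty, mode)
```

For a file such as /proc/self/stat this should report a non-terminal with block buffering.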

It would be possible to avoid that TCGETS ioctl, since the immediately preceding fstat shows that the file is a regular file and not a device. However, I'm not sure how easy it would be for the library to make that optimization.

The Haskell implementation actually makes fewer syscalls than the Rust one (8 versus 12 per /proc entry in the traces above), because Rust reads the file in very small chunks (32, 32, 64, 128, 64 bytes) whereas Haskell reads one big chunk (8192 bytes), which is enough to hold the entire file. I think it's unlikely that the extra ioctl outweighs the multiple extra reads.

However, if you use the -r option with strace to include relative timestamps in the output, you'll be able to see just how long each syscall is taking. On my system, they all take about the same amount of time. It would also be worth running the program under time, to see how much of the CPU time is spent in user space vs. the kernel.

On 2019-05-10 9:35 AM, Brandon Allbery wrote:
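The user/kernel split that `time` reports can also be sampled inside the program with times(2), through the `unix` package. A sketch; `userSystemSeconds` is a hypothetical helper, and since times(2) counts in clock ticks (commonly 1/100 s) it needs many iterations to show anything.

```haskell
import Control.Monad (replicateM_)
import System.Posix.Process (getProcessTimes, systemTime, userTime)
import System.Posix.Unistd (SysVar (ClockTick), getSysVar)

-- | Run an action n times and return the (user, system) CPU seconds the
-- process consumed meanwhile, as reported by times(2).
userSystemSeconds :: Int -> IO () -> IO (Double, Double)
userSystemSeconds n act = do
  ticksPerSec <- fromIntegral <$> getSysVar ClockTick -- sysconf(_SC_CLK_TCK)
  t0 <- getProcessTimes
  replicateM_ n act
  t1 <- getProcessTimes
  let secs f = realToFrac (f t1 - f t0) / ticksPerSec
  return (secs userTime, secs systemTime)
```

For instance, `userSystemSeconds 100 scanOnce`, where `scanOnce :: IO ()` stands for one hypothetical pass over /proc; a large system share would point at the syscalls, a large user share at the Haskell side.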

Why is the process id re-computed every second? Do you expect it to change during the process lifetime?
```haskell
isMyPid fp me = do
  let areDigit = fp >= "0" && fp <= "9"
  isDir <- doesDirectoryExist $ "/proc/" </> fp
  owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
  return $ areDigit && isDir && (owner == me)
```
And the code should skip looking for sub-directories of non-numeric directory entries, avoiding unnecessary stat(2) calls.

```haskell
import qualified System.Posix.Directory as D
import Control.Exception (bracket)
import Control.Monad (unless)

perEntry_ :: FilePath -> (FilePath -> IO ()) -> IO ()
perEntry_ dirPath entryAction =
    bracket (D.openDirStream dirPath) D.closeDirStream loop
  where
    loop ds = do
      entry <- D.readDirStream ds -- "" once the directory is exhausted
      unless (null entry) $ entryAction entry >> loop ds
```

Or with Conduits:

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.Conduit (runConduitRes, (.|))
import qualified Data.Conduit.Combinators as C

runConduitRes $ C.sourceDirectory dirPath .| C.mapM_ (liftIO . entryAction)
```

But now you have more choices about when and what to return from the loop, whether to scan the whole directory, ... Note that the conduit version prepends the directory name to the entry names. I would not have done that, but you can just copy the handful of lines of source and stream the bare entry names: http://hackage.haskell.org/package/conduit-1.3.1.1/docs/src/Data.Conduit.Com...

--
Viktor.
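Viktor's point about skipping non-numeric entries can be made explicit with a pure name check that runs before any stat(2). A sketch; `isPidName` and `myPids` are hypothetical names. It also replaces the original `fp >= "0" && fp <= "9"` test, which compares Strings lexicographically rather than checking for digits.

```haskell
import Control.Monad (filterM)
import Data.Char (isDigit)
import System.Directory (listDirectory)
import System.FilePath ((</>))
import System.IO.Error (catchIOError)
import System.Posix.Files (fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)

-- | True for names made only of digits, i.e. PID directories in /proc.
-- The original test compares Strings lexicographically: "90" > "9", so every
-- multi-digit PID starting with 9 is rejected, while "1a" would be accepted.
isPidName :: FilePath -> Bool
isPidName name = not (null name) && all isDigit name

-- | PIDs in /proc owned by the given user. Non-numeric entries ("self",
-- "meminfo", ...) are dropped by the pure check before any stat(2) is issued.
myPids :: UserID -> IO [FilePath]
myPids me = do
  names <- listDirectory "/proc"
  filterM owned (filter isPidName names)
  where
    owned name = catchIOError (check <$> getFileStatus ("/proc" </> name)) (\_ -> return False)
    check st = isDirectory st && fileOwner st == me
```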

Hi,

we made the `posix-paths` package for fast directory traversals: https://hackage.haskell.org/package/posix-paths

You can find benchmarks in https://github.com/JohnLato/posix-paths#benchmarks

Some more tips (some of which you're already following, as per other threads):

* Use `time` to see whether time is spent on kernel CPU, userspace CPU, or waiting
* Use `strace -fy` with `-ttt` and `-T` to see timings, and `-c` and `-wc` for summary statistics

Thank you for making the `posix-paths` package for fast directory
traversals.
Are directories stored in consecutive disk blocks?
On Fri, May 10, 2019 at 6:53 PM Niklas Hambüchen wrote:
-- -- Sent from an expensive device which will be obsolete in a few months! :D Casey

Depends on the host filesystem. Traditionally, the first 10 blocks are
direct and often (but not always, if the fs is fragmented) consecutive; the
remainder are indirect by 1-3 levels (not that you ever want a directory to
be double indirect much less triple!), and often are not consecutive simply
because by the time you get to that point you're working with a filesystem
with a lot of files on it and a fair amount of fragmentation.
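To make the direct/indirect distinction concrete, here is the capacity arithmetic under assumed parameters: the 10 direct pointers mentioned above, plus a hypothetical 4 KiB block size and 4-byte block pointers (so one indirect block holds 1024 pointers). Real filesystems differ in these numbers.

$$
\begin{aligned}
\text{direct} &= 10 \times 4\,\text{KiB} = 40\,\text{KiB} \\
\text{single indirect} &= 1024 \times 4\,\text{KiB} = 4\,\text{MiB} \\
\text{double indirect} &= 1024^2 \times 4\,\text{KiB} = 4\,\text{GiB} \\
\text{triple indirect} &= 1024^3 \times 4\,\text{KiB} = 4\,\text{TiB}
\end{aligned}
$$

Under these assumptions a directory would only need double-indirect blocks once its entries occupy more than 40 KiB + 4 MiB, which suggests why that case should be rare in practice.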
-- brandon s allbery kf8nh allbery.b@gmail.com

On 12.05.19 at 00:23, KC wrote:
Are directories stored in consecutive disk blocks?

That's something that you have to rely on the file system to organize for you. Brandon's answer is the traditional one for Unix filesystems, up to and including ext3fs. Modern filesystems try to do better (and often do), since scanning large directories has turned out to be so important.

If you do performance testing, both bad and good filesystem performance may be accidental; if you want to know not just the typical behaviour but also the pathological cases, you'll either have to wait for user reports to come in or talk to real filesystem experts (and even their answers will mostly be on an "it depends" basis).

Note that fragmentation is irrelevant for SSDs.

The OP is at the "what system calls are being done" stage; optimization questions about fragmentation aren't going to be relevant to him, I think.

TL;DR: Don't worry about fragmentation, unless you are willing to spend a really high amount of time on detailed optimization.

Regards, Jo

Thanks for all replies. I had not tracked forks with strace, since "I do
not have such code", although my stack project had the `-threaded` and
`-rtsopts` options set. But now `strace -f` clearly shows that there is
quite a lot of forking for my code. Removing those options reduced CPU
usage by 3%.
And as Neil said, in terms of ioctl and other syscalls over the whole
reading process, Haskell is more optimized than Rust.
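A quick way to confirm which runtime a binary was actually linked with, sketched with two functions from `base`'s `Control.Concurrent`. One further factor worth measuring (an assumption here, not a diagnosis of this program): in the threaded RTS the idle garbage collector defaults to `-I0.3`, so a loop that sleeps for a second may run an idle GC on each iteration; `+RTS -I0` disables it, and passing it on the command line requires the binary to be built with `-rtsopts` (or baked in with `-with-rtsopts`).

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- | Report whether this binary runs on the threaded RTS (-threaded)
-- and how many capabilities (-N) it was started with.
main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  putStrLn ("capabilities: " ++ show caps)
```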
I am trying posix-paths now.
@Viktor,
Sorry, that was a part missing from the sample code. isMyPid should be
called before reading the stat file.
@Brandon, @Joachim, @KC,
At least for me, how data is stored on disk is not relevant. /proc is a
virtual filesystem that just exposes kernel data structures via IO
operations.
--
Dense bamboo cannot stop the flowing stream; high mountains cannot block the drifting clouds.
And for G+, please use magiclouds#gmail.com.

On Sat, May 11, 2019 at 03:52:38AM +0200, Niklas Hambüchen wrote:
we made the `posix-paths` package for fast directory traversals:
https://hackage.haskell.org/package/posix-paths
You can find benchmarks in
It should perhaps be noted that a large fraction of the additional overhead encountered by the String FilePath traversals in that benchmark occurs in the output code that prints all the paths to stdout. The corresponding ByteString listing is noticeably faster. If one instead just stats and counts all the files, the performance difference is somewhat more modest (IIRC around a factor of ~2 rather than ~5 or 6).

Also, the directory traversal of course needs to use 'getSymbolicLinkStatus' rather than 'getFileStatus', since recursive directory traversals should almost never follow symlinks.

-- Viktor.
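A minimal sketch of the "just stat and count" traversal described above, using `getSymbolicLinkStatus` (lstat) so that symlinks are counted as entries but never descended into. It uses the `String` interface of the `unix` package for brevity; `countEntries` is a hypothetical helper, and unreadable directories will throw rather than be skipped:

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (bracket)
import System.FilePath ((</>))
import qualified System.Posix.Directory as D
import System.Posix.Files (getSymbolicLinkStatus, isDirectory)

-- | Count every entry below dir (not counting dir itself).
-- lstat reports a symlink as a symlink, so isDirectory is False for it
-- and the link is not followed.
countEntries :: FilePath -> IO Int
countEntries dir = bracket (D.openDirStream dir) D.closeDirStream (go 0)
  where
    go !acc ds = do
      name <- D.readDirStream ds
      case name of
        "" -> pure acc -- end of stream
        "." -> go acc ds
        ".." -> go acc ds
        _ -> do
          let path = dir </> name
          st <- getSymbolicLinkStatus path
          sub <- if isDirectory st then countEntries path else pure 0
          go (acc + 1 + sub) ds

main :: IO ()
main = countEntries "." >>= print
```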
participants (10)
- Brandon Allbery
- David Feuer
- Iustin Pop
- Joachim Durchholz
- KC
- Magicloud Magiclouds
- Neil Mayhew
- Niklas Hambüchen
- Vanessa McHale
- Viktor Dukhovni