
Hi cafe! I have a question of C-to-Haskell type:) Imagine web application wich allows users to browse some shared filesystem located at the server. Application stores every users's position within that filesystem (current directory or file). In C this can be implemented with the help of following data types: struct tree_node { union item { // some file data struct file *file; // struct dir has link to another list of tree_node struct dir *dir; }; int type; // List of tree_nodes struct tree_node *next; struct tree_node *prev; }; struct user { struct tree_node *position; // List of users struct user *next; struct user *prev; }; This implementation will give us 1) O(1) time to insert to shared tree 2) O(1) time to access user's current position Is it possible to reach this requirements in haskell? For example, managing distinct tree type like data TreeNode = File | Dir [TreeNode] will lead to failure of req. 2 (have to traverse this tree to find each user's position). Also one could manage several zipper types (one for every user): data TreeNodeCtx = Top | TreeNodeCtx { left :: [TreeNode], right :: [TreeNode], up :: TreeNodeCtx } data TreeNodeZ = TreeNodeZ { ctx :: [TreeNodeCtx] pos :: TreeNode } It works for one user but not for many because of req. 1 (have to insert new item into several zippers). Any ideas? -- Thanks, Sergey

2010/7/14 Sergey Mironov
Hi cafe! I have a question of C-to-Haskell type:)
Imagine web application wich allows users to browse some shared filesystem located at the server. Application stores every users's position within that filesystem (current directory or file).
In C this can be implemented with the help of following data types:
Any ideas?
Use IORef. ;) PS MVar is better, actually.

Or you can get the best of all worlds by combining all three! data User = User { userNext :: IORef (MVar (TVar User))) , userPrev :: IORef (MVar (TVar User))) } On 07/14/10 14:39, Andrew Coppin wrote:
Serguey Zefirov wrote:
Use IORef. ;)
PS MVar is better, actually
TVar is better still. ;-)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

2010/7/15 Serguey Zefirov
2010/7/14 Sergey Mironov
: Hi cafe! I have a question of C-to-Haskell type:)
Imagine web application wich allows users to browse some shared filesystem located at the server. Application stores every users's position within that filesystem (current directory or file).
In C this can be implemented with the help of following data types:
Any ideas?
Use IORef. ;)
PS MVar is better, actually.
Somehow I forgot about them:) Code will turn into something like data TreeNodeData = File | Dir (IORef TreeNode) data TreeNode = TreeNode { next :: Maybe (IORef TreeNode), prev :: Maybe (IORef TreeNode), up :: Maybe (IORef TreeNode), -- missed it in original C example payload :: TreeNodeData } data User = User { position :: IORef TreeNode, -- ... } It really should work! (we don't take multithreading issues into account for now) Slightly annoying thing is that 1-to-1 mapping from C to Haskell also forces programmer to perform C-like low-level pointer linking. -- Thanks, Sergey

2010/7/15 Sergey Mironov
2010/7/15 Serguey Zefirov
: 2010/7/14 Sergey Mironov
: Hi cafe! I have a question of C-to-Haskell type:)
Imagine web application wich allows users to browse some shared filesystem located at the server. Application stores every users's position within that filesystem (current directory or file).
In C this can be implemented with the help of following data types:
Any ideas?
Use IORef. ;)
PS MVar is better, actually. Somehow I forgot about them:) Code will turn into something like
It really should work! (we don't take multithreading issues into account for now) Slightly annoying thing is that 1-to-1 mapping from C to Haskell also forces programmer to perform C-like low-level pointer linking.
This is just straightforward solution and it contains some number of traps. What if someone disconnected a part of "file system" while some user browses it? That user will be trapped inside that island (or get a core dump), How do users get notifications about changes in their parts of structures? You can do better but, of course, it will be harder. The browsing itself is a simple variant of collaborative editing: http://en.wikipedia.org/wiki/Collaborative_editing Your variant is simpler that editing because only one member produce changes in structure. So you could send tree diffs in Zipper format to all your users or accumulate diffs and give them to users when they ask for it. Adding tree diff over user position described as Zipper won't put user into an isolated island. And if you later decide that there are two parties who can change the world, you are almost fully prepared for it.

You can implement pure pointers on top of Data.Map with O(log n) time: {-# LANGUAGE ExistentialQuantification #-} import Data.Map ( Map ) import qualified Data.Map as Map import Data.Typeable import Control.Monad.State import Data.Maybe type PointerSpace = Map Int PackedValue newtype Pointer a = Pointer Int data PackedValue = forall a. Typeable a => PackedValue a readPointer :: Pointer a -> State PointerSpace a readPointer ( Pointer key ) = do space <- get return $ fromJust $ cast $ Map.find key space writePointer :: a -> Pointer a -> State PointerSpace () writePointer a ( Pointer key ) = do space <- get put $ Map.insert key ( PackedValue a ) space newPointer :: a -> State PointerSpace ( Pointer a ) newPointer a = do space <- get let key = findEmptyKey space -- implement it yourself p = Pointer key writePointer a p return p Code can contain some typos. Sergey Mironov пишет:
Hi cafe! I have a question of C-to-Haskell type:)
Imagine web application wich allows users to browse some shared filesystem located at the server. Application stores every users's position within that filesystem (current directory or file).
In C this can be implemented with the help of following data types:
struct tree_node { union item { // some file data struct file *file;
// struct dir has link to another list of tree_node struct dir *dir; }; int type;
// List of tree_nodes struct tree_node *next; struct tree_node *prev; };
struct user { struct tree_node *position;
// List of users struct user *next; struct user *prev; };
This implementation will give us 1) O(1) time to insert to shared tree 2) O(1) time to access user's current position
Is it possible to reach this requirements in haskell?
For example, managing distinct tree type like
data TreeNode = File | Dir [TreeNode]
will lead to failure of req. 2 (have to traverse this tree to find each user's position).
Also one could manage several zipper types (one for every user):
data TreeNodeCtx = Top | TreeNodeCtx { left :: [TreeNode], right :: [TreeNode], up :: TreeNodeCtx }
data TreeNodeZ = TreeNodeZ { ctx :: [TreeNodeCtx] pos :: TreeNode }
It works for one user but not for many because of req. 1 (have to insert new item into several zippers).
Any ideas?

On Thu, Jul 15, 2010 at 4:30 AM, Stephen Tetley
2010/7/15 Jake McArthur
: On 07/14/2010 05:01 PM, Victor Gorokhov wrote:
You can implement pure pointers on top of Data.Map with O(log n) time
Or on top of Data.IntMap with O(1) time. ;)
Unlikely...
From the docs, lookup is O(min(n,W))
W is a constant, 32 or 64 on most machines, so this is really O(W) = O(1). Should someone create an IntegerMap, then lookup wouldn't be O(1) as W would be variable. Cheers! -- Felipe.

Stephen Tetley
2010/7/15 Jake McArthur
: On 07/14/2010 05:01 PM, Victor Gorokhov wrote:
You can implement pure pointers on top of Data.Map with O(log n) time
Or on top of Data.IntMap with O(1) time. ;)
Unlikely...
From the docs, lookup is O(min(n,W))
Yeah, I was trying to work out how the O(1) time worked as well... -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On 07/15/2010 02:30 AM, Stephen Tetley wrote:
2010/7/15 Jake McArthur
: On 07/14/2010 05:01 PM, Victor Gorokhov wrote:
You can implement pure pointers on top of Data.Map with O(log n) time
Or on top of Data.IntMap with O(1) time. ;)
Unlikely...
From the docs, lookup is O(min(n,W))
Exactly. O(min(n,32)) or O(min(n,64)) - Jake

15 июля 2010 г. 2:01 пользователь Victor Gorokhov
You can implement pure pointers on top of Data.Map with O(log n) time:
{-# LANGUAGE ExistentialQuantification #-} import Data.Map ( Map ) import qualified Data.Map as Map import Data.Typeable import Control.Monad.State import Data.Maybe
type PointerSpace = Map Int PackedValue newtype Pointer a = Pointer Int data PackedValue = forall a. Typeable a => PackedValue a
readPointer :: Pointer a -> State PointerSpace a readPointer ( Pointer key ) = do space <- get return $ fromJust $ cast $ Map.find key space
writePointer :: a -> Pointer a -> State PointerSpace () writePointer a ( Pointer key ) = do space <- get put $ Map.insert key ( PackedValue a ) space
newPointer :: a -> State PointerSpace ( Pointer a ) newPointer a = do space <- get let key = findEmptyKey space -- implement it yourself p = Pointer key writePointer a p return p
Thanks for an example! Probably, one can think about using Arrays instead of Map or IntMap in order to achieve 'true' O(1) in pure. But I suppose that there are some trouble with array expanding. Or somebody would already make it. -- Thanks, Sergey

Thanks for an example! Probably, one can think about using Arrays instead of Map or IntMap in order to achieve 'true' O(1) in pure. But I suppose that there are some trouble with array expanding. Or somebody would already make it.
Pure arrays have O(n) modification time.
From the docs, lookup is O(min(n,W))
Actually worse than O(log n). B-tree with 4 or even 8 child nodes will be the best solution. This trees have better lookup time and worse space efficiency, but we can almost eliminate space overhead by using dense keys.

On 07/15/2010 05:33 PM, Victor Gorokhov wrote:
Thanks for an example! Probably, one can think about using Arrays instead of Map or IntMap in order to achieve 'true' O(1) in pure. But I suppose that there are some trouble with array expanding. Or somebody would already make it.
Pure arrays have O(n) modification time.
Excluding DiffArray under certain usage patterns of course, but DiffArray is slow for unknown reasons besides algorithmic complexity.
From the docs, lookup is O(min(n,W))
Actually worse than O(log n).
Perhaps I am misunderstanding you, but O(min(n,W)) is either better than or the same as O(log n), depending on how you look at things, but I don't see any way that the former could be *worse* than the latter. - Jake

Jake McArthur wrote:
On 07/15/2010 05:33 PM, Victor Gorokhov wrote:
From the docs, lookup is O(min(n,W)) Actually worse than O(log n).
Perhaps I am misunderstanding you, but O(min(n,W)) is either better than or the same as O(log n), depending on how you look at things, but I don't see any way that the former could be *worse* than the latter.
For n < W: min(n,W) > log n So, when you can guarantee that n < W ---which is almost always the case for IntMap---, then O(min(n,W)) is linear and therefore worse than O(log n). But even so, if your constant factors are k << c, then k*n < c*log n is perfectly possible for all n < W, and therefore what matters in the real world here is the constant factors. The reason why is that for asymptotic purposes O(min(n,W)) and O(log n) belong to the same class of functions between constant and linear, so they're effectively the same (in asymptotic-land). -- Live well, ~wren

2010/7/16 wren ng thornton
Jake McArthur wrote:
On 07/15/2010 05:33 PM, Victor Gorokhov wrote:
From the docs, lookup is O(min(n,W))
Actually worse than O(log n).
Perhaps I am misunderstanding you, but O(min(n,W)) is either better than or the same as O(log n), depending on how you look at things, but I don't see any way that the former could be *worse* than the latter.
For n < W: min(n,W) > log n
So, when you can guarantee that n < W ---which is almost always the case for IntMap---, then O(min(n,W)) is linear and therefore worse than O(log n).
Indeed---though you see worst-case behavior only for carefully-chosen key sets (eg successive powers of 2). If the n values in an IntMap are, say, consecutive or nearly-consecutive, we obtain a log n bound. I suspect in practice most programmers will see logarithmic behavior.
But even so, if your constant factors are k << c, then k*n < c*log n is perfectly possible for all n < W, and therefore what matters in the real world here is the constant factors. The reason why is that for asymptotic purposes O(min(n,W)) and O(log n) belong to the same class of functions between constant and linear, so they're effectively the same (in asymptotic-land).
The argument for constant-time IntMap performance is essentially similar to the following argument: There are balanced trees that provide an O(lg n) bound on tree depth for a tree containing n elements. Our computer has only k bits of address space and therefore the number of elements in any in-memory tree is O(k). Thus there is a constant (and smallish) upper bound on tree depth, O(lg k). Therefore every balanced tree implementation offers constant-time access. As you observe, it's really down to constant factors. The reason IntMap (or any digital trie) is so interesting is that it is simple enough that the constant factors are quite good---in particular we don't waste a lot of time figuring out if we're going to need to rearrange the tree structure on the fly. That turns out to amortize quite a few extra level traversals in a lot of settings. It also offers really elegant implementations of union and unions. Whether that means they're quickish I leave to the benchmarkers. -Jan-Willem Maessen
-- Live well, ~wren _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Jan-Willem Maessen wrote:
As you observe, it's really down to constant factors. The reason IntMap (or any digital trie) is so interesting is that it is simple enough that the constant factors are quite good---in particular we don't waste a lot of time figuring out if we're going to need to rearrange the tree structure on the fly. That turns out to amortize quite a few extra level traversals in a lot of settings.
This simplicity of not rebalancing also allows for more sharing in a persistent setting, which can be a big gain for programs with the right kinds of data distributions.
It also offers really elegant implementations of union and unions. Whether that means they're quickish I leave to the benchmarkers.
In my experience (NLP work), the performance of unions for tries is one of their major selling points. In Okasaki's venerable benchmarks[1], they vastly outperform all competitors for merging. Even in imperative-land, that's one of the reasons tries are so common in NLP and are often chosen over hashmaps for certain tasks. [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452 -- Live well, ~wren

Hello Jake, Friday, July 16, 2010, 7:26:22 AM, you wrote:
Excluding DiffArray under certain usage patterns of course, but DiffArray is slow for unknown reasons besides algorithmic complexity.
unknown reason = MVar usage ArrayRef library contains parameterized DiffArray implementation that may be specialized either to MVar or IORef usage -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (12)
-
Andrew Coppin
-
Bulat Ziganshin
-
Felipe Lessa
-
Gregory Crosswhite
-
Ivan Lazar Miljenovic
-
Jake McArthur
-
Jan-Willem Maessen
-
Sergey Mironov
-
Serguey Zefirov
-
Stephen Tetley
-
Victor Gorokhov
-
wren ng thornton