Marat,
There are many ways to truly parallelize (and distribute) the execution of graph algorithms, but it largely depends on what you want to do with the graph. For example, suppose you are interested in graph traversal (finding the shortest path from point A to point B in a DAG), and you want to reduce the cost of each traversal to a simple O(log n) lookup (n being the number of nodes), so that you can efficiently perform many of these lookups in parallel on a graph that doesn't necessarily fit in RAM. You could:
1. represent the graph as an adjacency matrix;
2. multiply that square matrix by itself to obtain the matrix representing the next degree of separation (this multiplication can be parallelized, using Strassen's algorithm for example);
3. materialize those matrices as B-trees. Each B-tree represents the associations between nodes at x degrees of separation. If your graph has small-world structure, you only need 4 of these matrices to traverse any path up to 8 degrees of separation (see the sketch after this list).
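To make steps 1 and 2 concrete, here is a rough sketch, not a full implementation: Boolean adjacency matrices as lists of rows, with the product parallelized over rows via Control.Parallel.Strategies from the `parallel` package. I'm using a plain Boolean product rather than Strassen to keep it short, and the names and the toy 3-node graph are mine, just for illustration.

    import Control.Parallel.Strategies (parMap, rdeepseq)
    import Data.List (transpose)

    type Row    = [Bool]
    type Matrix = [Row]          -- adjacency matrix, True = edge

    -- Boolean matrix product: entry (i,j) of (a `boolMul` b) is True
    -- when some k has an edge i->k in a and k->j in b.  Each result
    -- row is its own spark, so rows are evaluated in parallel.
    boolMul :: Matrix -> Matrix -> Matrix
    boolMul a b = parMap rdeepseq row a
      where
        bt    = transpose b
        row r = [ or (zipWith (&&) r c) | c <- bt ]

    -- Repeated squaring: a, a^2, a^4, a^8 -- four matrices computed
    -- with three squarings.  With self-loops in a, a^k relates every
    -- pair of nodes within k hops.
    powers :: Matrix -> [Matrix]
    powers a = take 4 (iterate (\m -> m `boolMul` m) a)

    main :: IO ()
    main = do
      let a = [ [True , True , False]   -- tiny 3-node example graph
              , [False, True , True ]
              , [False, False, True ] ]
      mapM_ print (powers a)

Compile with ghc -threaded -O2 and run with +RTS -N to actually use multiple cores. Writing each power out as a B-tree on disk (step 3) is a separate concern that I'm not showing here.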
Of course, this is for unlabeled associations. If your associations are typed, you'll need a tensor structure instead (e.g. one adjacency matrix per association type).
Another option is to use an adjacency list and perform the same task with relational join operations (also parallelizable). Keep in mind that adjacency matrices and adjacency lists are just two isomorphic representations of the same graph.
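A minimal sketch of the join-based version (the Node type and the shape of the edge list are assumptions on my part): index the adjacency list by source node, then self-join on target = source to get every pair at two degrees of separation. Each outer tuple is joined independently, which is where the parallelism would come from.

    import qualified Data.Map.Strict as Map
    import qualified Data.Set as Set

    type Node = Int
    type Edge = (Node, Node)

    -- Index the adjacency list by source node (the "build" side of the join).
    bySource :: [Edge] -> Map.Map Node [Node]
    bySource es = Map.fromListWith (++) [ (s, [t]) | (s, t) <- es ]

    -- Relational self-join on target = source: all pairs connected by
    -- a path of exactly two edges.  Each (s, t1) tuple is probed
    -- independently, so the outer loop can be split across workers.
    twoHops :: [Edge] -> Set.Set Edge
    twoHops es =
      Set.fromList [ (s, t2) | (s, t1) <- es
                             , t2 <- Map.findWithDefault [] t1 idx ]
      where idx = bySource es

    main :: IO ()
    main = print (twoHops [(1,2), (2,3), (3,4), (2,4)])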
The lookups themselves could also be performed in parallel.
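Since the lookups are independent of each other, a whole batch of them parallelizes trivially. Here `reach2` is just an in-memory stand-in for one of the materialized B-trees, again only a sketch:

    import Control.Parallel.Strategies (parMap, rdeepseq)
    import qualified Data.Set as Set

    type Node = Int

    -- Answer a batch of "is B reachable from A at 2 degrees?" queries,
    -- one spark per query.
    answerAll :: Set.Set (Node, Node) -> [(Node, Node)] -> [Bool]
    answerAll reach2 = parMap rdeepseq (`Set.member` reach2)

    main :: IO ()
    main = print (answerAll (Set.fromList [(1,3), (1,4)]) [(1,3), (2,4)])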
This is just one example; with a little ingenuity you can also compute things like centrality and connectedness in parallel.
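For instance, degree centrality (the simplest measure, and not necessarily the one you need) falls straight out of the adjacency matrix, one spark per row:

    import Control.Parallel.Strategies (parMap, rdeepseq)

    type Matrix = [[Bool]]

    -- Out-degree of every node, with the rows counted in parallel.
    degreeCentrality :: Matrix -> [Int]
    degreeCentrality = parMap rdeepseq (length . filter id)

    main :: IO ()
    main = print (degreeCentrality [ [False, True , True ]
                                   , [True , False, False]
                                   , [False, False, False] ])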
I hope this helps, even though it's not Haskell-specific,
Flavio