Bit shifting limitations

The current state of affairs is a bit unsatisfactory. 1. The biggest problem, as I see it, is that while we have shiftR and shiftL, which are documented as giving 0 or -1 when shifting too far, and we have unsafeShiftR and unsafeShiftL, which are likely to do whatever the CPU happens to do, we don't have anything guaranteed to shift using just the first five or six bits of the shift count, which is what Java specifies and what unsafeShiftR and unsafeShiftL *actually* do (at least on x86_64). I propose that we add these masked shifts to Data.Bits. The default implementations can look something like: shiftRMasked x count | popCount (finiteBitSize x) != 1 = error "Masked shift only makes sense if the size is a power of 2." | otherwise = x `unsafeShiftR` (count .&. (finiteBitSize x - 1)) 2. It would be nice to specify what shiftR and shiftL are supposed to do when given negative shift counts. Is there a practical reason not to specify that? 3. I would like to add explicit arithmetic and logical shifts to Data.Bits. When fiddling bits, it's often easier to think about that distinction explicitly, rather than in terms of whether the type is signed or unsigned, and more convenient to have an explicit operation rather than casting around. David

definitely.
I would much prefer to respecify shiftR and shiftL as masked in
nature, that is a much better state of affairs than calling error and
easier to arrange for arches that don't natively act that way.
Another advantage of logical and arithmetic shifts is that there would
be easier to get rid of the pesky 'Num' superclass as you wouldn't
need Word/Int trickery and depending on dubious fromIntegaral behavior
to get the one you want.
jhc actually has this behavior for the standard numeric types, it
makes a big difference in speed.
Data.Bits declaration
http://repetae.net/repos/jhc/lib/haskell-extras/Data/Bits.hs
Instances for Data
http://repetae.net/repos/jhc/lib/haskell-extras/Data/Bits.m4
The foreign primitive decls directly translate to the assembly/c--
opcodes which behave in the masking fashion you specified.
I am not entirely pleased with this as the Num superclass still
exists; users won't know to define both arithmetic and logical and
there isn't an easy way to create defaults for them while keeping
backwards compatibility. Once class aliases are fully integrated then
this will be straigtforward to solve in a backwards compatible manner.
It is still better than the alternative though.
John
On Sun, Jul 13, 2014 at 11:36 AM, David Feuer
The current state of affairs is a bit unsatisfactory.
1. The biggest problem, as I see it, is that while we have shiftR and shiftL, which are documented as giving 0 or -1 when shifting too far, and we have unsafeShiftR and unsafeShiftL, which are likely to do whatever the CPU happens to do, we don't have anything guaranteed to shift using just the first five or six bits of the shift count, which is what Java specifies and what unsafeShiftR and unsafeShiftL *actually* do (at least on x86_64). I propose that we add these masked shifts to Data.Bits. The default implementations can look something like:
shiftRMasked x count | popCount (finiteBitSize x) != 1 = error "Masked shift only makes sense if the size is a power of 2." | otherwise = x `unsafeShiftR` (count .&. (finiteBitSize x - 1))
2. It would be nice to specify what shiftR and shiftL are supposed to do when given negative shift counts. Is there a practical reason not to specify that?
3. I would like to add explicit arithmetic and logical shifts to Data.Bits. When fiddling bits, it's often easier to think about that distinction explicitly, rather than in terms of whether the type is signed or unsigned, and more convenient to have an explicit operation rather than casting around.
David
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
-- John Meacham - http://notanumber.net/

On Jul 13, 2014 3:30 PM, "John Meacham"
definitely.
I would much prefer to respecify shiftR and shiftL as masked in nature, that is a much better state of affairs than calling error and easier to arrange for arches that don't natively act that way.
They don't call error when given large counts; they give zero or -1. This is a pleasant property that we probably want to offer; it's just not (apparently) what CPUs tend to offer, and isn't always desirable.
Another advantage of logical and arithmetic shifts is that there would be easier to get rid of the pesky 'Num' superclass as you wouldn't need Word/Int trickery and depending on dubious fromIntegaral behavior to get the one you want.
Yes, exactly. I wouldn't want to change the current behavior, I don't think, but rather add more functions.
I am not entirely pleased with this as the Num superclass still exists; users won't know to define both arithmetic and logical and there isn't an easy way to create defaults for them while keeping backwards compatibility. Once class aliases are fully integrated then this will be straigtforward to solve in a backwards compatible manner. It is still better than the alternative though.
I have no idea what any of that means, but it should be easy to create defaults using isSigned. Something vaguely like ShiftRArithmetic x c = if (isSigned x) then shiftR x c else (-1 `shiftR` c) .|. (x `shiftR` c)

On Sun, Jul 13, 2014 at 12:47 PM, David Feuer
ShiftRArithmetic x c = if (isSigned x) then shiftR x c else (-1 `shiftR` c) .|. (x `shiftR` c)
hmm... that isn't quite it because it will always add in the ones on the left when you only want to do it when the high bit is set, plus it assumes a specific representation for -1. John -- John Meacham - http://notanumber.net/

No, you're right about that. I'm not too clear on what a right shift of a
negative number not represented with 2's complement is supposed to mean (or
what any of these shifts should mean for a non-binary representation). And
you're right that I missed a key test. One possibility is just to test the
high order bit directly and test with that, and use the complement of 0
instead of -1.
On Jul 13, 2014 4:18 PM, "John Meacham"
On Sun, Jul 13, 2014 at 12:47 PM, David Feuer
wrote: ShiftRArithmetic x c = if (isSigned x) then shiftR x c else (-1 `shiftR` c) .|. (x `shiftR` c)
hmm... that isn't quite it because it will always add in the ones on the left when you only want to do it when the high bit is set, plus it assumes a specific representation for -1.
John
-- John Meacham - http://notanumber.net/

On Sun, Jul 13, 2014 at 1:33 PM, David Feuer
No, you're right about that. I'm not too clear on what a right shift of a negative number not represented with 2's complement is supposed to mean (or what any of these shifts should mean for a non-binary representation). And you're right that I missed a key test. One possibility is just to test the high order bit directly and test with that, and use the complement of 0 instead of -1.
Yeah, complement of zero would be better as removing dependencies on the 'Num' superclass is ideal. but 'high order bit' is tough to define, for instance 'Integer' is a member of bits, which is okay because it is specified to be treated as if it is sign extended to infinity for the purpose of bit ops, But any default refering to 'highest bit' wouldn't work. of course, bitSize already returns an error for Integer so perhaps this isn't making anything worse. (which also bugs me, bitSize should have returned a Maybe Int) John -- John Meacham - http://notanumber.net/

Well, bitSize is deprecated in favor of two safe alternatives, including finiteBitSize. It is unfortunate that the signed stuff in my default creates a Num dependency—that may be enough to kill it. The case of Integer is pretty peculiar. What do people use that instance for?

On Sun, Jul 13, 2014 at 2:23 PM, David Feuer
Well, bitSize is deprecated in favor of two safe alternatives, including finiteBitSize. It is unfortunate that the signed stuff in my default creates a Num dependency--that may be enough to kill it. The case of Integer is pretty peculiar. What do people use that instance for?
Infinite bit sets I guess, I don't think it is that unreasonable to exist, were it not for that pesky bitSize. FiniteBits and that deprecation are GHC specific. Though, it would make sense to port to jhc, it's fairly annoying for portable code to rely on ad-hoc changes like this. Looks like some work went into removing the Num superclass in ghc's base. Hmm... I think type class aliases are needed to actually make it backwards compatible though. Since bit is a primitive, you can get zero from the somewhat awkward 'clearBit 0 (bit 0)' and one from 'bit 1' and -1 from complement zero so the defaults that were dropped can be added back in using those. John -- John Meacham - http://notanumber.net/

On Jul 13, 2014 5:35 PM, "John Meacham"
Infinite bit sets I guess, I don't think it is that unreasonable to exist, were it not for that pesky bitSize.
No, that use *is* unreasonable. Infinite bitsets are just optimized unboxed expandable Boolean vectors, and it makes more sense to have one type that fills with False and another that fills with True than a notion that a bitset is "signed". IntN and WordN are special for two reasons: 1. Their sizes are especially fast. 2. They are numbers, and those bitwise operations can do some pretty cool things fast—the specific thing that I was looking at just now was a Java implementation of isSquare that maaartinus wrote on StackOverflow that uses a masked shift to index into a (logical) bitvector by the six low-order bits of a (logical) integer to see if those bits can be the low-order bits of a perfect square. When I went to do that in Haskell, I ran into all sorts of unpleasant limitations of Data.Bits and some very odd types in Data.Bits.Extras.
FiniteBits and that deprecation are GHC specific. Though, it would make sense to port to jhc, it's fairly annoying for portable code to rely on ad-hoc changes like this.
Looks like some work went into removing the Num superclass in ghc's base. Hmm... I think type class aliases are needed to actually make it backwards compatible though. Since bit is a primitive, you can get zero from the somewhat awkward 'clearBit 0 (bit 0)' and one from 'bit 1' and -1 from complement zero so the defaults that were dropped can be added back in using those.
John -- John Meacham - http://notanumber.net/

Am 13.07.2014 23:50, schrieb David Feuer:
the specific thing that I was looking at just now was a Java implementation of isSquare that maaartinus wrote on StackOverflow that uses a masked shift to index into a (logical) bitvector by the six low-order bits of a (logical) integer to see if those bits can be the low-order bits of a perfect square.
Taking the six least-significant bits means modulo 64. There are 12 remainders of squares modulo 64: [0,1,4,9,16,17,25,33,36,41,49,57] I have run some tests in order to see whether there are divisors with especially small set of square remainders. A test based on this would require a full division but you can considerably increase the ratio of non-square remainders to square-remainders. Prelude> let sqrRems n = Data.Set.fromList $ map (\x -> mod (x*x) n) $ take n [0..] Prelude> Data.List.Key.maximum (uncurry (Data.Ratio.%)) $ map (\n -> (n, Data.Set.size $ sqrRems n)) [1..64] (48,8) Prelude> Data.List.Key.maximum (uncurry (Data.Ratio.%)) $ map (\n -> (n, Data.Set.size $ sqrRems n)) [1..1024] (1008,64) Prelude> Data.List.Key.maximum (uncurry (Data.Ratio.%)) $ map (\n -> (n, Data.Set.size $ sqrRems n)) [1..8192] (7920,288) That is, if you compute modulo 64, every 5th number will have a square remainder. In contrast to that, with modulo 7920 only every 27th number will have a remainder of a square. (Assuming the remainder of the tested numbers is equally distributed.)

I figured out a way to default arithmetic vs. logical shifts without
introducing a Num dependency: the value of (complement 0) `shiftR` 1 will
reveal the behavior of shiftR, and therefore what shiftRA and shiftRL need
to do.
On Jul 13, 2014 5:35 PM, "John Meacham"
On Sun, Jul 13, 2014 at 2:23 PM, David Feuer
wrote: Well, bitSize is deprecated in favor of two safe alternatives, including finiteBitSize. It is unfortunate that the signed stuff in my default creates a Num dependency--that may be enough to kill it. The case of Integer is pretty peculiar. What do people use that instance for?
Infinite bit sets I guess, I don't think it is that unreasonable to exist, were it not for that pesky bitSize.
FiniteBits and that deprecation are GHC specific. Though, it would make sense to port to jhc, it's fairly annoying for portable code to rely on ad-hoc changes like this.
Looks like some work went into removing the Num superclass in ghc's base. Hmm... I think type class aliases are needed to actually make it backwards compatible though. Since bit is a primitive, you can get zero from the somewhat awkward 'clearBit 0 (bit 0)' and one from 'bit 1' and -1 from complement zero so the defaults that were dropped can be added back in using those.
John -- John Meacham - http://notanumber.net/

On Sun, Jul 13, 2014 at 12:47 PM, David Feuer
They don't call error when given large counts; they give zero or -1. This is a pleasant property that we probably want to offer; it's just not (apparently) what CPUs tend to offer, and isn't always desirable.
Ah, excellent. I was thinking of an older implementation that errored out. John -- John Meacham - http://notanumber.net/

Hello John, On 2014-07-13 at 21:29:44 +0200, John Meacham wrote: [...]
Another advantage of logical and arithmetic shifts is that there would be easier to get rid of the pesky 'Num' superclass as you wouldn't need Word/Int trickery and depending on dubious fromIntegaral behavior to get the one you want.
Btw, are you aware http://hackage.haskell.org/package/base-4.7.0.0/docs/Data-Bits.html doesn't have Num as its superclass anymore? Or are you referring to some other issue? Cheers, hvr

Am 13.07.2014 20:36, schrieb David Feuer:
3. I would like to add explicit arithmetic and logical shifts to Data.Bits. When fiddling bits, it's often easier to think about that distinction explicitly, rather than in terms of whether the type is signed or unsigned, and more convenient to have an explicit operation rather than casting around.
What kind of data do you have in mind, where both signed and unsigned shifts make sense? So far, I'd say, that calling "asr" on Word and "lsr" on Int should be a type error.

That may be reasonable. I think it likely makes more sense than making the type of shift depend on the type of number, anyway. On Jul 13, 2014 3:43 PM, "Henning Thielemann" < schlepptop@henning-thielemann.de> wrote:
Am 13.07.2014 20:36, schrieb David Feuer:
3. I would like to add explicit arithmetic and logical shifts to
Data.Bits. When fiddling bits, it's often easier to think about that distinction explicitly, rather than in terms of whether the type is signed or unsigned, and more convenient to have an explicit operation rather than casting around.
What kind of data do you have in mind, where both signed and unsigned shifts make sense?
So far, I'd say, that calling "asr" on Word and "lsr" on Int should be a type error.

FWIW- this is actually the behavior for shiftL. It should just 'do the right thing' for over-shifting, in that it just keeps shifting, resulting in For shiftR we work that way, but there is apparently a cut and paste error in the haddocks that makes shiftR have the warning from unsafeShiftR that its behavior on overflow is undefined.
unsafeShiftR 1 1000000 :: Int
1 This stays put because 1000000 `mod` 64 = 0, while
shiftR 1 1000000 :: Int
0 saturates as you'd expect from:
unsafeShiftL 1 1000000 :: Int
1
shiftL 1 1000000 :: Int
0
The fact that we work modulo the integer size for unsafeShiftL/unsafeShiftR
is very much platform specific though and its left deliberately
unspecified, as the current unsafeShiftL and unsafeShiftR are intended for
use in scenarios where the cost of the branch would be prohibitive, so
over-constraining their implementation would be quite bad.
The intention of the current specification should be that shiftR and shiftL
just shift and keep on shifting even if you give a number that is too big,
while unsafeShiftR and unsafeShiftL do undefined platform specific stuff
when you give them a number out of range, usually this will be the modular
shift that you get from the native instruction.
This way you can use unsafeShiftL and unsafeShiftR when you know the
argument is in range.
We should definitely fix the documentation for shiftR though. That it
claims to be undefined on over-shifting is just wrong.
-Edward
On Sun, Jul 13, 2014 at 3:50 PM, David Feuer
That may be reasonable. I think it likely makes more sense than making the type of shift depend on the type of number, anyway. On Jul 13, 2014 3:43 PM, "Henning Thielemann" < schlepptop@henning-thielemann.de> wrote:
Am 13.07.2014 20:36, schrieb David Feuer:
3. I would like to add explicit arithmetic and logical shifts to
Data.Bits. When fiddling bits, it's often easier to think about that distinction explicitly, rather than in terms of whether the type is signed or unsigned, and more convenient to have an explicit operation rather than casting around.
What kind of data do you have in mind, where both signed and unsigned shifts make sense?
So far, I'd say, that calling "asr" on Word and "lsr" on Int should be a type error.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Sun, Jul 13, 2014 at 12:42 PM, Henning Thielemann
What kind of data do you have in mind, where both signed and unsigned shifts make sense?
It almost always matters and is independent of the type. For instance, pretty much every one of these requires one or the other and it doesn't necessarily coorespond to the signedness of the arguments relying on C casts. https://graphics.stanford.edu/~seander/bithacks.html Generally, except for the very basic case of replacing a multiply with a shift, you will want one behavior or the other. Plus, it is exactly the distinction made in CPUs, they have independent arithmetic and logical shift operations. John -- John Meacham - http://notanumber.net/

On Sun, Jul 13, 2014 at 12:42 PM, Henning Thielemann
So far, I'd say, that calling "asr" on Word and "lsr" on Int should be a type error.
No, that would completely ruin the point of having them, the whole idea is being able to perform specifically an arithmetic or logical shift independent of the signedness of the underlying type. Signedness is a convention of how bit patterns are used, the underlying bit primitives actually implemented are independent of the language level conventions. It would make no more sense to call a type error on them than calling a type error on .&. and .|. on Word or Int. arithmetic shift may have the useful property that negative numbers are preserved when the bit patterns are interpreted as 2s complement numbers, but that is just a useful property, not a condition of use. Shifts are not numeric operations, they are bit operations like .&. and .|., they just happen to be able to be used to implement numeric operations in specific circumstances. John -- John Meacham - http://notanumber.net/

Am 13.07.2014 22:42, schrieb John Meacham:
On Sun, Jul 13, 2014 at 12:42 PM, Henning Thielemann
wrote: So far, I'd say, that calling "asr" on Word and "lsr" on Int should be a type error.
No, that would completely ruin the point of having them, the whole idea is being able to perform specifically an arithmetic or logical shift independent of the signedness of the underlying type.
I'd prefer to distinguish between IntN, CardN (aka WordN) and BitSetN. IntN are signed integers, CardN are unsigned integers and BitSetN is a bit pattern without a numeric interpretation (flags, graphical patterns ...). IntN and CardN would be in a sub-class of Num that supports multiplication, division and modulo with powers of two, which can be efficiently implemented by shifts and bitwise Ands. BitSetN would support all kinds of logical and arithmetical shifts, shifts-through-carry, rotations, bit counts, bitwise logic and what know I. Bit manipulation of numbers as provided by the Bits class was certainly not a good idea and will not become better by extending in that direction.

Without a type generic way to translate between 'IntN', 'CardN', and
'BitSetN' then that will be quite awkward to work with.
Being able to perform logical shifts on unsigned numbers and
arithmetic shifts on signed numbers is just extremely useful. The
compiler is smart enough to transform div and mod power of two
operations into bit operations so in general when you are explicitly
using shiftR it is for their bit manipulation properties, not their
correspondence to mod/div/mul. IntN/WordN are already specified to be
2s complement representations so these operations are well defined.
In C you have to do awkward things like 'sign = -(int)((unsigned
int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));' to force the correct
type of shift you want for a given arch. (right shift behavior is arch
specific in C).
John
On Sun, Jul 13, 2014 at 1:55 PM, Henning Thielemann
Am 13.07.2014 22:42, schrieb John Meacham:
On Sun, Jul 13, 2014 at 12:42 PM, Henning Thielemann
wrote: So far, I'd say, that calling "asr" on Word and "lsr" on Int should be a type error.
No, that would completely ruin the point of having them, the whole idea is being able to perform specifically an arithmetic or logical shift independent of the signedness of the underlying type.
I'd prefer to distinguish between IntN, CardN (aka WordN) and BitSetN. IntN are signed integers, CardN are unsigned integers and BitSetN is a bit pattern without a numeric interpretation (flags, graphical patterns ...). IntN and CardN would be in a sub-class of Num that supports multiplication, division and modulo with powers of two, which can be efficiently implemented by shifts and bitwise Ands. BitSetN would support all kinds of logical and arithmetical shifts, shifts-through-carry, rotations, bit counts, bitwise logic and what know I.
Bit manipulation of numbers as provided by the Bits class was certainly not a good idea and will not become better by extending in that direction.
-- John Meacham - http://notanumber.net/

I think defining the behavior of shifts to work modulo the bitSize is no
better than leaving those ranges undefined. If we were going to define what
it means to shift outside that range I'd rather it was somehow consistent
with math multiplying and dividing by powers of two than randomly rolling
over to a world where shifting a Word32 by 33 somehow corresponds to
shifting by 1.
On Sun, Jul 13, 2014 at 11:36 AM, David Feuer
The current state of affairs is a bit unsatisfactory.
1. The biggest problem, as I see it, is that while we have shiftR and shiftL, which are documented as giving 0 or -1 when shifting too far, and we have unsafeShiftR and unsafeShiftL, which are likely to do whatever the CPU happens to do, we don't have anything guaranteed to shift using just the first five or six bits of the shift count, which is what Java specifies and what unsafeShiftR and unsafeShiftL *actually* do (at least on x86_64). I propose that we add these masked shifts to Data.Bits. The default implementations can look something like:
shiftRMasked x count | popCount (finiteBitSize x) != 1 = error "Masked shift only makes sense if the size is a power of 2." | otherwise = x `unsafeShiftR` (count .&. (finiteBitSize x - 1))
2. It would be nice to specify what shiftR and shiftL are supposed to do when given negative shift counts. Is there a practical reason not to specify that?
3. I would like to add explicit arithmetic and logical shifts to Data.Bits. When fiddling bits, it's often easier to think about that distinction explicitly, rather than in terms of whether the type is signed or unsigned, and more convenient to have an explicit operation rather than casting around.
David
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
-- Eric Mertens

Am 13.07.2014 21:49, schrieb Eric Mertens:
I think defining the behavior of shifts to work modulo the bitSize is no better than leaving those ranges undefined. If we were going to define what it means to shift outside that range I'd rather it was somehow consistent with math multiplying and dividing by powers of two than randomly rolling over to a world where shifting a Word32 by 33 somehow corresponds to shifting by 1.
What he actually wants, is (x `testBit` mod n 64). I guess "rotate" is the better than "shift" here because "rotate" is naturally modular.
participants (6)
-
David Feuer
-
Edward Kmett
-
Eric Mertens
-
Henning Thielemann
-
Herbert Valerio Riedel
-
John Meacham