
On 13-09-21 05:13 AM, Vlatko Basic wrote:
I'd like to extract A texts from row with header "Caption", and have come up with this
    runX $ doc
      >>> (deep (hasName "tr")                     -- filter only TRs
           >>> withTraceLevel 5 traceTree          -- shows correct TR
           `when` deep ( hasName "th" >>>          -- filter THs with specified text
                         getChildren >>> hasText (=="Caption")
                       )                           -- inner deep
           >>> getChildren >>> hasName "td"        -- shouldn't here be only one TR?
           >>> getChildren
          )
      >>> getName &&& (getChildren >>> getText)    -- list has TDs from all three TRs

Operator precedences:

    >>>      infixr 1
    `when`   infixl 9 (default)

Therefore, this expression redundantly parenthesized and systematically indented to ensure that you are on the same page with the computer is:

    runX $ doc >>>
      ( deep (hasName "tr")
        >>>
        -- begin{conditionally prints but otherwise is arr id}
        ( withTraceLevel 5 traceTree
          `when`
          deep ( hasName "th" >>> getChildren >>> hasText (=="Caption") ) -- inner deep
        )
        -- end{conditionally prints but otherwise is arr id}
        >>> getChildren >>> hasName "td" >>> getChildren
      )
      >>>
      ( getName &&& (getChildren >>> getText) )

The condition on <th>Caption</th> ends up controlling trace messages only; it is not used to limit real processing.

"when" doesn't help even when used correctly: it doesn't ban data. "guards" and "containing" ban data, but you have to put them at the right place, i.e., parenthesize correctly.

    runX $ doc >>>
      ( deep ( hasName "tr"
               `containing`
               deep ( hasName "th" >>> getChildren >>> hasText (=="Caption") )
             )
        >>> getChildren >>> hasName "td" >>> getChildren
      )
      >>>
      ( getName &&& (getChildren >>> getText) )
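
The difference can be reproduced without HXT. The following is only a toy sketch, assuming that a list arrow can be modelled as Kleisli [] from base. The names whenL, containingL, eachRow, isCaption, cells and keep are hypothetical stand-ins written for this illustration, not HXT functions; whenL and containingL merely imitate the behaviour described above. Like the backquoted operators in the original expression, they carry no fixity declaration, so they default to infixl 9 and bind tighter than >>>.

```haskell
import Control.Arrow (Kleisli (..), (>>>))

-- Toy list arrow: one input, zero or more outputs.
type LA = Kleisli []

-- Hypothetical stand-in for `when`: apply f where g succeeds,
-- otherwise pass the input through unchanged. Nothing is ever dropped here
-- as long as f itself returns something.
whenL :: LA b b -> LA b c -> LA b b
whenL f g = Kleisli $ \x ->
  if null (runKleisli g x) then [x] else runKleisli f x

-- Hypothetical stand-in for `containing`: keep only those results of f
-- for which g succeeds.
containingL :: LA b c -> LA c d -> LA b c
containingL f g = Kleisli $ \x ->
  [y | y <- runKleisli f x, not (null (runKleisli g y))]

-- A "row" is a header paired with its cells (made-up sample data).
type Row = (String, [String])

rows :: [Row]
rows = [("Name", ["a", "b"]), ("Caption", ["c", "d"]), ("Size", ["e"])]

eachRow :: LA [Row] Row
eachRow = Kleisli id

isCaption :: LA Row Row
isCaption = Kleisli $ \r -> [r | fst r == "Caption"]

cells :: LA Row String
cells = Kleisli snd

-- Identity arrow; plays the role of the trace step.
keep :: LA Row Row
keep = Kleisli $ \r -> [r]

-- Parses as: eachRow >>> (keep `whenL` isCaption) >>> cells
-- The condition only decides whether `keep` runs; every row passes.
viaWhen :: [String]
viaWhen = runKleisli (eachRow >>> keep `whenL` isCaption >>> cells) rows

-- Parses as: (eachRow `containingL` isCaption) >>> cells
-- Rows failing the condition are removed before `cells` sees them.
viaContaining :: [String]
viaContaining = runKleisli (eachRow `containingL` isCaption >>> cells) rows

main :: IO ()
main = do
  print viaWhen        -- ["a","b","c","d","e"]
  print viaContaining  -- ["c","d"]
```

In this model the cells of all three rows come out of viaWhen, which mirrors the "TDs from all three TRs" symptom, while viaContaining yields only the cells of the "Caption" row.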