[CALCITE-6451] Improve Nullability Derivation for Intersect and Minus#4897
[CALCITE-6451] Improve Nullability Derivation for Intersect and Minus#4897xiedeyantu wants to merge 13 commits into
Conversation
Co-authored-by: Victor Barua <victor.barua@datadoghq.com>
|
Related PR #3845. |
|
you have some checker failures |
|
Does this work around the problems in the other PR? |
Are you referring to #3845? I noticed that you had approved this PR before, but there were some conflicts. Since it's been a long time, the CI status is no longer visible, and it's unclear if there were other issues back then. I think it's a good PR, so I’m trying to finish it up. |
|
Yes, the discussion in JIRA was about causing problems with other rules. |
I didn’t see any discussion in the Jira. Are you referring to the discussion in the original PR? I have resolved the rule conflicts. |
|
yes, the original PR |
|
According to this disscusion #3845 (comment) . |
|
@mihaibudiu I'm not sure if you agree with the current simplified processing logic. If you have time, please review this PR to see if there are any other concerns. |
|
@vbarua what do you think of this approach? |
| !ok | ||
|
|
||
| EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], A=[$t2]) | ||
| EnumerableAggregate(group=[{0}]) |
There was a problem hiding this comment.
Do you know why this plan changed?
There was a problem hiding this comment.
I haven't been able to pinpoint how the old plan included AGG in the join, but the new plan seems fine. I might need to investigate this difference in more detail later when I have more time.
| } | ||
|
|
||
| if (!nullFilters.isEmpty()) { | ||
| relBuilder.filter(nullFilters); |
There was a problem hiding this comment.
There is a relBuilder.convert(intersect.getRowType(), false) call at the end, is this logic necessary?
Calcite doesn't have the ability to refine type inference from predicates, this rewrite is semantically equivalent, but it is unrelated to the issue-6451.
| relBuilder.project(Util.skipLast(relBuilder.fields())); | ||
|
|
||
| // ensure the nullabilities of columns in the new relation match those of the input relation | ||
| relBuilder.convert(intersect.getRowType(), false); |
There was a problem hiding this comment.
I'm not sure that whether onMatchAggregateOnUnion need this.
There was a problem hiding this comment.
I agree, it would be safer that way.
|
|
What is the status of this PR? |
There may only be one problem left now, which is that I haven't found the root cause of the changes in the plan mentioned by silun above. |
|
Current PR Change Management Process:
The final selected plan is: The original plan took this specific form because, within Volcano, every rule generates a corresponding candidate RelNode. A particular scenario arises with nested Minus operations: for each level of nesting, an equivalent Aggregation-plus-Join structure is generated. Subsequently, during the final cost-based selection phase, this specific structure was deemed to have the lowest cost and was therefore selected. Please refer to the plan diagram below. This is my current understanding; please take another look to see if it is clearly explained. @silundong @mihaibudiu |
|
If the plan is semantically equivalent, it's not a problem from my pov. |
|
Sorry for the delay. If what I mentioned has been resolved, I have no further questions. |
| assertThat(root, hasTree(expected)); | ||
| } | ||
|
|
||
| @ParameterizedTest |
There was a problem hiding this comment.
how about a test with a ROW field type?
the behavior of nulls with ROW types in Calcite is very strange.
There was a problem hiding this comment.
To address this, I added test cases involving the ROW type for the three SetOp operators. This is necessary because the ROW type imposes specific constraints regarding nullability: when the ROW itself is NOT NULL, its internal fields can be either nullable or NOT NULL; conversely, when the ROW is NULL, all its internal fields must also be NULL. For the INTERSECT operator, I used a NOT NULL outer ROW to observe the tightening of the internal fields' nullability attributes. For the UNION operator, I used a NULL outer ROW to observe the relaxation of nullability attributes for both the internal fields and the outer ROW itself.
There was a problem hiding this comment.
Could you please check if any other cases need to be added or if anything needs to be modified?
|



Jira Link
CALCITE-6451
Changes Proposed
SetOp overrides
deriveRowType()and computes the output row type to be the least restrictive across all inputs here.So for example given
Input 1: (I64, I64, I64?, I64?)
Input 2: (I64, I64?, I64, I64?)
where ? denotes nullable, the least restrictive output computes:
Output: (I64, I64?, I64?, I64?)
For UNION operations, these nullabilities are accurate.
However for MINUS and INTERSECT there is room for improvement.
MINUS only returns rows from the first input, as such its output nullability should always match that of its first input:
Output: (I64, I64, I64?, I64?)
INTERSECT only returns rows that match across all inputs. If a column is not nullable in any of the inputs, then it is not nullable in the output because no rows can be emitted in which that column is null:
Output: (I64, I64, I64, I64?)