Skip to content

GH-39603: [R] Error: Cannot convert Dictionary Array of type dictionary<values=large_string, indices=uint32, ordered=0> to R#49710

Open
thisisnic wants to merge 1 commit into
apache:mainfrom
thisisnic:gh-39603_dictionary-array
Open

GH-39603: [R] Error: Cannot convert Dictionary Array of type dictionary<values=large_string, indices=uint32, ordered=0> to R#49710
thisisnic wants to merge 1 commit into
apache:mainfrom
thisisnic:gh-39603_dictionary-array

Conversation

@thisisnic
Copy link
Copy Markdown
Member

@thisisnic thisisnic commented Apr 11, 2026

Rationale for this change

Dictionary Arrays with specific type indices caused error

What changes are included in this PR?

Allows those type indices and add check to ensure no overflow

Are these changes tested?

Yes

Are there any user-facing changes?

No

AI Usage

I used AI to create the changes but manually reviewed myself afterwards

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #39603 has been automatically assigned in GitHub to PR creator.

@thisisnic thisisnic marked this pull request as ready for review April 30, 2026 14:29
@thisisnic thisisnic requested a review from jonkeane as a code owner April 30, 2026 14:29
@thisisnic thisisnic marked this pull request as draft April 30, 2026 14:34
@thisisnic thisisnic marked this pull request as ready for review May 28, 2026 09:53
Copilot AI review requested due to automatic review settings May 28, 2026 09:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes #39603 by extending the non-ALTREP Converter_Dictionary path in the R bindings to support UINT32, INT64, and UINT64 dictionary index types, in addition to adding a safety check for dictionaries larger than R's 32-bit factor code range.

Changes:

  • Allow UINT32/INT64/UINT64 index types in Converter_Dictionary's constructor and Ingest_some_nulls dispatch.
  • Add a runtime check that errors when the dictionary length exceeds std::numeric_limits<int>::max() (R factor code limit).
  • Add a new test that round-trips a factor through Table$create() with uint32/int64/uint64 dictionary index types.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
r/src/array_to_vector.cpp Adds UINT32/INT64/UINT64 cases to dictionary index handling and adds a dictionary-length overflow guard.
r/tests/testthat/test-Table.R Adds a test round-tripping a factor with wider dictionary index types back to R.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +377 to +387
tab_uint32 <- Table$create(fact, schema = schema(fct = dictionary(uint32(), utf8())))
expect_equal(tab_uint32$schema, schema(fct = dictionary(uint32(), utf8())))
expect_equal_data_frame(tab_uint32, fact)

tab_int64 <- Table$create(fact, schema = schema(fct = dictionary(int64(), utf8())))
expect_equal(tab_int64$schema, schema(fct = dictionary(int64(), utf8())))
expect_equal_data_frame(tab_int64, fact)

tab_uint64 <- Table$create(fact, schema = schema(fct = dictionary(uint64(), utf8())))
expect_equal(tab_uint64$schema, schema(fct = dictionary(uint64(), utf8())))
expect_equal_data_frame(tab_uint64, fact)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants