FIX: Request SQL_CHAR as SQL_C_WCHAR in arrow fetch path by ffelixg · Pull Request #575 · microsoft/mssql-python

ffelixg · 2026-05-13T11:08:03Z

Work Item / Issue Reference

AB#44922

GitHub Issue: #553

Summary

Due to #495, we can now request SQL_CHAR data as SQL_C_WCHAR, i.e. as UTF-16LE strings. Doing this in the Arrow path ensures that the Arrow methods always return correct data regardless of encoding settings, locale, or operating system. There does not seem to be any significant negative performance impact.
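To illustrate why the narrow path is fragile, here is a minimal standalone sketch in plain Python. It does not use the driver or ODBC; the sample string and the choice of CP1252 versus UTF-8 are assumptions picked for illustration only. It shows that narrow bytes can only be decoded correctly when the reader guesses the right code page, while UTF-16LE round-trips without any such guess.

```python
# Illustration only: plain Python codecs stand in for what the driver/ODBC layer does.
text = "café €5"

# Narrow path: the bytes on the wire depend on the column's code page (CP1252 here).
narrow = text.encode("cp1252")

# A client that assumes UTF-8 cannot decode those bytes.
try:
    narrow.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# The opposite mismatch does not fail, it silently produces mojibake.
mojibake = text.encode("utf-8").decode("cp1252")

# Wide path: UTF-16LE needs no code page assumption on either side.
wide = text.encode("utf-16-le")
roundtrip = wide.decode("utf-16-le")

print(decoded_ok, mojibake != text, roundtrip == text)
```

The silent mojibake case is the more dangerous of the two, since no error is raised and the wrong text simply ends up in the result.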

Copilot

Pull request overview

Updates the Arrow fetch path in the C++ pybind layer to always request SQL_CHAR/SQL_VARCHAR data as SQL_C_WCHAR (UTF-16) so Arrow results are correct regardless of server/client codepage, locale, or platform—addressing the VARCHAR non-ASCII decoding issues reported in #553.

Changes:

- Switch Arrow batch binding/fetching for SQL_CHAR/SQL_VARCHAR from SQL_C_CHAR to SQL_C_WCHAR to avoid codepage-dependent decoding.
- Remove the narrow-char copy path for SQL_CHAR/SQL_VARCHAR in Arrow batch production and route through the existing wide-char → UTF-8 conversion logic.
- Add an Arrow regression test covering Unicode round-tripping through a UTF-8-collated VARCHAR column.
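The wide-char → UTF-8 routing described above can be sketched in plain Python. This is not the C++ code from `ddbc_bindings.cpp`; the function name `build_utf8_column` and its return shape are hypothetical. It only shows the general idea of transcoding UTF-16LE cells to UTF-8 while filling the offsets, data, and validity information that an Arrow string column uses.

```python
from typing import List, Optional, Sequence, Tuple


def build_utf8_column(
    cells: Sequence[Optional[bytes]],
) -> Tuple[List[int], bytes, List[bool]]:
    """Hypothetical helper: convert UTF-16LE cells (None for NULL) into
    Arrow-style offsets, a UTF-8 data buffer, and a validity list."""
    offsets = [0]
    data = bytearray()
    validity = []
    for cell in cells:
        if cell is None:
            # NULL: no bytes are appended, so the offset repeats.
            validity.append(False)
        else:
            # Transcode the wide-char payload to UTF-8.
            data += cell.decode("utf-16-le").encode("utf-8")
            validity.append(True)
        # The end offset of this row is the start offset of the next one.
        offsets.append(len(data))
    return offsets, bytes(data), validity


cells = ["é".encode("utf-16-le"), None, "ab".encode("utf-16-le")]
offsets, data, validity = build_utf8_column(cells)
print(offsets, data, validity)
```

Because the input is always UTF-16LE, the conversion step is the same for every collation, which is the property the change relies on.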

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `mssql_python/pybind/ddbc_bindings.cpp` | Changes Arrow batch binding and fetch handling so `VARCHAR` is requested as `SQL_C_WCHAR`, ensuring consistent Unicode correctness. |
| `tests/test_004_cursor_arrow.py` | Adds a regression test to validate Arrow output for Unicode data stored in `VARCHAR` with UTF-8 collation. |



gargsaumya · 2026-05-27T07:50:11Z

/azp run

azure-pipelines · 2026-05-27T07:50:23Z

Azure Pipelines successfully started running 1 pipeline(s).

gargsaumya · 2026-05-27T08:02:48Z

-                            }
-                            break;
-                        }
+                        case SQL_LONGVARCHAR:


Can we add a comment explaining that these cases now fall through to the WCHAR handling because SQL_CHAR is bound as SQL_C_WCHAR at line 4784?

gargsaumya · 2026-05-27T08:03:17Z

-                        arrowColumnProducer->varVal[idxRowArrow + 1] = start + dataLen;
-                        break;
-                    }
+                    case SQL_LONGVARCHAR:


Similarly here.

gargsaumya · 2026-05-27T08:06:48Z

+        assert tbl.column(0).to_pylist() == expected
+    finally:
+        cursor.execute(f"drop table if exists {table}")
+


The fix applies to SQL_CHAR, SQL_VARCHAR, and SQL_LONGVARCHAR, but only VARCHAR is tested. Can we add a test for the CHAR (fixed-length) type?
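One detail a CHAR test would need to account for is padding. The sketch below is a hypothetical helper, not part of the PR: it computes the values such a test might expect back, assuming `ANSI_PADDING` is on (so `CHAR(n)` values are padded with trailing spaces to `n` bytes) and a single-byte code page such as CP1252. The name `expected_char_values` and its parameters are made up for illustration.

```python
from typing import List, Optional, Sequence


def expected_char_values(
    values: Sequence[Optional[str]], width: int, encoding: str = "cp1252"
) -> List[Optional[str]]:
    """Hypothetical helper: pad each non-NULL value with trailing spaces to
    `width` bytes, as a CHAR(width) column would under the stated assumptions."""
    expected = []
    for value in values:
        if value is None:
            # NULL stays NULL; no padding applies.
            expected.append(None)
            continue
        raw = value.encode(encoding)
        if len(raw) > width:
            raise ValueError(f"{value!r} does not fit in CHAR({width})")
        # Pad at the byte level, then decode back to text.
        expected.append((raw + b" " * (width - len(raw))).decode(encoding))
    return expected


print(expected_char_values(["é", None, "abc"], 4))
```

A CHAR variant of the regression test could then compare the fetched Arrow column against this padded list rather than against the inserted values directly.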

gargsaumya · 2026-05-27T08:07:23Z

+        cursor.execute(f"drop table if exists {table}")
+
+
+def test_arrow_varchar_utf8_collation_cp1252(cursor: mssql_python.Cursor):


The test uses SQL_Latin1_General_CP1_CI_AS (CP1252), not a UTF-8 collation, so the current name is slightly misleading. Would it be better to name it test_arrow_varchar_cp1252_collation_unicode?

Arrow fetch: request SQL_CHAR as SQL_C_WCHAR

29a9cec

Copilot AI review requested due to automatic review settings May 13, 2026 11:08

Copilot started reviewing on behalf of ffelixg May 13, 2026 11:09

Copilot AI reviewed May 13, 2026

Comment thread tests/test_004_cursor_arrow.py Outdated

ffelixg added 2 commits May 13, 2026 19:10

Make utf8 collation test optional; Add mandatory cp1252 test to make up for it

4ce73ca

Merge remote-tracking branch 'origin/main' into arrow_char_to_wchar

c4ce528

benmatwil mentioned this pull request May 23, 2026

VARCHAR non-ascii character parsing #553

Open

subrata-ms and others added 2 commits May 26, 2026 02:24

Merge branch 'main' into arrow_char_to_wchar

c574aca

Merge branch 'main' into arrow_char_to_wchar

9a3992e

gargsaumya reviewed May 27, 2026

ffelixg wants to merge 5 commits into microsoft:main from ffelixg:arrow_char_to_wchar


4 participants
