Skip to content

MSSQL Overflow Error on Floats in JSON #5788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
atellier opened this issue Dec 19, 2020 · 18 comments
Closed

MSSQL Overflow Error on Floats in JSON #5788

atellier opened this issue Dec 19, 2020 · 18 comments
Labels
bug Something isn't working datatypes things to do with database types, like VARCHAR and others mysql SQL Server Microsoft SQL Server, e.g. mssql use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated
Milestone

Comments

@atellier
Copy link

Describe the bug
When a Column is declared as JSON and a field contains a numeric value with more precision that Decimal(10, 6), Microsoft SQL Sever fails with an Overflow Error

Expected behavior

  1. Create the corresponding database on MSSQL, and populate with example:
CREATE TABLE dbo.test (
    id int NOT NULL,
    value varchar(max) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL,
    CONSTRAINT PK_test_id PRIMARY KEY (id)
);
GO

INSERT INTO dbo.test VALUES (1, '{"test": {"value": 1234567.89}}');
GO
  1. Assuming we have Session established, run the following code:
from sqlalchemy import Column, JSON, select, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    __table_args__ = {"schema": "dbo"}

    id = Column(Integer, primary_key=True)
    value = Column(JSON, nullable=False)

session = ...
session.execute(select(Test.value[("test", "value")].as_float())).scalars().all()

Error

pyodbc.Error: ('HY019', '[HY019] [FreeTDS][SQL Server]Arithmetic overflow error converting nvarchar to data type numeric. (8115) (SQLExecDirectW)')

Versions.

  • OS: OSX
  • Python: 3.8
  • SQLAlchemy: 1.4.01b
  • Database: MSSQL 2017
  • DBAPI: pyodbc

Additional context
The issue is caused by the cast on that line: https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2118/19/lib/sqlalchemy/dialects/mssql/base.py#2045

Using FLOAT instead of DECIMAL(10, 6) fixes the issue at the expense of some conversion and potential loss of precision - but it does not cause an error.

Note: Having a as_numeric(p,s) in the JSON implementation would be nice. So that we could generate a cast as FLOAT or as DECIMAL(p, s) depending on what as_xxxx as used.

Thanks!

@atellier atellier added the requires triage New issue that requires categorization label Dec 19, 2020
@gordthompson
Copy link
Member

Caused by copying from the mysql dialect here:

"ELSE CAST(JSON_EXTRACT(%s, %s) AS DECIMAL(10, 6))"

@zzzeek zzzeek added bug Something isn't working SQL Server Microsoft SQL Server, e.g. mssql and removed requires triage New issue that requires categorization labels Dec 19, 2020
@zzzeek zzzeek added this to the 1.4 milestone Dec 19, 2020
@zzzeek zzzeek added the datatypes things to do with database types, like VARCHAR and others label Dec 19, 2020
@zzzeek
Copy link
Member

zzzeek commented Dec 19, 2020

so the accessor is currently called as_float() so it seems like it certainly should be casting to FLOAT for that, and luckily we are in betas for SQL Server's feature so we can just change it. @gordthompson was the DECIMAL(10, 6) to get tests to pass or can we just change that?

also as_numeric(p, s) is nice as well, I wonder why that wasn't added, can do this for 1.4 too.

@zzzeek
Copy link
Member

zzzeek commented Dec 19, 2020

ah mysql. does mysql have a bug w/ the above?

@gordthompson
Copy link
Member

ah mysql. does mysql have a bug w/ the above?

@zzzeek - I'll have a look.

@gordthompson
Copy link
Member

Repro code does not fail for mysql+mysqldb://. SQL generated is

2020-12-19 15:29:29,892 INFO sqlalchemy.engine.Engine SELECT CASE JSON_EXTRACT(test.value, %s) WHEN 'null' THEN NULL ELSE CAST(JSON_EXTRACT(test.value, %s) AS DECIMAL(10, 6)) END AS anon_1 
FROM test
2020-12-19 15:29:29,893 INFO sqlalchemy.engine.Engine [generated in 0.00166s] ('$."test"."value"', '$."test"."value"')

@gordthompson
Copy link
Member

… however, if I look at the actual value returned I see that it just failed silently:

result = (
    session.execute(select(Test.value[("test", "value")].as_float()))
    .scalars()
    .all()
)
print(result)  # [9999.999999]

@gordthompson
Copy link
Member

gordthompson commented Dec 20, 2020

Here's a patch that fixes the as_float() issue and implements as_numeric(). Am I on the right track here?

Index: lib/sqlalchemy/sql/sqltypes.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/lib/sqlalchemy/sql/sqltypes.py b/lib/sqlalchemy/sql/sqltypes.py
--- a/lib/sqlalchemy/sql/sqltypes.py	(revision a8f51f3c11f3cb2e344732cf3abb371f03ed30d8)
+++ b/lib/sqlalchemy/sql/sqltypes.py	(date 1608480205905)
@@ -2467,9 +2467,24 @@
             .. versionadded:: 1.3.11
 
             """
-            # note there's no Numeric or Decimal support here yet
             return self._binary_w_type(Float(), "as_float")
 
+        def as_numeric(self, precision, scale):
+            """Cast an indexed value as numeric/decimal.
+
+            e.g.::
+
+                stmt = select(
+                    mytable.c.json_column['some_data'].as_numeric(4, 2)
+                ).where(
+                    mytable.c.json_column['some_data'].as_numeric(4, 2) == Decimal('29.75')
+                )
+
+            .. versionadded:: 1.4
+
+            """
+            return self._binary_w_type(Numeric(precision, scale), "as_numeric")
+
         def as_json(self):
             """Cast an indexed value as JSON.
 
Index: lib/sqlalchemy/dialects/mysql/base.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/lib/sqlalchemy/dialects/mysql/base.py b/lib/sqlalchemy/dialects/mysql/base.py
--- a/lib/sqlalchemy/dialects/mysql/base.py	(revision a8f51f3c11f3cb2e344732cf3abb371f03ed30d8)
+++ b/lib/sqlalchemy/dialects/mysql/base.py	(date 1608479888351)
@@ -1456,12 +1456,13 @@
             )
         elif binary.type._type_affinity is sqltypes.Numeric:
             # FLOAT / REAL not added in MySQL til 8.0.17
-            type_expression = (
-                "ELSE CAST(JSON_EXTRACT(%s, %s) AS DECIMAL(10, 6))"
-                % (
-                    self.process(binary.left, **kw),
-                    self.process(binary.right, **kw),
-                )
+            type_expression = "ELSE CAST(JSON_EXTRACT(%s, %s) AS %s)" % (
+                self.process(binary.left, **kw),
+                self.process(binary.right, **kw),
+                "FLOAT"
+                if isinstance(binary.type, sqltypes.Float)
+                else "DECIMAL(%s, %s)"
+                % (binary.type.precision, binary.type.scale),
             )
         elif binary.type._type_affinity is sqltypes.Boolean:
             # the NULL handling is particularly weird with boolean, so

@gordthompson
Copy link
Member

What seems odd to me is that

return self._binary_w_type(Float(), "as_float")

produces an object whose .type is sqltypes.Float but whose .type._type_affinity is sqltypes.Numeric.

@zzzeek
Copy link
Member

zzzeek commented Dec 20, 2020

What seems to work best on most MySQL / MariaDB versions is this:

diff --git a/lib/sqlalchemy/dialects/mysql/base.py b/lib/sqlalchemy/dialects/mysql/base.py
index 911c0d522..95a2040f0 100644
--- a/lib/sqlalchemy/dialects/mysql/base.py
+++ b/lib/sqlalchemy/dialects/mysql/base.py
@@ -1457,7 +1457,7 @@ class MySQLCompiler(compiler.SQLCompiler):
         elif binary.type._type_affinity is sqltypes.Numeric:
             # FLOAT / REAL not added in MySQL til 8.0.17
             type_expression = (
-                "ELSE CAST(JSON_EXTRACT(%s, %s) AS DECIMAL(10, 6))"
+                "ELSE JSON_EXTRACT(%s, %s)+0.0"
                 % (
                     self.process(binary.left, **kw),
                     self.process(binary.right, **kw),

however it does lose precision on MySQL 5.7 for the given test value "1234567.89".

it seems obvious that to support precision decimals in json we need as_numeric(s, p).

@zzzeek zzzeek added the use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated label Dec 20, 2020
@gordthompson gordthompson self-assigned this Dec 20, 2020
@sqla-tester
Copy link
Collaborator

Mike Bayer referenced this issue:

poc for as numeric https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2427

@zzzeek
Copy link
Member

zzzeek commented Dec 20, 2020

hey gord -

the above gerrit is a short POC for as_numeric().

@sqla-tester
Copy link
Collaborator

Gord Thompson has proposed a fix for this issue in the master branch:

Fix issues with JSON and float/numeric https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2428

@gordthompson
Copy link
Member

@zzzeek - I didn't see your change before submitting mine. I'll reconcile the two shortly.

@atellier
Copy link
Author

atellier commented Dec 20, 2020

Thanks for the awesomely quick turn-around!

BTW there appears to be a slight difference between DECIMAL and NUMERIC, at least in MSSQL: https://sqlrob.com/2018/01/22/decimal-vs-numeric/ and https://learnsql.com/blog/understanding-numerical-data-types-sql/

There is a small difference between NUMERIC(p,s) and DECIMAL(p,s) SQL numeric data type. NUMERIC determines the exact precision and scale. DECIMAL specifies only the exact scale; the precision is equal or greater than what is specified by the coder. DECIMAL columns can have a larger-than-specified precision if this is more convenient or efficient for the database system.

In other words, DECIMAL gives you some leeway.

Just mentioning that as the fix uses DECIMAL and the function is as_numeric() and that may not be the exact intended cast.

Thanks!

@gordthompson
Copy link
Member

@atellier - Thanks for the clarification.

@zzzeek - Should we keep casting to DECIMAL and call the method as_decimal()? Or keep the method as as_numeric() and cast to NUMERIC? Or perhaps do both?

@gordthompson
Copy link
Member

… personally, I always use DECIMAL.

@zzzeek
Copy link
Member

zzzeek commented Dec 20, 2020

I think here we are looking for precision + scale because we are looking to get a numeric value out of structured results for which we would assume these things are known. numeric/decimal is not as much of a database agnostic-decision we can make right now. folks can always use cast() manually when they want certain kinds of data on certain kinds of platforms .

@sqla-tester
Copy link
Collaborator

Gord Thompson has proposed a fix for this issue in the master branch:

Fix issues with JSON and float/numeric https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2428

@gordthompson gordthompson removed their assignment Dec 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datatypes things to do with database types, like VARCHAR and others mysql SQL Server Microsoft SQL Server, e.g. mssql use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated
Projects
None yet
Development

No branches or pull requests

4 participants