Skip to content

Floats are rounded to whole numbers when inserted in batches #9701

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rbuffat opened this issue Apr 26, 2023 · 14 comments
Closed

Floats are rounded to whole numbers when inserted in batches #9701

rbuffat opened this issue Apr 26, 2023 · 14 comments
Labels
bug Something isn't working datatypes things to do with database types, like VARCHAR and others near-term release addition to the milestone which indicates this should be in a near-term release postgresql regression something worked and was broken by a change
Milestone

Comments

@rbuffat
Copy link

rbuffat commented Apr 26, 2023

Describe the bug

A working example of this behavior can be found in the following repository: https://github.com/rbuffat/sqlissue

In short, the following code should insert the value 8.5514716, but inserts 9:

class Update(Base):
    __tablename__ = "test"
    __table_args__ = (PrimaryKeyConstraint("id", name="update_pkey"),)
    id: Mapped[int] = mapped_column(Integer, Sequence("test_id_seq"), primary_key=True)
    value: Mapped[float] = mapped_column(Double(53))

    data1: Mapped["Data1"] = relationship(
        "Data1", uselist=False, back_populates="update"
    )


class Data1(Base):
    __tablename__ = "data1"
    __table_args__ = (
        ForeignKeyConstraint(
            ["update_id"],
            ["test.id"],
            deferrable=True,
            initially="DEFERRED",
            name="fk_data1",
        ),
        PrimaryKeyConstraint("id", name="data1_pkey"),
        UniqueConstraint(
            "update_id",
            name="data1_update_id_key",
        ),
    )
    id: Mapped[int] = mapped_column(Integer, Sequence("data1_id_seq"), primary_key=True)
    data1: Mapped[float] = mapped_column(Double(53))

    update_id: Mapped[int] = mapped_column(Integer)
    update: Mapped["Update"] = relationship("Update", back_populates="data1")
    async_session = async_sessionmaker(engine)
    for _ in range(10):
        async with async_session() as session:
            logging.info("Add batch")
            for _ in range(4):
                update = Update(value=8.5514716)
                session.add(update)
            await session.commit()
            await asyncio.sleep(5)

Full example code can be found here: https://github.com/rbuffat/sqlissue/blob/main/test/src/main.py

When the session.commit is moved into the inner loop, the correct value is saved into the database.

Postgres log:

testdb  | 2023-04-26 13:43:30.052 UTC [68] LOG:  execute <unnamed>: BEGIN
testdb  | 2023-04-26 13:43:30.053 UTC [68] LOG:  execute _pg3_0: INSERT INTO test (id, value) SELECT p0::INTEGER, p1::NUMERIC(53) FROM (VALUES (nextval('test_id_seq'), $1, 0), (nextval('test_id_seq'), $2, 1), (nextval('test_id_seq'), $3, 2), (nextval('test_id_seq'), $4, 3)) AS imp_sen(p0, p1, sen_counter) ORDER BY sen_counter RETURNING test.id, test.id AS id__1
testdb  | 2023-04-26 13:43:30.053 UTC [68] DETAIL:  parameters: $1 = '8.5514716', $2 = '8.5514716', $3 = '8.5514716', $4 = '8.5514716'
testdb  | 2023-04-26 13:43:30.053 UTC [68] LOG:  execute <unnamed>: COMMIT

SQLalchemy log:

testcontainer  | INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
testcontainer  | 2023-04-26 13:43:30,051 INFO sqlalchemy.engine.Engine BEGIN (implicit)
testcontainer  | 2023-04-26 13:43:30,052 INFO sqlalchemy.engine.Engine INSERT INTO test (id, value) SELECT p0::INTEGER, p1::NUMERIC(53) FROM (VALUES (nextval('test_id_seq'), %(value__0)s, 0), (nextval('test_id_seq'), %(value__1)s, 1), (nextval('test_id_seq'), %(value__2)s, 2), (nextval('test_id_seq'), %(value__3)s, 3)) AS imp_sen(p0, p1, sen_counter) ORDER BY sen_counter RETURNING test.id, test.id AS id__1
testcontainer  | INFO:sqlalchemy.engine.Engine:INSERT INTO test (id, value) SELECT p0::INTEGER, p1::NUMERIC(53) FROM (VALUES (nextval('test_id_seq'), %(value__0)s, 0), (nextval('test_id_seq'), %(value__1)s, 1), (nextval('test_id_seq'), %(value__2)s, 2), (nextval('test_id_seq'), %(value__3)s, 3)) AS imp_sen(p0, p1, sen_counter) ORDER BY sen_counter RETURNING test.id, test.id AS id__1
testcontainer  | 2023-04-26 13:43:30,052 INFO sqlalchemy.engine.Engine [cached since 45.09s ago (insertmanyvalues) 1/1 (ordered)] {'value__0': 8.5514716, 'value__1': 8.5514716, 'value__2': 8.5514716, 'value__3': 8.5514716}
testcontainer  | INFO:sqlalchemy.engine.Engine:[cached since 45.09s ago (insertmanyvalues) 1/1 (ordered)] {'value__0': 8.5514716, 'value__1': 8.5514716, 'value__2': 8.5514716, 'value__3': 8.5514716}
testcontainer  | INFO:sqlalchemy.engine.Engine:COMMIT
testcontainer  | 2023-04-26 13:43:30,053 INFO sqlalchemy.engine.Engine COMMIT

Result:

test=# select * from test;
 id | value 
----+-------
  1 |     9
  2 |     9
  3 |     9
  4 |     9
  5 |     9
  6 |     9
  7 |     9
  8 |     9
  9 |     9
...

Optional link from https://docs.sqlalchemy.org which documents the behavior that is expected

No response

SQLAlchemy Version in Use

2.0.10

DBAPI (i.e. the database driver)

postgresql+psycopg (psycopg==3.1.8)

Database Vendor and Major Version

postgis/postgis:13-3.3-alpine

Python Version

3.11

Operating system

Linux

To Reproduce

A working example using docker / docker-compose can be found here: https://github.com/rbuffat/sqlissue

Error

No errors shown

Additional context

No response

@rbuffat rbuffat added the requires triage New issue that requires categorization label Apr 26, 2023
@rbuffat rbuffat changed the title Floats are rounded to ints when inserted in batches Floats are rounded to whole numbers when inserted in batches Apr 26, 2023
@zzzeek zzzeek added postgresql external driver issues the issue involves a misbehavior on the part of the DBAPI itself, probably not SQLAlchemy and removed requires triage New issue that requires categorization labels Apr 26, 2023
@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

hi -

the test case does not involve the "Data1" class in any way and this would appear to be superfluous.

This appears to be a bug in the psycopg driver, and/or special handling would be needed. the MCVE below is a complete example and works fine with asyncpg

from __future__ import annotations

import asyncio

from sqlalchemy import Double
from sqlalchemy import Integer
from sqlalchemy import PrimaryKeyConstraint
from sqlalchemy import select
from sqlalchemy import Sequence
from sqlalchemy.ext.asyncio import async_sessionmaker
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import mapped_column


class Base(DeclarativeBase):
    pass


class Update(Base):
    __tablename__ = "test"
    __table_args__ = (PrimaryKeyConstraint("id", name="update_pkey"),)
    id: Mapped[int] = mapped_column(
        Integer, Sequence("test_id_seq"), primary_key=True
    )
    value: Mapped[float] = mapped_column(Double(53))


async def main():
    engine = create_async_engine(
        # asyncpg works fine
        #"postgresql+asyncpg://scott:tiger@localhost/test", echo=True

        # psycopg does not
        "postgresql+psycopg://scott:tiger@localhost/test", echo=True
    )
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
        await conn.run_sync(Base.metadata.create_all)

    async_session = async_sessionmaker(engine)
    for _ in range(10):
        async with async_session() as session:
            for _ in range(4):
                update = Update(value=8.5514716)
                session.add(update)
            await session.commit()

    async with async_session() as session:
        for val in await session.scalars(select(Update.value)):
            assert val == 8.5514716

asyncio.run(main())

@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

I'd report this to https://github.com/psycopg/psycopg and if we need to convert to decimal here or something, they should tell us that. cc @dvarrazzo

@zzzeek zzzeek closed this as not planned Won't fix, can't repro, duplicate, stale Apr 26, 2023
@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

this is going to be complicated because it is only with our changes to useinsertmanyvalues in #9618. cc @CaselIT

@zzzeek zzzeek reopened this Apr 26, 2023
@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

@CaselIT this works with 2.0.9 and fails w 2.0.10 so psycopg is not interpreting our alternate form correctly

@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

@dvarrazzo this is the query that's not being interpreted correctly

INSERT INTO test (id, value) SELECT p0::INTEGER, p1::NUMERIC(53) FROM (VALUES (nextval('test_id_seq'), %(value__0)s, 0), (nextval('test_id_seq'), %(value__1)s, 1), (nextval('test_id_seq'), %(value__2)s, 2), (nextval('test_id_seq'), %(value__3)s, 3)) AS imp_sen(p0, p1, sen_counter) ORDER BY sen_counter RETURNING test.id, test.id AS id__1

{'value__0': 8.5514716, 'value__1': 8.5514716, 'value__2': 8.5514716, 'value__3': 8.5514716}

is there some kind of SQL parsing in psycopg that is affecting this?

@zzzeek zzzeek added regression something worked and was broken by a change and removed external driver issues the issue involves a misbehavior on the part of the DBAPI itself, probably not SQLAlchemy labels Apr 26, 2023
@zzzeek zzzeek added this to the 2.0.x milestone Apr 26, 2023
@zzzeek zzzeek added the near-term release addition to the milestone which indicates this should be in a near-term release label Apr 26, 2023
@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

we're diong casts for asyncpg. we will add these in for psycopg. @dvarrazzo sorry for the noise

@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

only when that precision value is there btw, which is wierd

@CaselIT
Copy link
Member

CaselIT commented Apr 26, 2023

Looking at the statement I think that the issue is on psycopg side.

The values are passed as float, and the query has the correct casts. Not sure what different we should be doing here

@CaselIT
Copy link
Member

CaselIT commented Apr 26, 2023

we're diong casts for asyncpg. we will add these in for psycopg. @dvarrazzo sorry for the noise

sure, but I don't think psycopg should take a float, do int() on it and feed it to a numeric.
At worst I would expect that it did Decimal() on it, and that would not truncate the float

@CaselIT
Copy link
Member

CaselIT commented Apr 26, 2023

Nevermind, the issue is that we transform Double(53) in Numeric(53).

select 8.5514716::numeric(53)

returns 9

@CaselIT CaselIT added bug Something isn't working datatypes things to do with database types, like VARCHAR and others labels Apr 26, 2023
@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the main branch:

use casts for floats under both psycopg drivers https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4585

@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

oh it's even that...that explains something here..

@zzzeek
Copy link
Member

zzzeek commented Apr 26, 2023

oh neat it's just that first cast, great

@rbuffat
Copy link
Author

rbuffat commented Apr 27, 2023

🥳 Huge thank you to all who were involved in fixing this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datatypes things to do with database types, like VARCHAR and others near-term release addition to the milestone which indicates this should be in a near-term release postgresql regression something worked and was broken by a change
Projects
None yet
Development

No branches or pull requests

4 participants