Skip to content

transaction / execution simplification for 2.0 #4644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zzzeek opened this issue Apr 27, 2019 · 25 comments
Closed

transaction / execution simplification for 2.0 #4644

zzzeek opened this issue Apr 27, 2019 · 25 comments
Labels
alchemy 2 goes along with the 2.0 milestone to aid in searching big job a major change that requires a lot of knowledge and work engine engines, connections, transactions, isolation levels, execution options
Milestone

Comments

@zzzeek
Copy link
Member

zzzeek commented Apr 27, 2019

in 2.0 we seek to completely simplify Core execution and transaction control and this will also impact the ORM.

The ideas area discussed at https://gist.github.com/zzzeek/dd590d0a3485b7f19cab0322c48ece36 . Here is the theme:

When SQLAlchemy was written, Python was at 2.3. There were no context managers. Now there are! Everything is different.

So with this:

The only pattern is:

with engine.begin() as conn:
   conn.execute(...)
  • Connection gains a commit() method so commit-as-you-go can happen in the block
  • Connection gains a rollback() method which auto-begins in the block.

The above two methods CANNOT BE CALLED OUTSIDE OF THE BEGIN STATE. That is, this will work very much like Session without autocommit.

  • As part of this also, all autocommit is gone. From ORM, from Core, take it right out. It no longer applies since all transaction blocks are explicit. this will remove an enormous amount of complexity and conceptual overhead for users learning,etc.

  • we will need a vast number of documentation changes

  • savepoints continue to work like they do now:

   with connection.begin() as conn:
        with conn.begin_savepoint():
            ....

What's not clear is if there is still an explicit Transaction object. We may need to maintain this because in particular it's not clear we can just drop XA transactions as we see in #4608. #4608 needs to be addressed as part of this issue. I think we should likely keep XA mostly like it is after some consideration, but what's not clear is if we want it controlled from Connection or from a Transaction object. I think the Transaction object itself might be overkill.

  • more things I'm not thinking of at the moment
@zzzeek zzzeek added alchemy 2 goes along with the 2.0 milestone to aid in searching big job a major change that requires a lot of knowledge and work sql labels Apr 27, 2019
@zzzeek zzzeek added this to the 2.0 milestone Apr 27, 2019
@plannigan
Copy link

I have a clarification question about transactions and auto-commit. The intent to change the API such that transactions will always be explicit? Does this mean that the only way to interact with the database is from within a transaction? Or does it just mean that there will be fewer ways to interact with the database and it will be clearer when transactions exist because they are explicitly created?

@zzzeek
Copy link
Member Author

zzzeek commented Jul 1, 2019

I think what people don't realize is that there is no such thing as "no transaction". Everything on a database happens inside of a transaction, at best there is the notion of "autocommit", which means if you do something that modifies the database, some system in between knows how to "commit' the transaction automatically. "autocommit" is a concept that does not exist explicitly within the pep249 DBAPI we use. Over the years, DBAPIs such as psycopg2 and such have added support for "autocommit" modes of use that are extensions to the DBAPI, however they are seldom used except for special database commands that require it.

the very first versions of SQLAlchemy came from the common mindset that "a transaction" is an annoyance that should only be used for "certain things", however, years of user confusion over transactions has led me to see that SQLAlchemy's many different varieties of "how to run a statement" are meaningless to people, meaning, pretty much anytime I'm looking at any end-user code anywhere, no matter how mature it is, I am seeing misunderstandings of connections and transactions going on, or at the very least, a complete disregard for it. very common example:

   with engine.begin() as conn:
        my_table = Table("table", some_metadata, autoload=True)
        conn.execute(my_table.insert(), data)

The above code is someone sort of using transactions but then they for some reason are using an entirely separate transaction, as well as a separate connection checkout from the connection pool, to read in their table metadata. Given that we're in this time when people are flocking to asyncio because they don't like "implicit IO", it's puzzling that folks are readily having SQLAlchemy Core pull up multiple database connections and transactions beacause they don't realize or care that the autoload above is doing so.

I also think it's confusing and unintuitive that you can write code like this:

   with engine.connect() as conn:
        conn.execute(my_table.insert(), data1)
        conn.execute(my_table.insert(), data2)
        conn.execute(my_table.insert(), data3)

or like this:

   with engine.begin() as conn:
        conn.execute(my_table.insert(), data1)
        conn.execute(my_table.insert(), data2)
        conn.execute(my_table.insert(), data3)

and have completely different transactional behavior. I can't imagine who would want to check out a connection like the above, run three statements on it, and have them all autocommit.

The next really big problem with the "connect()" example is that it uses SQLAlchemy's own homegrown "autocommit" behavior, which is very surprising to just about anyone, but especially since folks don't usually know that the DBAPI itself doesn't have any autocommit, SQLAlchemy is doing it, and worse, SQLAlchemy is trying very hard not to autocommit every single SELECT statement since this is very expensive. So I have to answer this issue all the time:

    with engine.connect() as conn:
       conn.execute("my_special_stored_procedure(5, 6, 7)")

and then their stored procedure didn't write anything to the database and they are not too surprisingly completely mystified, their emails are always accompanied by "I ran the same statement in Microsoft SQL Studio and it worked" and then I have to break the news to them that SQLAlchemy is doing autocommit and it doesn't know anything about their procedure and here are the various ways you can make the above work, the simplest being to use "begin()" instead of "connect()". They say in politics if you are explaining, you're losing, and I sort of feel something similar happens here.

SQLAlchemy's early days were oriented around when I came out of the late-90s world of J2EE and Perl and the whole "for humans" thing was many years off. So the super wide variety of calling styles, attempts to make things "automatic if you don't want to deal with them" have all built up and a lot of the work I've had to do for ten years is slowly reverse all this thinking, so this is the next step.

hope this clarifies!

@jvanasco
Copy link
Member

jvanasco commented Jul 1, 2019

@zzzeek There are some databases that have an "autocommit mode" (maybe just one annoying, but popular, database "mysql"). What is the plan for handling that? Can that mode be detected so "start transaction;" is emitted with 'begin' statements?

@zzzeek
Copy link
Member Author

zzzeek commented Jul 2, 2019

@jvanasco we support those autocommit modes right now with the "AUTOCOMMIT" transaction isolation level. in this mode, DBAPI level connection.commit() and rollback() do nothing, and the DBAPI does not emit begin(). SQLAlchemy's perspective is that it knows none of this, you keep using the same begin() / commit() block that you always do and the fact that it isn't ACID is just an isolation level.

OTOH if you want SQLAlchemy to put the DBAPI into "autocommit" mode and then emit "BEGIN" / "COMMIT" explicitly, that's inventing a whole system to work around the DBAPI's design which is a losing proposition.

@jvanasco
Copy link
Member

jvanasco commented Jul 2, 2019 via email

@zzzeek
Copy link
Member Author

zzzeek commented Jul 2, 2019

since this was brought up, I still wonder if there needs to be one choice to make, since context managers are now standad, if you want your context manager to commit at the end or not, as a SELECT only transaction can be needlessly expensive on the commit side for some backends.

the choice would likely be, connect() is still there but just doesn't autocommit:

with engine.connect() as conn:
    conn.execute(...)
    conn.execute(...)
    conn.commit()  # if you want to, or not, just like Session()
    conn.execute(...)  # etc

then there's begin(), which autocommits, but you can still commit() or rollback() as you go if you wanted to


with engine.begin() as conn:
   conn.execute(...)
   # ...
   # commits

@jvanasco
Copy link
Member

jvanasco commented Jul 2, 2019

perhaps the docs could be more explicit with "connection" vs "transactional". IIRC,i think twisted's docs use txn for the name of a transactional connection

with engine.begin() as txn:
    txn.execute(...)

edit: while I think an experienced developer will understand the difference in your original example, it can sometimes be very beneficial to make things super obvious to the newer population.

@zzzeek
Copy link
Member Author

zzzeek commented Jul 2, 2019

i have to think if "txn" is more obvious or more confusing.

@plannigan
Copy link

There does seem to be some good discussion here. I'm in favor of reducing the number of ways to do the same thing, simplify the API and making it more explicit.

I think the two distinct entry points could work well.

  • With one you are starting a transaction and the context manager will do the right thing based on if the block completed normally or if there was an error.
  • With the other, you get a connection to the database that will write directly to the db with each execute(). If an error occurs, you are responsible for dealing with it. But since you explicitly decided to do this, you opted into the extra overhead you need to implement yourself. In this case, the context manager returns the connection back to the engine.

I also agree that using txn in documentation would help to enforce the concept with the reader about what is happening.

@zzzeek
Copy link
Member Author

zzzeek commented Jul 15, 2019

what does "write directly to the db with each execute()" mean ?

@plannigan
Copy link

Sorry. I meant the autocommit mode.

@zzzeek
Copy link
Member Author

zzzeek commented Jul 15, 2019

i dont think autocommit is very useful except for driver-specific use cases. "COMMIT" is often expensive on backends like MySQL even if nothing was changed. i'm tired of this being an invisible detail that people aren't aware of.

@plannigan
Copy link

I agree with the invisible details being a problem. The organization I'm in is currently working through issues related to engineers and data scientist not realizing that transactions are being opened by sqlalchemy. So transactions are being left open and idle for long periods of time and/or not properly committed. All causing performance problems with the database. I've also heard reports from others in the organization that the implicit transaction isn't playing nicely with stored procedures that run for a long time and/or mange transactions within the stored procedure.

Since I thought "write directly to the db with each execute()" was clear to mean autocommit, can you expand on how else that could be interpreted? I thought I was describing something similar to what you were saying in your "if there needs to be one choice" post, but maybe there is still some disconnect.

@zzzeek
Copy link
Member Author

zzzeek commented Jul 15, 2019

I agree with the invisible details being a problem. The organization I'm in is currently working through issues related to engineers and data scientist not realizing that transactions are being opened by sqlalchemy.

SQLAlchemy doesn't open transactions, the dbapi does.

when using the Engine object or the connection pool directly, SQLAlchemy always ensures that when the connection it returned to the pool, rollback() is called, so that it's intuitive that when you've "closed" the connection, even though it's still in memory, it isn't using transactional resources.

I would ask if these engineers are also having trouble with other kinds of resources, such as, if they open a file, or a persistent socket, etc., if they also are surprised that there is a resource used by this. this is kind of a basic programming task that context managers have made very easy which is why I want to push context managers as the primary pattern for getting data committed and opening connections.

I'm not doing as good of a job here at addressing the ORM however, which is likely what your coworkers are using. There is a context manager pattern with the ORM also but it's not as intuitive:

s = Session(e)
with s.transaction:
# work with s

But mostly what I'm aiming for in this issue is taking out autocommit at the Core level, which is a feature that the ORM does not use. ORM-level autocommit has been warned against for some time, and I'm hoping to take that out as well, but this should be less controversial.

So transactions are being left open and idle for long periods of time and/or not properly committed. All causing performance problems with the database. I've also heard reports from others in the organization that the implicit transaction isn't playing nicely with stored procedures that run for a long time and/or mange transactions within the stored procedure.

for stored procedures the DBAPI has a callproc() procedure which should be used, documented at: https://docs.sqlalchemy.org/en/13/core/connections.html?highlight=callproc#calling-stored-procedures

Since I thought "write directly to the db with each execute()" was clear to mean autocommit, can you expand on how else that could be interpreted?

the phrase "directly write" is ambiguous because there's several stages of data going to the database. In MySQL, even if you completed a commit, the data still might not be on disk, depending on how you have set flags like innodb_flush_log_at_trx_commit.

I thought I was describing something similar to what you were saying in your "if there needs to be one choice" post, but maybe there is still some disconnect.

@gordthompson
Copy link
Member

the DBAPI has a callproc() procedure

PEP 249 does state that callproc() is optional, so not all DBAPI layers implement it. Most notably, pyodbc does not have it, presumably because the specified mechanism for returning OUTPUT parameters is profoundly at odds with the way TDS sends data back from the server.

Connection gains a commit() method so commit-as-you-go can happen in the block

That's not how it stands at the moment.

with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE #tmp (id int primary key)"))
    conn.execute(sa.text("INSERT INTO #tmp (id) VALUES (1)"))
    conn.commit()
    # Traceback (most recent call last):
    #   File "C:/Users/Gord/git/sqla-gerrit/gord_test.py", line 21, in <module>
    #     conn.commit()
    # AttributeError: 'Connection' object has no attribute 'commit'

As I mentioned in the last dev meeting, we currently need to use Engine.connect() to return a Connection, and then use Connection.begin() to return a Transaction.

with engine.connect() as conn:
    with conn.begin() as tran:
        conn.execute(sa.text("CREATE TABLE #tmp (id int primary key)"))
        conn.execute(sa.text("INSERT INTO #tmp (id) VALUES (1)"))
        tran.commit()
        # no error

Is that going to be the new usage pattern for commit-as-you-go?

@zzzeek
Copy link
Member Author

zzzeek commented Apr 5, 2020

the usage pattern for commit-as-you-go is documented at https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#library-level-but-not-driver-level-autocommit-removed-from-both-core-and-orm:

from sqlalchemy.future import create_engine

engine = create_engine(...)

with engine.connect() as conn:
    conn.execute(some_table.insert().values(foo='bar'))
    conn.commit()

    conn.execute(text("some other SQL"))
    conn.rollback()

the create_engine call is from "future" which will flag the Connection and/or use a specific Connection subclass that has the new behavior. specifically it will require "autobegin" behavior similarly to how the Session works in current master so that the begin() event is still called.

@zzzeek
Copy link
Member Author

zzzeek commented Apr 5, 2020

similarly, with autobegin, the transation is not started immediately, so this solves the isolation level issue:

with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
  # ....

@gordthompson
Copy link
Member

from sqlalchemy.future import create_engine

Hmmm. Could it be that this is still WIP and hasn't been merged into master yet?

Traceback (most recent call last):
  File "C:/Users/Gord/git/sqla-gerrit/gord_test.py", line 6, in <module>
    from sqlalchemy.future import create_engine
ImportError: cannot import name 'create_engine' from 'sqlalchemy.future' (C:\Users\Gord\git\sqla-gerrit\lib\sqlalchemy\future\__init__.py)

@zzzeek
Copy link
Member Author

zzzeek commented Apr 5, 2020

from sqlalchemy.future import create_engine

Hmmm. Could it be that this is still WIP and hasn't been merged into master yet?

I haven't started working on it yet, so no.... ! :)

Traceback (most recent call last):
  File "C:/Users/Gord/git/sqla-gerrit/gord_test.py", line 6, in <module>
    from sqlalchemy.future import create_engine
ImportError: cannot import name 'create_engine' from 'sqlalchemy.future' (C:\Users\Gord\git\sqla-gerrit\lib\sqlalchemy\future\__init__.py)

@sqla-tester
Copy link
Collaborator

Mike Bayer referenced this issue:

Add support for per-execution execution_options; 2.0 execute https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/1848

@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the master branch:

Create initial 2.0 engine implementation https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/1848

@zzzeek zzzeek added engine engines, connections, transactions, isolation levels, execution options and removed sql labels Apr 8, 2020
@zzzeek
Copy link
Member Author

zzzeek commented Apr 16, 2020

A lot went on in this commit, including that I added execution_options to the execute() method, which is going to be important especially for the ORM stuff I'm going to do, which in turn changed the signature for the before_execute() / after_execute() events, which in turn caused me to modernize the "legacy event calling signature" code a bit, and then also the fact that we have two Connection / Engine / create_engine symbols now led me to spend about a week working on the Sphinx documentation to create new doc extensions and whatnot so that we can refer to these symbols with absolute module paths in the docs as well as that you can see the full module path when you're reading the docs for a function or method, since it is now easy to jump between "sqlalchemy.engine.Connection" and "sqlalchemy.future.Connection"; this is a temporary situation that won't exist in 2.0.

overall however the new transactional pattern is GREAT, it's very easy and intuitive to use and allows how things work on the inside to be much more straightforward.

@zzzeek
Copy link
Member Author

zzzeek commented Apr 16, 2020

@frallain
Copy link

frallain commented Feb 4, 2023

OTOH if you want SQLAlchemy to put the DBAPI into "autocommit" mode and then emit "BEGIN" / "COMMIT" explicitly, that's inventing a whole system to work around the DBAPI's design which is a losing proposition.

It seems possible with psycopg3 :

An autocommit transaction behaves more as someone coming from psql would expect. This has a beneficial performance effect, because less queries are sent and less operations are performed by the database. The statements, however, are not executed in an atomic transaction; if you need to execute certain operations inside a transaction, you can achieve that with an autocommit connection too, using an explicit transaction block.

@zzzeek
Copy link
Member Author

zzzeek commented Feb 4, 2023

most DBAPIs will let you emit the words "BEGIN" and "COMMIT" if you put them into autocommit mode. feel free to use that technique if you really need to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alchemy 2 goes along with the 2.0 milestone to aid in searching big job a major change that requires a lot of knowledge and work engine engines, connections, transactions, isolation levels, execution options
Projects
None yet
Development

No branches or pull requests

6 participants