-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
transaction / execution simplification for 2.0 #4644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I have a clarification question about transactions and auto-commit. The intent to change the API such that transactions will always be explicit? Does this mean that the only way to interact with the database is from within a transaction? Or does it just mean that there will be fewer ways to interact with the database and it will be clearer when transactions exist because they are explicitly created? |
I think what people don't realize is that there is no such thing as "no transaction". Everything on a database happens inside of a transaction, at best there is the notion of "autocommit", which means if you do something that modifies the database, some system in between knows how to "commit' the transaction automatically. "autocommit" is a concept that does not exist explicitly within the pep249 DBAPI we use. Over the years, DBAPIs such as psycopg2 and such have added support for "autocommit" modes of use that are extensions to the DBAPI, however they are seldom used except for special database commands that require it. the very first versions of SQLAlchemy came from the common mindset that "a transaction" is an annoyance that should only be used for "certain things", however, years of user confusion over transactions has led me to see that SQLAlchemy's many different varieties of "how to run a statement" are meaningless to people, meaning, pretty much anytime I'm looking at any end-user code anywhere, no matter how mature it is, I am seeing misunderstandings of connections and transactions going on, or at the very least, a complete disregard for it. very common example:
The above code is someone sort of using transactions but then they for some reason are using an entirely separate transaction, as well as a separate connection checkout from the connection pool, to read in their table metadata. Given that we're in this time when people are flocking to asyncio because they don't like "implicit IO", it's puzzling that folks are readily having SQLAlchemy Core pull up multiple database connections and transactions beacause they don't realize or care that the autoload above is doing so. I also think it's confusing and unintuitive that you can write code like this:
or like this:
and have completely different transactional behavior. I can't imagine who would want to check out a connection like the above, run three statements on it, and have them all autocommit. The next really big problem with the "connect()" example is that it uses SQLAlchemy's own homegrown "autocommit" behavior, which is very surprising to just about anyone, but especially since folks don't usually know that the DBAPI itself doesn't have any autocommit, SQLAlchemy is doing it, and worse, SQLAlchemy is trying very hard not to autocommit every single SELECT statement since this is very expensive. So I have to answer this issue all the time:
and then their stored procedure didn't write anything to the database and they are not too surprisingly completely mystified, their emails are always accompanied by "I ran the same statement in Microsoft SQL Studio and it worked" and then I have to break the news to them that SQLAlchemy is doing autocommit and it doesn't know anything about their procedure and here are the various ways you can make the above work, the simplest being to use "begin()" instead of "connect()". They say in politics if you are explaining, you're losing, and I sort of feel something similar happens here. SQLAlchemy's early days were oriented around when I came out of the late-90s world of J2EE and Perl and the whole "for humans" thing was many years off. So the super wide variety of calling styles, attempts to make things "automatic if you don't want to deal with them" have all built up and a lot of the work I've had to do for ten years is slowly reverse all this thinking, so this is the next step. hope this clarifies! |
@zzzeek There are some databases that have an "autocommit mode" (maybe just one annoying, but popular, database "mysql"). What is the plan for handling that? Can that mode be detected so "start transaction;" is emitted with 'begin' statements? |
@jvanasco we support those autocommit modes right now with the "AUTOCOMMIT" transaction isolation level. in this mode, DBAPI level connection.commit() and rollback() do nothing, and the DBAPI does not emit begin(). SQLAlchemy's perspective is that it knows none of this, you keep using the same begin() / commit() block that you always do and the fact that it isn't ACID is just an isolation level. OTOH if you want SQLAlchemy to put the DBAPI into "autocommit" mode and then emit "BEGIN" / "COMMIT" explicitly, that's inventing a whole system to work around the DBAPI's design which is a losing proposition. |
The DBAPI isolation level sounds great! I didn't know this was handled there.
I have no opinions on this, I just wanted to ensure this was still compatible as some people still use autocommit databases (for whatever reasons).
|
since this was brought up, I still wonder if there needs to be one choice to make, since context managers are now standad, if you want your context manager to commit at the end or not, as a SELECT only transaction can be needlessly expensive on the commit side for some backends. the choice would likely be, connect() is still there but just doesn't autocommit:
then there's begin(), which autocommits, but you can still commit() or rollback() as you go if you wanted to
|
perhaps the docs could be more explicit with "connection" vs "transactional". IIRC,i think twisted's docs use
edit: while I think an experienced developer will understand the difference in your original example, it can sometimes be very beneficial to make things super obvious to the newer population. |
i have to think if "txn" is more obvious or more confusing. |
There does seem to be some good discussion here. I'm in favor of reducing the number of ways to do the same thing, simplify the API and making it more explicit. I think the two distinct entry points could work well.
I also agree that using |
what does "write directly to the db with each execute()" mean ? |
Sorry. I meant the autocommit mode. |
i dont think autocommit is very useful except for driver-specific use cases. "COMMIT" is often expensive on backends like MySQL even if nothing was changed. i'm tired of this being an invisible detail that people aren't aware of. |
I agree with the invisible details being a problem. The organization I'm in is currently working through issues related to engineers and data scientist not realizing that transactions are being opened by sqlalchemy. So transactions are being left open and idle for long periods of time and/or not properly committed. All causing performance problems with the database. I've also heard reports from others in the organization that the implicit transaction isn't playing nicely with stored procedures that run for a long time and/or mange transactions within the stored procedure. Since I thought "write directly to the db with each execute()" was clear to mean autocommit, can you expand on how else that could be interpreted? I thought I was describing something similar to what you were saying in your "if there needs to be one choice" post, but maybe there is still some disconnect. |
SQLAlchemy doesn't open transactions, the dbapi does. when using the Engine object or the connection pool directly, SQLAlchemy always ensures that when the connection it returned to the pool, rollback() is called, so that it's intuitive that when you've "closed" the connection, even though it's still in memory, it isn't using transactional resources. I would ask if these engineers are also having trouble with other kinds of resources, such as, if they open a file, or a persistent socket, etc., if they also are surprised that there is a resource used by this. this is kind of a basic programming task that context managers have made very easy which is why I want to push context managers as the primary pattern for getting data committed and opening connections. I'm not doing as good of a job here at addressing the ORM however, which is likely what your coworkers are using. There is a context manager pattern with the ORM also but it's not as intuitive: s = Session(e) But mostly what I'm aiming for in this issue is taking out autocommit at the Core level, which is a feature that the ORM does not use. ORM-level autocommit has been warned against for some time, and I'm hoping to take that out as well, but this should be less controversial.
for stored procedures the DBAPI has a callproc() procedure which should be used, documented at: https://docs.sqlalchemy.org/en/13/core/connections.html?highlight=callproc#calling-stored-procedures
the phrase "directly write" is ambiguous because there's several stages of data going to the database. In MySQL, even if you completed a commit, the data still might not be on disk, depending on how you have set flags like innodb_flush_log_at_trx_commit.
|
PEP 249 does state that
That's not how it stands at the moment. with engine.begin() as conn:
conn.execute(sa.text("CREATE TABLE #tmp (id int primary key)"))
conn.execute(sa.text("INSERT INTO #tmp (id) VALUES (1)"))
conn.commit()
# Traceback (most recent call last):
# File "C:/Users/Gord/git/sqla-gerrit/gord_test.py", line 21, in <module>
# conn.commit()
# AttributeError: 'Connection' object has no attribute 'commit' As I mentioned in the last dev meeting, we currently need to use Engine.connect() to return a Connection, and then use Connection.begin() to return a Transaction. with engine.connect() as conn:
with conn.begin() as tran:
conn.execute(sa.text("CREATE TABLE #tmp (id int primary key)"))
conn.execute(sa.text("INSERT INTO #tmp (id) VALUES (1)"))
tran.commit()
# no error Is that going to be the new usage pattern for commit-as-you-go? |
the usage pattern for commit-as-you-go is documented at https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#library-level-but-not-driver-level-autocommit-removed-from-both-core-and-orm:
the create_engine call is from "future" which will flag the Connection and/or use a specific Connection subclass that has the new behavior. specifically it will require "autobegin" behavior similarly to how the Session works in current master so that the begin() event is still called. |
similarly, with autobegin, the transation is not started immediately, so this solves the isolation level issue:
|
Hmmm. Could it be that this is still WIP and hasn't been merged into master yet?
|
I haven't started working on it yet, so no.... ! :)
|
Mike Bayer referenced this issue: Add support for per-execution execution_options; 2.0 execute https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/1848 |
Mike Bayer has proposed a fix for this issue in the master branch: Create initial 2.0 engine implementation https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/1848 |
A lot went on in this commit, including that I added execution_options to the execute() method, which is going to be important especially for the ORM stuff I'm going to do, which in turn changed the signature for the before_execute() / after_execute() events, which in turn caused me to modernize the "legacy event calling signature" code a bit, and then also the fact that we have two Connection / Engine / create_engine symbols now led me to spend about a week working on the Sphinx documentation to create new doc extensions and whatnot so that we can refer to these symbols with absolute module paths in the docs as well as that you can see the full module path when you're reading the docs for a function or method, since it is now easy to jump between "sqlalchemy.engine.Connection" and "sqlalchemy.future.Connection"; this is a temporary situation that won't exist in 2.0. overall however the new transactional pattern is GREAT, it's very easy and intuitive to use and allows how things work on the inside to be much more straightforward. |
It seems possible with psycopg3 :
|
most DBAPIs will let you emit the words "BEGIN" and "COMMIT" if you put them into autocommit mode. feel free to use that technique if you really need to. |
in 2.0 we seek to completely simplify Core execution and transaction control and this will also impact the ORM.
The ideas area discussed at https://gist.github.com/zzzeek/dd590d0a3485b7f19cab0322c48ece36 . Here is the theme:
When SQLAlchemy was written, Python was at 2.3. There were no context managers. Now there are! Everything is different.
So with this:
The only pattern is:
commit()
method so commit-as-you-go can happen in the blockrollback()
method which auto-begins in the block.The above two methods CANNOT BE CALLED OUTSIDE OF THE BEGIN STATE. That is, this will work very much like
Session
without autocommit.As part of this also, all autocommit is gone. From ORM, from Core, take it right out. It no longer applies since all transaction blocks are explicit. this will remove an enormous amount of complexity and conceptual overhead for users learning,etc.
we will need a vast number of documentation changes
savepoints continue to work like they do now:
What's not clear is if there is still an explicit
Transaction
object. We may need to maintain this because in particular it's not clear we can just drop XA transactions as we see in #4608. #4608 needs to be addressed as part of this issue. I think we should likely keep XA mostly like it is after some consideration, but what's not clear is if we want it controlled from Connection or from a Transaction object. I think the Transaction object itself might be overkill.The text was updated successfully, but these errors were encountered: