Why will there be 2 rows and not 1? The ROLLBACK is meant to take it back to the last COMMIT, which in this case is issued after one INSERT.
Is the dataset greater than RAM, but by less than 20 fold? Can you arrange it so that for one time period, you query mostly just one value of that column, then the next time period, mostly another value, and so on? If so, you should be able to get better cache usage.
If not, then it is hard to see where the increased performance would come from. Maybe seeing an actual example query would help.
So, long story short: there is a bug in xtrabackup that can be hit when certain conditions on the database size/layout/QPS are met; for those who meet them it's 100% reproducible (while the same version/binaries can successfully back up other instances). At the same time, I managed to find a thread on a Chinese server with a half-Chinese/half-English discussion describing the root cause (no trace of it in the English-speaking part of the web, though).
This bug has nothing to do with the undo tablespace metrics that can be seen. In fact, despite the zeroes, those tablespaces are fully functional.
I was unable to reproduce the issue with xtrabackup 8.0.26.
My experience with MS Access is similar in that it doesn't do well with multiple users. You're better off using an RDBMS that was designed for multiple users.
In the US, hospitals have legal requirements for storing and protecting patient data, so I would take local regulations into account when picking an RDBMS, with cost as one of the lower-priority requirements. Row-Level Security (RLS) would be high on my decision matrix.
IIRC, an MS Access front end can connect to an MS SQL Server (Express) back end, and it may be able to connect to other RDBMSs via ODBC. So not all of your UI work may have been lost.
As you grow, you'll have multiple applications interacting with the data, and you should plan on that happening. I do this by putting most or all of my data logic in the database (stored procedures, etc.). The front-end technology is then chosen based on how easily it can handle this type of design.
Beyond that: "Which RDBMS should I use?" is off topic for this site.
NOLOCK results in an allocation order scan rather than an index order scan.
This is not correct. Using read uncommitted means the storage engine has a choice between allocation-order and index-order scans. It may choose either strategy at runtime without the execution plan recompiling.
NOLOCK doesn't block writes because it doesn't take shared locks on the table.
Reading data at read uncommitted means shared locks are not taken at the row, page, or table granularity. Since these locks are not taken, they will not block an exclusive lock needed by a concurrent transaction to change data.
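A quick way to see this for yourself (a minimal sketch; the dbo.Demo table and its data are made up for illustration):

-- hypothetical demo table
CREATE TABLE dbo.Demo (id int PRIMARY KEY, val int);
INSERT INTO dbo.Demo (id, val) VALUES (1, 10), (2, 20);

-- repeatable read: shared locks are held to the end of the transaction
BEGIN TRANSACTION;
    SELECT * FROM dbo.Demo WITH (REPEATABLEREAD);
    SELECT resource_type, request_mode   -- row/page S locks are listed
    FROM   sys.dm_tran_locks
    WHERE  request_session_id = @@SPID;
COMMIT;

-- read uncommitted (NOLOCK): no S locks at row, page, or table granularity
BEGIN TRANSACTION;
    SELECT * FROM dbo.Demo WITH (NOLOCK);
    SELECT resource_type, request_mode   -- no KEY, PAGE, or OBJECT S locks appear
    FROM   sys.dm_tran_locks
    WHERE  request_session_id = @@SPID;
COMMIT;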
During the NOLOCK scan, since there are no shared locks, if a write (insert/update) happens prior to the point the scan has currently reached, this situation will cause missing records.
This is not specific to read uncommitted. Reading data using locking read committed or repeatable read can also miss committed data.
Locking read committed (https://docs.microsoft.com/en-us/archive/blogs/craigfr/read-committed-isolation-level) normally releases the shared lock on a row just before reading the next row.
Repeatable read (https://docs.microsoft.com/en-us/archive/blogs/craigfr/repeatable-read-isolation-level) maintains shared locks to the end of the transaction, but only data actually encountered so far is locked. A row ahead of the current scan position can move behind the scan point if an index key value is changed.
(See the illustrations in Craig Freedman's posts linked above.)
The issues described above are specific to index-order scans.
Similarly, during a write (insert/update), when a page split happens after a row is read, and that row is now part of the next page (due to the page split), this situation will cause duplicate records.
Index-order scans do not care about page splits. Pages are linked in logical key order both before and after the split. An index-order scan will encounter rows on the page that split, and the new page arising from the split. There's no way to avoid this if you are following pages in key order.
- As shown in my examples above, the missing/duplicate records problem is caused by the lack of locks and by page splits. Correct? Can it also be caused by allocation-order scans?
No, those were examples of rows being missed or encountered multiple times due to rows moving in index order while an index-order scan is in progress.
Rows being missed or encountered multiple times due to page splits is an issue that can only happen when an allocation-order scan is used. When scanning in allocation order, a page split moves some rows to a new page. That new page may or may not be encountered by the allocation-order scan. If the page that split has already been encountered, we see some rows again. If the split page had not been encountered yet, we might miss some rows if the new page falls behind the allocation-order scan position.
Allocation-order scans are only possible without using read uncommitted if the engine has an acceptable guarantee that the data cannot change during the scan. Missing committed rows or encountering them multiple times due to an allocation-ordered scan over changing data is therefore specific to using read uncommitted isolation.
- If an allocation-order scan can indeed cause missing rows/duplicates, then I want to ask: suppose the engine had used an index-order scan (I know the engine won't do this, but assume it for the sake of this question) rather than the allocation-order scan; how would that have solved the missing/duplicate rows problem?
Using an index-order scan avoids missing/duplicate rows due to page splits, as described. Missing or duplicate rows can still occur due to index key changes, also as described above.
Note again: Using read uncommitted does not guarantee you get an allocation-order scan.
I'm in the process of moving a postgres database to a new server.
I have a number of large tables, partitioned by month; each contains several tens of millions of rows.
I have a powershell script to do the following....
Copying the file from the archive takes maybe 10 minutes, and loading it typically takes 40 minutes; then it moves on to the next partition. The problem is that the PowerShell console seems to occasionally get 'stuck' after loading a partition. When a partition seems to be taking too long (say an hour), I check the Windows Task Manager and none of the processes seem to be doing anything. I check the dashboard in pgAdmin4 (no locks, most sessions idle). Then I check the record count for the partition being loaded; it seems to be static (still the same 5 minutes later). So as far as I can tell the partition has been loaded successfully and the database is mostly idle, but the PowerShell console running the load session hasn't changed. Then I shift focus to the PowerShell console, press Return a couple of times, and suddenly it bursts into life: the console tells me it has moved on to the next partition, Task Manager indicates that the PowerShell and postgres processes are running again, and pgAdmin4 also shows the database has sprung back into life.
The PowerShell script logs each step to the screen using Write-Host, along with the current time, so I always know what the most recent step was. My conclusion is that the interaction between PowerShell and psql is somehow 'broken': PowerShell hasn't worked out that psql has finished, so it doesn't know to move on to the next step.
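For reference, the loop is structured roughly like this (a simplified sketch; the paths, connection details, and table names are placeholders, not the real script):

foreach ($partition in $partitions) {
    Write-Host "$(Get-Date -Format s) copying $partition"
    Copy-Item "\\archive\dumps\$partition.csv" "D:\staging\$partition.csv"

    Write-Host "$(Get-Date -Format s) loading $partition"
    & psql -h localhost -d mydb -U loader `
        -c "\copy big_table_$partition FROM 'D:/staging/$partition.csv' WITH (FORMAT csv)"

    Write-Host "$(Get-Date -Format s) finished $partition"
}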
Other info: there's no sign of any errors (that I can spot), all the tables seem to get populated correctly (eventually), and the script works fine on all the smaller partitions. This only happens on about 1 partition in 10, and the only cure seems to be pressing Enter a couple of times.
This isn't a critical problem, occasionally hitting enter a couple of times is a viable solution, but an understanding of what might be going on (or even a solution) would be much appreciated.
You could compare like this:
a > b OR (a < 0 AND b >= 0)
An ORDER BY clause could be
ORDER BY a < 0, a
(which works because FALSE < TRUE).
Or you could do the smart thing and use the data type numeric, which allows you to store the numbers as they are.
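A quick illustration of the ORDER BY trick (a sketch with made-up values, assuming a bigint column holding unsigned values that have wrapped around to negative):

CREATE TABLE demo (a bigint);
INSERT INTO demo VALUES (10), (9223372036854775807), (-9223372036854775808), (-5);

-- non-negative values sort first, then the wrapped (negative) ones,
-- because FALSE sorts before TRUE
SELECT a FROM demo ORDER BY a < 0, a;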
I've moved several large tables (each with more than 10^9 rows and a couple of dozen columns) from clustered rowstore to clustered columnstore indexes on a SQL Server 2014 instance and noticed that statistics updates on them (default sampling, triggered by our ETL or by the Hallengren scripts) now take significantly longer.
The more theoretical question is: why is this so? My wild guess is that a statistics update produces a lot of random reads, which don't work well with columnstore indexes, as they are better suited to sequential reads of large amounts of data. I would be happy to see a more in-depth explanation.
The more important question is whether I can do something about it. I've tried my test case for a table with a single bigint column (see below) on a SQL Server 2017 instance with the same result. Incremental statistics seems on paper to be a good solution. I would need to re-create all statistics objects (which are currently not incremental, probably for historical reasons), extend the ETL logic, and update our version of the Hallengren scripts (we currently use an old one). I would appreciate it if someone could share their experience before I go down this rabbit hole.
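For reference, the incremental route would look roughly like this (a sketch; it assumes the tables are partitioned, which incremental statistics require, and the object names are placeholders):

-- re-create a statistics object as incremental
DROP STATISTICS dbo.BigTable.Stat_col1;
CREATE STATISTICS Stat_col1 ON dbo.BigTable (col1) WITH INCREMENTAL = ON;

-- later, refresh only the partitions touched by the ETL
UPDATE STATISTICS dbo.BigTable (Stat_col1)
WITH RESAMPLE ON PARTITIONS (42, 43);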
Steps to reproduce:
/*Create a rowstore and a columnstore table with a single bigint column*/
CREATE TABLE dbo.rowstore (col1 BIGINT);
GO
CREATE TABLE dbo.columnstore (col1 BIGINT);
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCI_columnstore ON dbo.columnstore;
GO
/*Fill both tables with 400 * 10^6 rows. This results in a 15 GB rowstore table and a 3.1 GB columnstore table*/
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e2 CROSS JOIN e2 AS b), -- 100*100
e4(n) AS (SELECT 1 FROM e3 CROSS JOIN e3 AS b) -- 10000*10000
INSERT dbo.rowstore WITH (TABLOCK)
SELECT CAST(CAST(NEWID() AS VARBINARY(8)) AS BIGINT) FROM e4;
GO 4
INSERT dbo.columnstore WITH (TABLOCK)
SELECT * FROM dbo.rowstore
GO
/*Trigger stats creation*/
SELECT TOP 1 * FROM dbo.rowstore WHERE col1>0
SELECT TOP 1 * FROM dbo.columnstore WHERE col1>0
GO
SET STATISTICS TIME, IO ON
/*This runs 1.5 seconds*/
UPDATE STATISTICS dbo.rowstore
/*This runs 8 seconds and becomes much slower than rowstore on really large tables*/
UPDATE STATISTICS dbo.columnstore
Spanking brand-new installations of:
There are no other Visual Studio or SQL Server instances or installations.
I can look at my local SQL instance using SSMS but not SQL Server Profiler. I am using Server Name "." and Windows Authentication in both cases. Profiler is being launched both in SSMS and directly launched (in administrative mode or not), with the same results:
SQL-related services are in these states (enabling SQL Server Browser has no effect):
"Trust server certificate" does not help. I suspect the provider is extremely sloppy about the root cause of this response.
(Same system environment as https://stackoverflow.com/posts/70805318/ )
There is a system catalog view, sys.master_files, that breaks this information down per file for each database:
SELECT * FROM sys.master_files WHERE database_id = DB_ID('YourDB')
https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-master-files-transact-sql?view=sql-server-ver15
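A slightly expanded version of the query (size is reported in 8 KB pages, so multiply by 8 and divide by 1024 to get MB):

SELECT name,
       type_desc,                    -- ROWS or LOG
       size * 8 / 1024 AS size_mb,
       physical_name
FROM   sys.master_files
WHERE  database_id = DB_ID('YourDB');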
I am using PostgreSQL 14 and it is set up for logical replication. I'm trying to stream changes with Debezium version 1.8. Actually, everything seems OK: every insert, update, and delete is streamed, except for updates that set a value to the same value it already has. According to this great answer (https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row), this kind of update creates a WAL record, but is it actually replicated over the publication?
I also need to mention that I asked the same question in the Debezium chat, but they asked me these two questions in return:
I would be happy if you can provide ways to validate those two.
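For the PostgreSQL side, one way to check whether such an update is decoded at all is to peek at a throwaway logical slot using the built-in test_decoding plugin (a sketch; the slot, table, and column names are made up):

-- create a throwaway slot with the built-in test_decoding output plugin
SELECT * FROM pg_create_logical_replication_slot('check_slot', 'test_decoding');

-- run the "no-op" update on the table in question
UPDATE my_table SET my_col = my_col WHERE id = 1;

-- see whether the change was decoded for the slot
SELECT * FROM pg_logical_slot_peek_changes('check_slot', NULL, NULL);

-- clean up
SELECT pg_drop_replication_slot('check_slot');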
I would ask: what is the purpose of this HID?
If it is constructed from database-internal, numeric, surrogate key values, then you should never be showing them to a user anyway, so what's the purpose of assembling them in this way?
Anyway ...
The most important question to ask before deciding how to store any piece of Data is this:
How are you going to use this Data?
How are you going to "get to" this Data? (Do you need to index it?)
What are you going to do with this Data, having got it? (Do you need to further subdivide it?)
This last is interesting, because one thing you might want to do is to find an entry's parent (e.g. Seattle -> Washington). But how can you do that efficiently, armed only with the complete HID? You'd need to dissect the value, extracting the last element (which is almost always more difficult than getting the first), and then go and look up the parent record.
In such a case, it might be better to hold only the parent record's id and use a recursive query to derive the HID value completely dynamically.
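For illustration, such a recursive query might look like this (a sketch in PostgreSQL-flavoured SQL; the places table, its columns, and the separator are assumptions):

WITH RECURSIVE hierarchy (id, name, hid) AS (
    SELECT id, name, name::text
    FROM   places
    WHERE  parent_id IS NULL                  -- top-level entries
    UNION ALL
    SELECT c.id, c.name, h.hid || ' -> ' || c.name
    FROM   places    AS c
    JOIN   hierarchy AS h ON h.id = c.parent_id
)
SELECT hid
FROM   hierarchy
WHERE  name = 'Seattle';                      -- e.g. 'Washington -> Seattle'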
If your id values ever changed (which, of course, they never, ever should), then your full HIDs would all be fouled up and would have to be regenerated en masse, and that's painful.
If I'm right, SQL Server Database Mail uses System.Net.Mail to do the work, and SQL Server 2014 Database Mail is built against .NET 3.5.
System.Net.Mail can send mail using TLS 1.2 when the runtime it is built against is version 4.6 or above, so SQL Server 2014 Database Mail should not be able to support TLS 1.2 until some CU/SP adds it.
Maybe patching the .NET Framework 3.5 would fix your issue:
https://support.microsoft.com/en-us/topic/kb3135244-tls-1-2-support-for-microsoft-sql-server-e4472ef8-90a9-13c1-e4d8-44aad198cdbe
I'm not sure I understand your problem; please try this one:
UPDATE x
SET x.type= 'newid'
FROM Titem AS x
INNER JOIN TCharacter AS y ON x.char_id = y.id
INNER JOIN Titem AS z ON z.char_id = y.id
WHERE x.type = 'oldid'
AND z.type = '2nd id';
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=9ee15cc0ad794b3f8da13da219018d94
Maybe you can add some sample data to the dbfiddle...
The execution plans (https://www.brentozar.com/pastetheplan/?id=SkT72_WRY) show merge joins with full scans, so all rows are touched. The culprit is the table variable in the dbo.spPurge procedure:
declare @RedundantInsertionIDs table (InsertionID bigint, UpdateTime DateTime)
This lacks an index to help optimize the delete queries. I observed that a primary key on the table variable changes the plans (https://www.brentozar.com/pastetheplan/?id=r1BGtvZAt) such that the target tables are accessed with a seek instead of a scan, touching only those rows to be deleted rather than the active rows too:
declare @RedundantInsertionIDs table (InsertionID bigint primary key, UpdateTime DateTime);
If you still experience issues, a temporary table instead of a table variable may help. You could also implement a retry in the purge proc as a last resort.
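For reference, the temporary-table variant mentioned above would look roughly like this (same columns as the table variable):

-- unlike a table variable, a temp table gets column statistics
CREATE TABLE #RedundantInsertionIDs
(
    InsertionID bigint PRIMARY KEY,
    UpdateTime  datetime
);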
The answer is mostly described already in https://gitlab.com/dalibo/postgresql_anonymizer/-/issues/161. I'll add it here as well to make it a bit easier to find and understand.
# clone the repo and cd into it
git clone https://gitlab.com/dalibo/postgresql_anonymizer.git
cd ./postgresql_anonymizer

# check out the version that allows standalone installation
git checkout 0.8.1

# create the anon_standalone.sql file
make standalone

# execute that SQL file against your Amazon RDS instance
psql -h <host> -U <user> -d <database> -a -f anon_standalone.sql
SQL Server availability group backup preferences.
We have 3 nodes for our AG.
I would like to take full and differential backups on primary, log backups on secondary.
When I configured "any replica" made backup priority 50 for all replicas log backups aren't running on secondary replica.
If I do prefer secondary full backups wont run on primary.
How can I configure to take full/diff on primary only and log backup on secondary only?
The reason we want to offload backups to a secondary replica is to improve performance on the primary during peak hours.
Our current setup takes a full backup every 3 days and a differential every day after hours, and we have log backups running every 15 minutes. It's done through a SQL Agent job with a custom stored procedure.
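For what it's worth, one common pattern with a custom stored procedure is to branch on the local replica's role inside the Agent job step, instead of relying only on the AG backup preference (a sketch; the procedure names are placeholders for your custom SP):

-- determine the role of the local replica (single availability group assumed)
DECLARE @role nvarchar(60) =
(
    SELECT role_desc
    FROM   sys.dm_hadr_availability_replica_states
    WHERE  is_local = 1
);

-- full/diff backups only on the primary
IF @role = N'PRIMARY'
    EXEC dbo.usp_BackupFullOrDiff;   -- placeholder

-- log backups only on the secondary
IF @role = N'SECONDARY'
    EXEC dbo.usp_BackupLog;          -- placeholder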
I'm passing a JSON string to a stored procedure in order to insert warehouse inventory.
Sometimes, there will be multiple line items with the same product, going to the same shelf, on the same pallet (LPN), with the same date. I would like to aggregate those into one line in the table.
The JSON represents individual lines that have been received in, and put away into a location. Here there are 2 identical items:
[{"StockCode": "ABC123", "Qty": "200", "Bin": "E4B4_L", "LPN": "1234", "PutDate": "2022-01-21 18:35:53 UTC"}, {"StockCode": "ABC123", "Qty": "400", "Bin": "E4B4_L", "LPN": "1234", "PutDate": "2022-01-21 18:36:43 UTC"}]
So:
ABC | 200 | 2022-01-21 00:00:00.000 | LPN1234
ABC | 400 | 2022-01-21 00:00:00.000 | LPN1234
Should aggregate to:
ABC | 600 | 2022-01-21 00:00:00.000 | LPN1234
I tried to GROUP BY and SUM the Qty, but the resulting lines in the table are still 2 separate rows. I realized the PutDate values were not the same because of the timestamps, so I thought casting to DATE would surely solve it, but it did not.
SQL script:
ALTER Procedure spc_PutAway
(@json NVARCHAR(MAX) = '')
AS
BEGIN
INSERT INTO warehouse (Bin, StockCode, Qty, PutDate, VerDate, LPN)
SELECT Bin, StockCode, Sum(Qty) as Qty, CAST(PutDate AS DATE) as PutDate, GETDATE() as VerDate, LPN
FROM OPENJSON(@json)
WITH (
Bin VARCHAR(20) '$.Bin',
StockCode VARCHAR(30) '$.StockCode',
Qty DECIMAL(18,6) '$.Qty',
PutDate VARCHAR(20) '$.PutDate',
LPN VARCHAR(50) '$.LPN'
)
WHERE Bin <> 'DEFAULT'
GROUP BY StockCode, Bin, PutDate, LPN
END
Results in table:
Bin StockCode Qty PutDate VerDate LPN TransID VerCol
E4B4_L ABC123 200.000000 2022-01-21 00:00:00.000 2022-01-21 10:52:43.823 1234 1 0x000000000000275D
E4B4_L ABC123 400.000000 2022-01-21 00:00:00.000 2022-01-21 10:52:43.823 1234 2 0x000000000000275E
I need to back up a Postgres DB which is located on a remote server. This server also hosts a lot of other stuff, which is why I don't have general access to the server itself. Therefore I believe that I cannot use ssh to access the remote server and run pg_dump from there. (Please correct me if I am wrong regarding pg_dump.) The only thing I can do is write to and read from the DB via a DB connection (which is allowed through the firewall).
What is the best approach of backing up such DB?
I tried pg_dump, but I don't think it'll work under such conditions.
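For context, pg_dump does not have to run on the database server itself; it can connect over a normal database connection from any machine that is allowed through the firewall, for example (host, user, and database names are placeholders):

pg_dump -h db.example.com -p 5432 -U myuser -d mydb -Fc -f mydb.dump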
We want to migrate our SQL Server 2012 instances on Windows Server 2012 R2 to SQL Server 2019 on Windows Server 2016, keeping the IP addresses but changing the hostnames.
I cannot change the listener name or the IP addresses in the application code. We have 10 TB of data. We have been using a single availability group.
Current situation:
LISTENER : SQLLSN : 10.10.12.12
Windows Server 2012 R2 - SQL Server 2012
SQL_OLD1 - 10.10.12.2
SQL_OLD2 - 10.10.12.3
SQL_OLD3 - 10.10.12.4
Windows Server 2016 - SQL Server 2019
SQL_NEW1 - 10.10.12.5
SQL_NEW2 - 10.10.12.6
SQL_NEW3 - 10.10.12.7
After migration:
LISTENER : SQLLSN : 10.10.12.12
SQL_NEW1 - 10.10.12.2
SQL_NEW2 - 10.10.12.3
SQL_NEW3 - 10.10.12.4
Any information and suggestion would be very appreciated.