I'm keen to get a list of the feature limitations or surprises when using Synapse Analytics (a.k.a. Dedicated SQL Pool, formerly known as SQL DW / SQL Data Warehouse) compared to using normal SQL Server. Basically, the things that you would expect to be there, but aren't.
Best posts made by Marcee
Latest posts made by Marcee
What are the limitations in Synapse Analytics compared to normal SQL Server?
EXISTS() vs EXISTS() = TRUE in Postgres
Faced weird behaviour with EXISTS (also applies to NOT EXISTS) generating different execution plans. Plain EXISTS:
EXPLAIN ANALYZE SELECT * FROM books WHERE EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%');
QUERY PLAN
Hash Join  (cost=218.01..454.43 rows=56 width=40) (actual time=0.975..0.975 rows=0 loops=1)
  Hash Cond: (books.author_id = authors.id)
  ->  Seq Scan on books  (cost=0.00..206.80 rows=11280 width=40) (actual time=0.010..0.010 rows=1 loops=1)
  ->  Hash  (cost=217.35..217.35 rows=53 width=4) (actual time=0.943..0.943 rows=0 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 8kB
        ->  Seq Scan on authors  (cost=0.00..217.35 rows=53 width=4) (actual time=0.942..0.943 rows=0 loops=1)
              Filter: ((name)::text ~~ 'asd%'::text)
              Rows Removed by Filter: 10000
Planning Time: 0.361 ms
Execution Time: 1.022 ms
Versus WHERE EXISTS(...) = TRUE:
EXPLAIN ANALYZE SELECT * FROM books WHERE EXISTS (SELECT id FROM authors WHERE id = books.author_id AND name LIKE 'asd%') = True;
QUERY PLAN
Seq Scan on books  (cost=0.00..93887.20 rows=5640 width=40) (actual time=2.054..2.054 rows=0 loops=1)
  Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
  Rows Removed by Filter: 10000
  SubPlan 1
    ->  Index Scan using authors_pkey on authors  (cost=0.29..8.30 rows=1 width=0) (never executed)
          Index Cond: (id = books.author_id)
          Filter: ((name)::text ~~ 'asd%'::text)
  SubPlan 2
    ->  Seq Scan on authors authors_1  (cost=0.00..217.35 rows=53 width=4) (actual time=0.931..0.931 rows=0 loops=1)
          Filter: ((name)::text ~~ 'asd%'::text)
          Rows Removed by Filter: 10000
Planning Time: 0.298 ms
Execution Time: 2.129 ms
Of particular interest is the Hash Join vs. the plain Seq Scan, and the 2x difference in execution time.
Database becoming unresponsive after time (4 to 24 hours)
For the past couple of days my MariaDB database has randomly become unresponsive at random intervals (which have been between 4 and 20 hours).
The database comes back to life when rebooting the CT.
After some time, queries start to fail with
MySqlException (0x80004005): The Command Timeout expired before the operation completed.
I have also noticed that sometimes restarting the mariadb-service also gets stuck.
Below is all information I have on the issue:
Ubuntu: Ubuntu 22.04 LTS
Fresh install (VM), importing old data (only copying the fivem database) using mysqldump.
mysqlcheck --all-databases --auto-repair runs while MySQL is working; when it is in the unresponsive state, there is no output.
This is a screenshot of all information on the proxmox CT: https://i.imgur.com/nR6n3nt.png
Data and structure
It happens in a database called fivem which has multiple tables. Notably, only the bigger ones are stuck (meaning that select * from TABLE_NAME will not finish for a long time).
Phpmyadmin table (content): https://i.imgur.com/cUdhvs1.png
Phpmyadmin table (size and "is hung"): https://i.imgur.com/C6gxn1N.png
I have made these modifications to the default config:
skip-name-resolve
max_allowed_packet = 1G
max_connections = 100000
table_cache = 640
thread_handling = pool-of-threads
innodb_buffer_pool_size = 48103633715
SHOW ENGINE INNODB STATUS; was also unresponsive.
The processlist shows multiple queries, mostly stuck in the same states.
Hastebin-Link to full processlist: https://www.toptal.com/developers/hastebin/sapajapohi.sql
The error log shows a lot of Got an error reading communication packets and This connection closed normally without authentication.
Note that these happen while mysql is operational too.
Some examples: https://www.toptal.com/developers/hastebin/rewozofana.yaml
Output while starting: https://www.toptal.com/developers/hastebin/uracoqutab.yaml
Restarting the mariadb service also did not work. The last warning it shows are: https://www.toptal.com/developers/hastebin/usewerugut.less
If anyone has ideas of what the issue might be or just ideas on how to further debug this I would really appreciate it.
Edit / Updates
- max_connections has been lowered to 250 but will need to be increased again
- innodb_buffer_pool_size has also been lowered
- It has become apparent that shortly after (or right before) the unresponsiveness, a lot of disk IO is observed (20x the usual amount)
- mysql status shows some warnings that quickly increase in size ( https://i.imgur.com/TxMxC35.png ). I am not sure whether these could be a cause, or how to fix them for now
- I noticed that restarting the service (mariadb) occasionally got it stuck at Waiting to flush the buffer pool, with the last messages being
May 19 08:11:06 database systemd: Stopping MariaDB 10.6.7 database server...
May 19 08:11:06 database mariadbd: 2022-05-19  8:11:06 0 [Note] InnoDB: FTS optimize thread exiting.
May 19 08:11:06 database mariadbd: 2022-05-19  8:11:06 0 [Note] InnoDB: Starting shutdown...
May 19 08:11:06 database mariadbd: 2022-05-19  8:11:06 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
May 19 08:11:06 database mariadbd: 2022-05-19  8:11:06 0 [Note] InnoDB: Buffer pool(s) dump completed at 220519  8:11:06
RE: How to incrementally optimize a `COUNT(*) ... GROUPBY` query in PostgreSQL?
Why do you have (event = 'view load' OR event = 'view:loaded')? Do those actually mean two different things, or do you just have dirty data with two different spellings for the same meaning?
Your first query is just using the index as a skinny version of the table. Note that the Index Cond doesn't include the leading column of the index, which for a btree index means it is scanning the whole index and just applying the other column as an "in-index filter". This can be faster than scanning the table because the index might be much smaller than the table.
In your second query, this doesn't work well because one necessary column is not in the index, so it needs to visit the table anyway. A good index for the second query would be ("timestamp", event, distinct_id). This index should also work for the first query, even better than the current index.
But better yet would probably be cleaning your data so you don't need the OR, or creating a filtered index which is filtered on the OR condition.
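A sketch of the two index suggestions; the table name events here is an assumption, since the real table name isn't shown in this answer:

```sql
-- Composite index matching both queries (table name "events" is assumed):
CREATE INDEX idx_events_ts_event_id ON events ("timestamp", event, distinct_id);

-- Alternatively, a partial (filtered) index restricted to the two event spellings,
-- which stays small and pre-applies the OR condition:
CREATE INDEX idx_events_views ON events ("timestamp", distinct_id)
WHERE event IN ('view load', 'view:loaded');
```

Note that the partial index is only usable when the query's WHERE clause implies the index's WHERE condition.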
Could MySQL perform better on a 10GB network storage with raid vs local HDD?
Compared to using an HDD in the local computer and doing many MySQL reads/writes on it, could I get more queries per second if I install 10GbE network cards and store the MySQL data on a RAID network storage made of multiple HDDs? Or would the network latency (or something else) be a problem for MySQL performance?
My goal would be to get near SSD read/write performance without wearing out the SSD.
Finding rows where one field is equal and another differs
Using PostgreSQL 14.
I have two tables with sites, timestamps and some other information. I want to get all rows in table B having a site in table A but another timestamp. All rows in table A are also in table B, and there are no duplicates with the same site and date, so the situation is like this:
Table A:

site | date
------------------
A    | 2022-01-01
A    | 2022-02-01
B    | 2022-01-01
B    | 2022-02-02
C    | 2022-01-03
C    | 2022-02-01
Table B:

site | date
------------------
A    | 2022-01-01
A    | 2022-02-01
A    | 2022-02-02
B    | 2022-01-01
B    | 2022-02-02
C    | 2022-01-03
C    | 2022-02-01
C    | 2022-03-01
I want to run a query that gives
site | date
------------------
A    | 2022-02-02
C    | 2022-03-01
I tried to run
select B.site,B.date from B left join A on A.site = B.site where A.date != B.date;
But of course that gave me all A and C rows.
How do I say in SQL: give me the rows in table B with dates that do not exist for a given site in the other table?
Both tables are around 1000 rows, so a slow query (within reasonable limits) is no disaster.
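One way to express "rows in B whose (site, date) pair does not exist in A" is an anti-join via NOT EXISTS; a sketch, assuming the tables are literally named a and b:

```sql
SELECT b.site, b.date
FROM b
WHERE NOT EXISTS (
    SELECT 1
    FROM a
    WHERE a.site = b.site
      AND a.date = b.date
);
-- With the sample data above, this returns (A, 2022-02-02) and (C, 2022-03-01).
```

Unlike the left-join attempt, the subquery compares each B row against all of that site's A rows at once, so a mismatch with one A row isn't enough to include the B row.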
Why is total and target server memory never reaching the configured memory value?
I have a SQL Server instance on a machine with 100 GB of RAM, of which 90 GB is allocated to SQL Server. 2 GB is currently free, as shown in Task Manager.
90 GB = 92160 MB (as set in the SQL Server memory settings)
I am monitoring the total server memory and target server memory perf monitor counters.
I can see that the target server memory never crosses 90150MB, and the total server memory never crosses 89100MB. What could be the reason for this?
My understanding was that target server memory should always equal the value configured in the SQL Server memory settings. Why is it different in my situation?
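For reference, the same two counters I am watching in Performance Monitor can also be read from the DMV sys.dm_os_performance_counters:

```sql
-- Both counters are reported in KB; divide to get MB.
SELECT counter_name,
       cntr_value / 1024 AS value_mb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Target Server Memory (KB)', 'Total Server Memory (KB)');
```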
RE: Create new MySql user with same grants as existing user
Rather than copying the grants over, consider creating a "role" with suitable privileges, then grant that role to each user that needs the same access.
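A minimal sketch of the role approach in MySQL 8; the role name, database name and user names here are made up:

```sql
-- Create a role and give it the privileges the existing user has:
CREATE ROLE app_rw;
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO app_rw;

-- Grant the role to both the existing and the new user:
GRANT app_rw TO 'existing_user'@'%', 'new_user'@'%';

-- Make the role active automatically at login:
SET DEFAULT ROLE app_rw TO 'new_user'@'%';
```

Adding another user later is then a single GRANT of the role, instead of re-running every individual privilege grant.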
RE: Error: User 'postgres' not found in aco_user table when trying to update a table
Simply use SET ROLE to switch to a role that has permission to alter that database (e.g. the owner).
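For example, assuming the owning role is called table_owner and your login has been granted membership in it (both names here are made up):

```sql
SET ROLE table_owner;                          -- act as the owning role
ALTER TABLE some_table ADD COLUMN note text;   -- example DDL that requires ownership
RESET ROLE;                                    -- switch back to your login role
```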
RE: SQL DB2: I need to filter results based on query results
The condition "the employee shouldn't have 'disEnrolled' status on or before the given input date" needs to be done using a NOT EXISTS condition. You can't really express that with a "simple" condition in the WHERE clause:
I think the following does what you want:
select e1.*
from employee e1
where e1.employee_id in (32456, 32458)
  and e1.status = 'enrolled'
  and e1.date <= :input_date
  and not exists (select 1
                  from employee e2
                  where e2.id <> e1.id
                    and e2.employee_id = e1.employee_id
                    and e2.status = 'disEnrolled'
                    and e2.date <= :input_date)
The first part of the WHERE condition selects the rows that fulfill the condition "the employee is enrolled on or before the given input date".
The NOT EXISTS condition then applies the constraint "the employee shouldn't have 'disEnrolled' status on or before the given input date"