Postgresql DB - insert or update on table violates foreign key constraint + Schema Design Question



  • I have 1 big table with 9 fields/columns, some of theses columns have a lot of duplicates, so i decided to split them into 2 tables in Postgresql i.e. TEST_A table with columns that have a lot of duplicates & TEST_B table which has unique columns.

    I have no issues writing data into TEST_A table, but when i try to write into TEST_B table, Postgresql throws :

    insert or update on table "TEST_B" violates foreign key constraint "TEST_A_id_fkey"
    DETAIL:  Key (session_id)=(XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXX) is not present in table "TEST_A".
    
    CREATE TABLE TEST_A(
       id VARCHAR ( 100 ) ,
       column1 VARCHAR ( 100 ) ,
       column2 VARCHAR ( 100 ) ,
       column3 VARCHAR ( 100 ) ,
       column4 VARCHAR ( 100 ) NOT NULL,
       column5 VARCHAR ( 100 ) NOT NULL,
       PRIMARY KEY(id)
    );
    

    CREATE TABLE TEST_B(
    m_id VARCHAR ( 100 ) ,
    s_id VARCHAR ( 100 ) ,
    column11 TIMESTAMP,
    column12 TEXT,
    column13 TEXT,
    column14 TEXT,
    PRIMARY KEY(m_id),
    FOREIGN KEY(s_id) REFERENCES TEST_A(id) ON DELETE CASCADE ON UPDATE CASCADE
    );

    I removed the duplicate rows in the Table TEST_A, TEST_A initially had millions of rows, now it only has a few 1000's.

    When i try to write to TEST_B i.e. PostgreSQL DB, it will not allow me to write because it is not able to find the unique "id" in the TEST_A table, because it got deleted when we removed the duplicates in Table TEST_A. But unfortunately TEST_B table still has the old "id" still attached to the row.

    TEST_B "m_id" needs to track which TEST_A(id) they were a part of. It represents One to Many Relationship (1:M) i.e. Every TEST_B.m_id should have one TEST_A.id & every TEST_A.id should have multiple TEST_B.m_id's.

    I did some reading into Postgres DB documentation and I have used DEFERRABLE INITIALLY DEFERRED, it did not work.

    CREATE TABLE TEST_A1(
       id VARCHAR ( 100 ) ,
       column1 VARCHAR ( 100 ) ,
       column2 VARCHAR ( 100 ) ,
       column3 VARCHAR ( 100 ) ,
       column4 VARCHAR ( 100 ) NOT NULL,
       column5 VARCHAR ( 100 ) NOT NULL,
       PRIMARY KEY(id)
    );
    

    CREATE TABLE TEST_B1(
    m_id VARCHAR ( 100 ) ,
    s_id VARCHAR ( 100 ) NOT NULL,
    column11 TIMESTAMP,
    column12 TEXT,
    column13 TEXT,
    column14 TEXT,
    PRIMARY KEY(m_id),
    FOREIGN KEY(s_id) REFERENCES TEST_A1(id) DEFERRABLE INITIALLY DEFERRED
    );

    Insert query is the same :-

    INSERT INTO TEST_B1(m_id, s_id, column11, column12, column13, column14) VALUES ('1234', '2345', NULL, 'SAMPLE_VALUE_1', 'SAMPLE_VALUE_2', 'SAMPLE_VALUE_3');
    

    Result :

    ERROR:  insert or update on table "test_b1" violates foreign key constraint "test_b1_s_id_fkey"
    DETAIL:  Key (s_id)=(2345) is not present in table "test_a1".
    SQL state: 23503
    

    I'm currently using PySpark with SQLAlchemy to write to Postgresql DB.

    Any newbie suggestions/advice is appreciated. Thanks.

    Update : Added a sample INSERT QUERY =>

    INSERT INTO TEST_B(m_id, s_id, column11, column12, column13, column14) VALUES ('1234', '2345', NULL, 'SAMPLE_VALUE_1', 'SAMPLE_VALUE_2', 'SAMPLE_VALUE_3');
    

    Postgresql DB throws the following error on inserting the above mentioned row :-

    ERROR: insert or update on table "test_b" violates foreign key constraint "TEST_B_s_id_fkey"
    DETAIL:  Key (s_id)=(2345) is not present in table "TEST_A".
    SQL state: 23503
    

    s_id is the foreign key linked to id field of TEST_A table, so when (2345) is not present in table "TEST_A", it will throw foreign key constraint error.

    Any suggestions/advice is appreciated.



  • You insert a row into TEST_B. Its structure contains FOREIGN KEY definition:

    FOREIGN KEY(s_id) REFERENCES TEST_A(id) ON DELETE CASCADE ON UPDATE CASCADE
    

    This means that when you insert a row into TEST_B, the value which you insert into its column s_id must already present in TEST_A table, in its column id, or to be NULL which means that this particular row does not refer to a row in TEST_A.

    You insert a row with s_id = '2345'. There is no a row with this value in TEST_A. This causes an error.

    https://dbfiddle.uk/?rdbms=postgres_12&fiddle=4c759d309d46ab2f07fa370e13ea4a51 . Insertion into TEST_B attempt fails. But after insertion a row with id = '2345' into TEST_A the same insertion query executes successfully.


    I have used DEFERRABLE INITIALLY DEFERRED, it did not work. - Robin

    Mentioned option causes the constraint to be checked upon the commit.

    Without explicit START TRANSACTION each separate statement is separate transaction, and the constraint is checked during auto-commit which occures at the end of the statement execution.

    When the transaction is started explicitly then the constraint will be checked not at the end of statement execution but on commit.

    https://dbfiddle.uk/?rdbms=postgres_12&fiddle=ff0aff9c972526fe28db91ab020359bd




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2