Adding soft delete to a database after having used hard delete



  • Introduction

    My app collects data from a centralized source where many different users can submit data about their organisation and their staff. Previously we used to just hard delete a users data when they were no longer relevant from the source of truth because it used to be reliable.

    But a change to some software the clients use, messes with everything. They now DELETE all their data multiple times per month when they submit data. This is by mistake and due to a terrible design. Which means they loose the data for the users in our system and have to re-enter parts of it.

    The software they use are stubborn and won't change the behaviour. We have tried educating the users about how to use it, but they don't learn. So now the last option is to soft delete the data for a cetain time period.

    Having looked at multiple Stack Overflow posts and blogs around the web, I don't really fancy any of the options, IE. add a column to the tables that need to be soft deleted. I started looking because that was my first instinct as well but don't really like it and the implications.

    I was wondering if you could give me some feedback on a different idea. I have no experience with maintaining soft deletion and I don't know if my thought is terrible.

    Diagram and relations Simple diagram to show some of the relations

    There is a user, their unique identifier is the same across multiple orgs. Per user affiliation with an org they have some userinformation like name, title etc. In our system they have one status row because it is the same in our app no matter what org they choose to connect as.

    So if I follow the conventional way, of adding columns for soft deleting I would have to add one to each of the unique tables that contains user data, because their affiliation to a certain org might be deleted but as a user they still live on in our system from somewhere else.

    But it seems like a hassle and a lot of change in the nitty gritty of my code to change things around to account for all these extra columns.

    Idea

    In my mind it would be simpler if I added a separate table containing:

    • UniqueUserIdentifier
    • UniqueOrgIdentifier
    • SoftDeleteDate

    And then whenever my app ask for data the api checks the new table; "is this person soft deleted from this org?" If true, they just block the request until they are restored if needed, or they will remain deleted until they are hard deleted within x hours of the soft deletion happening.

    Instead of having to change many queries and logic all over the place.

    Additional information

    The API uses EFCore as an ORM to connect to the database, incase that would help with any other smart fixes regarding its feature set. I have thought about creating custom savechanges logic, but couldn't come up with a good idea other than again adding a column to all the tables.

    Please let me know if you need any more information.

    Update

    J.D. Told me about row level security which made me look around. It seems very usefull, and it gave me some more insight to what I could search for.

    So I came across global query filters for EFCore which seems promising. It allows the context to filter on all queries and when you actually need to ignore this global filter, you can simply do it on a query by query basis.

    And it allows for dependency injection if you need to use something for the globalfilter that is based on the user that is connected. I created an answer based on this new information

    It also turns out what I really wanted was to deactiavte the row until eventual activation or hard delete instead of soft delete. I didn't know the correct way to express myself.



  • J.D. Mentioned Row Level Security in a comment and it seems very promising for people who need to do what I want but only have SQL to work with. If they post an answer I will give them credit for this question.

    Though it is possible to create migrations in EFCore that allows for custom SQL to be executed when the database is deployed, it felt forced. But still probably a valid solution

    https://docs.microsoft.com/en-us/ef/core/managing-schemas/migrations/operations

    https://docs.microsoft.com/en-us/ef/core/managing-schemas/migrations/managing?tabs=dotnet-core-cli

    https://docs.microsoft.com/en-us/sql/relational-databases/security/row-level-security?view=sql-server-ver15

    Based on the new terms in my vocabulary, I found global query filter from EFCore and I went with that.

    https://docs.microsoft.com/en-us/ef/core/querying/filters

    Though you do need to be a bit careful with how you set it up.

    Using required navigation to access entity which has global query filter defined may lead to unexpected results.

    But in essence it boils down to adding something like this to the code first configuration.

    modelBuilder.Entity().HasMany(b => b.Posts).WithOne(p => p.Blog).IsRequired();
    modelBuilder.Entity().HasQueryFilter(b => b.Url.Contains("fish"));
    modelBuilder.Entity().HasQueryFilter(p => p.Blog.Url.Contains("fish"));
    

    https://www.rubberchickenparadise.com/blog/2020-04-01-handling-row-level-security-with-entity-framework/ , where they discuss injecting user JWT token info into the context to allow personal filtering for an individual.

    public class Context : DbContext
    {
        private readonly IClaimsProvider _claimsProvider;
    
    private int UserId => _claimsProvider.UserId;
    private IEnumerable<int> AccessibleClientIds => _claimsProvider.AccessibleClientIds;
    
    public Context(DbContextOptions<Context> options, IClaimsProvider claimsProvider) : base(options)
    {
        _claimsProvider = claimsProvider;
    }
    ...
    

    }

    modelBuilder.Entity(entity =>
    {
    entity.HasQueryFilter(x => AccessibleClientIds.Contains(x.Id));

        entity.HasKey(x => x.Id);
        entity.HasMany(x => x.UserClientAccess)
              .WithOne(x => x.Client)
              .HasForeignKey(x => x.ClientId);
    });
    
        modelBuilder.Entity<UserOptions>(entity =>
     {
         entity.HasQueryFilter(x => x.UserId == UserId);
    
         entity.HasKey(x => x.Id);
     });
    

    https://spin.atomicobject.com/2019/01/29/entity-framework-core-soft-delete/ talks about overriding the default savechanges functionality to always soft delete, but this is not what I want, but some might find it helpful.

        public override int SaveChanges()
        {
            UpdateSoftDeleteStatuses();
            return base.SaveChanges();
        }
    
    public override Task SaveChangesAsync(bool acceptAllChangesOnSuccess, CancellationToken cancellationToken = default(CancellationToken))
    {
            UpdateSoftDeleteStatuses();
            return base.SaveChangesAsync(acceptAllChangesOnSuccess, cancellationToken);
    }
    
    private void UpdateSoftDeleteStatuses()
    {
        foreach (var entry in ChangeTracker.Entries())
        {
            switch (entry.State)
            {
                case EntityState.Added:
                    entry.CurrentValues["isDeleted"] = false;
                    break;
                case EntityState.Deleted:
                    entry.State = EntityState.Modified;
                    entry.CurrentValues["isDeleted"] = true;
                    break;
            }
        }
    }
    

    Eventual implementation

    I have all the relevant models inherit from Deactivateable to add the property/col to the database. Then because of the simplicity of the filter I simply created this config in the context:

    foreach (var entity in modelBuilder.Model.GetEntityTypes())
            {
                if (entity.ClrType.IsSubclassOf(typeof(Deactivatable)))
                {
                    var dateTimeOffsetDefaultValue = new DateTimeOffset(new DateTime(1, 1, 1, 0, 0, 0, 0, DateTimeKind.Unspecified), new TimeSpan(0, 0, 0, 0, 0));
                    modelBuilder.Entity(entity.ClrType, o =>
                    {
                        o.Property(nameof(Deactivatable.DeactivatedDate)).HasDefaultValue(dateTimeOffsetDefaultValue);
                    });
    
                var deactivatedDate = entity.FindProperty(nameof(Deactivatable.DeactivatedDate)).PropertyInfo;
    
                var parameterExpression = Expression.Parameter(entity.ClrType);
                var propertyExpression = Expression.Property(parameterExpression, deactivatedDate);
                var constExpression = Expression.Constant(dateTimeOffsetDefaultValue);
    
                var equalExpression = Expression.Equal(propertyExpression, constExpression);
                var filter = Expression.Lambda(equalExpression, parameterExpression);
                modelBuilder.Entity(entity.ClrType).HasQueryFilter(filter );
            }
    
        }
    

    Based on the inspiration from this https://github.com/dotnet/EntityFramework.Docs/issues/2820#issuecomment-803489089 made by https://github.com/stevendarby . I couldn't figure out where they got .Model.FindLeastDerivedEntityTypes from, so I simply chose to iterate the models myself and look at the subclass, based on an idea I saw in a related SO post, which I can no longer find and reference.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2