Prevent creation of duplicate rows in database dynamically



  • I am new to databases, so please forgive my mistakes. Here's the scenario: I use the web as my data source. Sometimes my crawler crawls the same keyword repeatedly (not the crawler's fault).

    I don't want to insert that keyword into my database again. I know about primary keys, and I know that no two rows can have the same primary key.

    So should I make the keyword the primary key to avoid duplicate rows, or is there some other method to deal with this?

    The DB is dynamic because there is very little manual intervention in the crawl.

    Note: I am using Apache Cassandra.



  • Primary keys and unique constraints are great options, but you would need additional error handling for them, and the overhead might not be worth it for your needs.

    Another option would be an insert like this:

    IF NOT EXISTS (SELECT 1 FROM Words WHERE Word = @newWord)
    BEGIN
        INSERT INTO Words (Word) VALUES (@newWord)
    END

    That should do what you're hoping for and remain lightweight. Another option would be an ON INSERT trigger that performs the same check, if you would rather handle it on the DB side than in your app logic.
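To make the two approaches concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a local stand-in, not Cassandra or SQL Server, so the table name words, the helper names, and the INSERT OR IGNORE syntax are assumptions for illustration only. One helper mirrors the check-then-insert statement, the other lets a primary key on the keyword reject the duplicate.

```python
import sqlite3


def make_db():
    """Create an in-memory database whose only column is the primary key."""
    conn = sqlite3.connect(":memory:")
    # The keyword itself is the primary key, so no two rows can share it.
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
    return conn


def insert_if_absent(conn, new_word):
    """Check-then-insert, like the IF NOT EXISTS statement.

    Returns True if a row was added, False if the word was already stored.
    """
    row = conn.execute(
        "SELECT 1 FROM words WHERE word = ?", (new_word,)
    ).fetchone()
    if row is not None:
        return False
    conn.execute("INSERT INTO words (word) VALUES (?)", (new_word,))
    return True


def insert_or_ignore(conn, new_word):
    """Rely on the primary key: SQLite skips the row if the key exists.

    Returns True if a row was added, False if it was ignored.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO words (word) VALUES (?)", (new_word,)
    )
    return cur.rowcount == 1


def count_words(conn):
    """Number of distinct keywords stored so far."""
    return conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]
```

Calling either helper twice with the same keyword leaves a single row. Note that the check-then-insert version is two separate statements, so with several crawlers writing at once it is not atomic unless wrapped in a transaction. On Cassandra specifically, a plain INSERT with an existing primary key overwrites that row rather than raising an error, and CQL also offers INSERT ... IF NOT EXISTS as a lightweight transaction, so making the keyword the primary key may already be enough there.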



