Best database for storing and retrieving articles / blog posts / forum posts etc



  • I am trying to create a database to store articles, posts etc. They will usually contain a header column, content and some meta data like author or date time stamp, I assume each of these will be a different column. Content and headers could be of any length.

    My first instinct was to use PostgreSQL with varchar columns. But I was reading up on NoSQL and it seems like a better option for writing, sorting and retrieving these kinds of data, I was looking at mongodb and Cassandra.

    There will be a high write from data gathering and high read rate of the data for a recommender system.

    I am looking at aws as well, I saw there is documentdb(mongodb), keyspaces(Cassandra) and dynamodb. Which would be the best option for my case? Or Should I stick to PostgreSQL?



  • Please see https://dba.stackexchange.com/a/307412/150011 for further information, but the most relevant part from it is:

    When to choose a NoSQL database system over a RDBMS depends on a number of factors, none of which I would say primarily involve performance. But one of the main factors that is useful for determining when to choose which type of database system is based on the fact of if you have a well defined schema.

    One of the primary benefits of a NoSQL database is the lack of strict schema enforcement. This allows the developer to not have to manage the schema as much in the data layer and defers when they do need to concretely define it until the data is consumed in the application layer.

    The reason this is a benefit is because if you have a schema that changes very frequently, isn't well defined, or is outside of your control (such as ingesting data from external sources), then NoSQL gives you the ability to not have to manage it at the time the data is being stored in your database. Rather you can handle how you want to utilize a loose structure at the time the data is consumed, which is usually further down the stack (e.g. in the application layer).

    If you have a concrete schema, or one that will change at a rate that's tolerable for you to manage in the data layer, then a SQL (RDBMS) such as PostgreSQL would be just fine.

    The fact that you're able to articulate concrete objects (articles and posts) and properties of those objects (header column, content and some meta data like author or date time stamp) is a sign you have a somewhat concrete schema. Also in the comments you mentioned "It will always be the same, header, content, metadata wise." which is another good sign your have a concrete schema and could utilize a RDBMS.

    The problem is the length of each and the types of data we might potentially get

    This is generally a non-issue as most modern database systems (both SQL and NoSQL) have the capabilities to store wide column data. For example, PostgreSQL has https://www.postgresql.org/docs/current/datatype-character.html , which I believe VARCHAR(N) can roughly hold up to 1 GB (approximately 1 billion string characters) of data in a single row, and TEXT having no limit in the amount of data it can store. Some people do store raw image data in their relational databases though it's debated if that's generally good practice or not regardless of the type (NoSQL vs SQL) of database you choose - my opinion is to store the file path in the database to a file stored in a file system elsewhere.

    So it's up to you to decide how well defined your schema is, how often it's likely to change, and if you want to manage those changes in the database. But there is no "best database" in this scenario, they'll all perform relatively equal.

    You may find more useful information further reading https://dba.stackexchange.com/a/306896/150011 that goes more in depth on the details of concrete vs loose schema definition in the context of NoSQL vs SQL database choices.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2