Chat schema for PostgreSQL and MongoDB



  • Chat schema for PostgreSQL and MongoDB

    I'm designing a schema for a very large chat app. I had problems with NoSQL databases such as MongoDB and Cassandra. Some part of the schema is relational i decided to use both PostgreSQL and MongoDB together.

    Chats and members are stored in PostgreSQL and messages are stored in MongoDB.
    There is a table called events in both databases. i just keep chat_ids and creation time of the events in PostgreSQL and the data of events in MongoDB.
    These events include remove_message block_user , etc.

    Note that longtext means json

    Questions
    What problems do i face if i use both together?
    Is there anything to change?

    Updated

    Message schema

    {
      _id: ObjectId,
      user_id: Number,
      chat_id: Number,
      random_id: String,
      peer: {
        type: [Number],
        default: []
      }
      message: String,
      type: {
        type: String,
        enum: ["service-message", "text", "image", "video", "voice"]
      },
      reply_to: {
        message_id: ObjectId,
        user_id: Number
      },
      forwarded_from: {
        message_id: ObjectId,
        user_id: Number,
        chat_id: Number,
      }
      deleted_for: {
        type: [Number],
        default: []
      },
      author_signature: String,
      restriction_reason: String,
      number_of_forwards: {
        type: Number,
        default: 0
      },
      is_read: {
        type: Boolean,
        default: false,
      },
      is_deleted: {
        type: Boolean,
        default: false,
      },
      is_private: {
        type: Boolean,
        required: true
      },
      is_edited: {
        type: Boolean,
        default: false,
      },
      created_at: {
        type: Number,
        default: Date.now
      },
      updated_at: {
        type: Number,
        default: Date.now
      }
    }
    


  • This hopefully indirectly answers your question and points you in the right direction, it's too much to discuss in comments.

    MongoDB is not any more performant with writing data than any other mainstream relational database (PostgreSQL, SQL Server, etc). The same is true for reading data as well. NoSQL was not invented to be more performant with reading, writing, or querying data, rather it was originally created to be a tool to improve developer speed especially with a highly changing or variable schema. By not having to worry about maintaining an object structure both in the data layer in addition to the application layer, NoSQL databases allowed developers with specific development needs to release software changes faster when managing a highly changing schema.

    From what I can see, your messages entity is structured but perhaps the content of the message text is not (an example of the text would be helpful). Regardless you can either parse the text into a relational model or store it as a single text field just the same as your entity currently is modeled as, if you wanted to use a relational database system for all of your data.

    Sharding may be easier but at the general tradeoff of eventual consistency (two instances of the same table being out of sync at a given point in time). When to use Sharding depends on a couple of factors generally based on a function of what one is willing to pay to provision their servers vs how much data they need to store, based on their specific use cases. But in general, modern relational database systems handle big data well when architected properly, and I've yet to find Sharing any more useful than vertical scaling.

    All of that being said, I've worked with Messages data (financial trading messages) in a large and heavily transactional relational database as well. Some of the tables were into the 10s of billions of rows and multi-terabytes big for a single table. We stored our raw unparsed Messages in one table as they came into our system, but wrote an application that also parsed the relevant information from the Messages into a normalized table.

    The main two challenges you'll find by using two different databases systems is:

    1. Management of the data in two different systems, especially as your data grows: The management of a NoSQL database system requires different thought, planning, and attention than a relational database system.

    2. Relating the data across systems in a meaningful way: Consuming the data by your apps or for reporting purposes becomes more complicated and dependent on two unrelated systems. Associating the correlating messages data to your other data such as users becomes more complicated, and actually probably less efficient since you need two different methodologies for retrieving data which will result in more steps associating the data. For example if you wanted to get the users.full_name for the messages they created.

    Most modern relational database systems can store, parse, and query "unstructured" data too, since that's really a subset of structured data. For example, the JSON capabilities of PostgreSQL that Vérace mentions makes it a powerful data store for relational data and any additional meta-data that's unstructured as well.

    Alternatively, it seems like your messages structure is in your application's control. If that's true, then you don't need to store all as a single denormalized object. Rather you can store it into the database as a normalized series of tables that make up all of the data from the messages. It appears all of your data could be stored in a relational database.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2