DB strategies to store data in CSV files?



  • I'm new to the Postgres community and would love to hear your recommendations.

    To give a bit more context, each file is associated with a unique zip code and address and contains 8,760 (365 days x 24 hrs) rows of data. The files contain two columns.

    • timestamp
    • data

    I want to understand how I can store the data of each csv file in a database.

    The most obvious solution is to store the data from each csv file in a separate database table. But, I'm not sure how to retain the address and zip code if the database table is only going to contain two columns (timestamp and data).

    Would love to hear any other suggestions for DB implementations.



  • Almost every "Table-per-Thing" design is a Bad Idea.
    You almost always wind up having to join across [all] those tables and having lots of Tables in there makes life very difficult.
    And, if you ever have to change the table structure, say, to add a new column, you have to run the alter table lots and lots of times. Tedious.

    You should be looking to load all of these CSVs into a single table, keyed on zipcode, address and timestamp (or some combination(s) thereof, depending on how you need to retrieve this data), plus the data field.

    create table all_data 
    ( zipcode ... 
    , address ... 
    , timestamp ... 
    , data_value 
    , primary key ( zipcode, address, timestamp )
    , index( timestamp, zipcode, address )
    ); 
    

    You don't say what's in that data field, but if it's a big chunk of stuff that you might want to query parts of, consider breaking those parts out into individual fields as well. Yes, you can have "complex" data in columns these days, but there's [still] a cost (performance hit) to doing so.

    I'd suggest using an intermediate table to do the loading, and do some checks on the data as it comes in, before it gets loaded into the table proper.

    Even if you had a million such these files, you'd still only be looking at around 8 billion records - and that's nothing to a decently-sized PostgreSQL cluster.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2