Text files processing



  • There is a task to process and convert files. All files are analog CSV(s with dividers)

    1. Two or more files. We need to glue them in some fields and give them one processed file.
    2. One file. The fields must be transformed, the sticker must be made from several lines to one etc., and a single, processed file must also be issued.

    Problem is, there's no single file structure. i.e. file files/samples of files may vary by field/structure

    To date, the idea of such processing is only one:

    1. We'll load the file/Files into some OBD (in-memory, for example), pre- runtime reading the file structure - we'll set up the structure tables.
    2. We're doing some SQL scrip, which is the result of which we're building a exit file.

    Question: Can there be some developed algorithms/biote for such conversion/processing to avoid the city of "a traveller"?

    UPD: In order to avoid attaching to the structure, of course, some NoSql can be used, but then we will lose full force of SQL



  • We can record the file in the table, for example with columns DOC_NAME, ROW_NUMBER, ROW_CONTENT

    DOC_NAME - file name ROW_NUMBER - row number in ROW_CONTENT - count the line from the file, e.g. Name; Name; Name; Road; Mestody - Petr; Semen; 12.12.1980;g. Moscow

    This line may then be dissolved by sql on the columns and placed in a pre-defined table:

    INSERT INTO PARSED_TABLE1
    SELECT           
    TRIM(SUBSTR(CON,0,INSTR(CON,CHR(59))-1)) NAME,    
    TRIM(SUBSTR(CON,INSTR(CON,CHR(59),1,1)+1,INSTR(CON,CHR(59),1,2)-
    INSTR(CON,CHR(59),1,1)-1)) SURNAME,    
    TRIM(SUBSTR(CON,INSTR(CON,CHR(59),1,2)+1,INSTR(CON,CHR(59),1,3)-
    INSTR(CON,CHR(59),1,2)-1)) BIRTHDAYDATE,    
    TRIM(SUBSTR(CON,INSTR(CON,CHR(59),1,3)+1)) BIRTHPLACE  
    FROM ( SELECT ROW_CONTENT CON FROM CONTENT_TABLE WHERE DOC_NAME="doc.csv")
    

    By analogy, we make a request to fill PARSED_TABLE2, from the 2nd tables, you glue a new one as you wish, you can stick it straight without PARSED_TABLE. CHR(59) is a symbol ; in ANSII

    This is also a good article on reading from .csv in Orasle OBD. http://torofimofu.blogspot.ru/2015/10/oracle-csv.html




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2