sheetwork
Contribute on GitHubChat on Discord
v1.0.0 Nicolas Jaar
v1.0.0 Nicolas Jaar
  • Introduction
  • Installation & Configuration
    • Installation & Update
    • Configuration
      • Set up your sheetwork project
      • Connecting to Google Sheets
      • Set up your sheetwork profile
  • Usage
    • Quick CLI
    • Using sheets.yml configuration
      • Basic Configuration
      • Column Operations & Advanced Controls
  • Sheetwork Operations
    • sheetwork init
    • sheetwork upload
    • Under the Hood Cleanups
  • FAQ
    • Do I need to list ALL the columns in my sheets config?
    • Do the column names need to be in the format of the original sheet?
    • What kind of cleanups or reformating does sheetwork do?
Powered by GitBook
On this page
  • What does sheetwork do under the hood?
  • Column Cleanups
  • Field/Content Cleanups

Was this helpful?

  1. Sheetwork Operations

Under the Hood Cleanups

Previoussheetwork uploadNextDo I need to list ALL the columns in my sheets config?

Last updated 4 years ago

Was this helpful?

What does sheetwork do under the hood?

Sheetwork has a few cleanups backed in that it applies to column names as well as to some of the content. For now, the only way to avoid these cleanups to be performed is to run sheetwork in , with the exception of which you can control in your sheets.ym file.

Column Cleanups

CamelCase to snake_case

Particularly handy for databases like Snowflake as this database forces column names to be UPPERCASE.

If your google sheet contains columns that look like this ColumnWithCamelCasing chances are your database client will make them look like `COLUMNWITHCAMELCASING`which isn't the prettiest thing...

Sheetwork will automatically reformat camel cased columns to something that looks like this: column_with_camel_case if you have enabled snake_case_camel: True in your .

Special Characters Removal

Another one that generally upsets your database client are special characters such as /, . etc. Sheetwork will convert those to underscores _

List of currently replaced characters in column names:

  • any character that is not a word (^\w\s regex), unless it's a whitespace character.

  • removes any trailing _ characters (at beginning or end of a column)

This looks a little arbitrary? Yes it is... We'd welcome input on this as people start using it on how we can maybe make it more up to the user via configurations. To be continued. If you have ideas feel free to .

Columns with Empty Names

Those are simply dropped, ingesting it would make it quite unpredictable if you expect to refer to later.

Columns are made case insensitive

We lowercase the column names.

Field/Content Cleanups

For now, we perform the following:

Trailing whitespace

This one is a pretty nefarious thing to have in your data and we couldn't refrain from sanitising it. Any trailing whitespace will be removed. That means spaces before and after a string.

Empty Strings

These are converted to your database equivalent for NULL.

We try not to mess with the content much, that's probably more of an ETL process that other tools (such as ) are best suited for.

dbt
sheet configuration
drop you thoughts in an issue
camelCase to snake_case transformations
--interactive mode