Under the Hood Cleanups

What does sheetwork do under the hood?

Sheetwork has a few cleanups backed in that it applies to column names as well as to some of the content. For now, the only way to avoid these cleanups to be performed is to run sheetwork in --interactive mode, with the exception of camelCase to snake_case transformations which you can control in your sheets.ym file.

Column Cleanups

CamelCase to snake_case

Particularly handy for databases like Snowflake as this database forces column names to be UPPERCASE.

If your google sheet contains columns that look like this ColumnWithCamelCasing chances are your database client will make them look like `COLUMNWITHCAMELCASING`which isn't the prettiest thing...

Sheetwork will automatically reformat camel cased columns to something that looks like this: column_with_camel_case if you have enabled snake_case_camel: True in your sheet configuration.

Special Characters Removal

Another one that generally upsets your database client are special characters such as /, . etc. Sheetwork will convert those to underscores _

List of currently replaced characters in column names:

  • any character that is not a word (^\w\s regex), unless it's a whitespace character.

  • removes any trailing _ characters (at beginning or end of a column)

This looks a little arbitrary? Yes it is... We'd welcome input on this as people start using it on how we can maybe make it more up to the user via configurations. To be continued. If you have ideas feel free to drop you thoughts in an issue.

Columns with Empty Names

Those are simply dropped, ingesting it would make it quite unpredictable if you expect to refer to later.

Columns are made case insensitive

We lowercase the column names.

Field/Content Cleanups

We try not to mess with the content much, that's probably more of an ETL process that other tools (such as dbt) are best suited for.

For now, we perform the following:

Trailing whitespace

This one is a pretty nefarious thing to have in your data and we couldn't refrain from sanitising it. Any trailing whitespace will be removed. That means spaces before and after a string.

Empty Strings

These are converted to your database equivalent for NULL.

Last updated