Figure 1. Netflix Example

  1. Generate a random noun (go.wisc.edu/su7013)
    This generator includes abstract nouns like "difference." Keep rolling until you get a concrete noun like "art" or "dog."
  2. Create a schema for your random word. Note the datatype for each field you list (number, text, or yes/no). Everytime you come across a more complex type (things we can't or shouldn't simplify to number, text, or yes/no), "grow" your schema out as we did on the board. Keep going until you have about 4~5 tables for this random word
  3. Do the same for one of your members' research topics, as you currently are thinking about it: Choose a noun at the "center" of your interests in this topic and grow out from there, stopping at 4~5 tables
  4. Give me a "green side up" when you're done 😊

go.wisc.edu/8y6794

from pandas import read_csv
sheet_id = "1RmkggBBkxytaQGigm4zSVJgBzMjTjGWy58yKExVuMWM"
sheet_name = "Example"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
example = read_csv(url)
print(example)
from pandas import read_csv
def get_data(sheet_name):
  sheet_id = "1RmkggBBkxytaQGigm4zSVJgBzMjTjGWy58yKExVuMWM"
  url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  return read_csv(url)

example = get_data("Example")
print(example)
from pandas import read_csv
from pandasql import sqldf

def get_data(sheet_name):
  sheet_id = "1RmkggBBkxytaQGigm4zSVJgBzMjTjGWy58yKExVuMWM"
  url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  return read_csv(url)

def sql(q):
  return sqldf(q, globals())

example = get_data("Example")
print(sql("""
  select *
  from example
"""))

SQL

Mariah A. Knowles

Tue, Jul 18, 2023

Orientation to SQL

  • Data is spread across multiple tables
  • SQL is a declarative programming langauge

Each Kid is responsible for one or more Dogs. Each Dog is the responsibility of a single Kid.

The Kids all share their Toys. Each Kid favorites one or more Toys. Each Toy is favorited by one or more Kids.

How would we diagram this?

  • What are our tables?
  • What are their relationships?
  • What fields might each table have?
  • What keys do we need to represent our relationships?
  • Find all kids' names
  • Find all kids' names and their birthdays
  • Find number of kids
  • Find all dogs' names and their kid's name
  • Find all dogs' names whose kid was born after Jan 1, 2009
  • Find all toys' names and how many kids favorite each toy
  • Find all pairs of kids who share a favorite toy

SQL Topics

  • select
  • from single table
  • limit
  • order by
  • count, calculations, as, sum, max, min, avg, distinct
  • where
  • in and like
  • from multiple tables, where vs. join on vs. join using
  • group by
  • having

Databases

from sqlite3 import connect
from pandas import DataFrame, read_csv

def query(q):
    with connect("example.db") as con:
        cur = con.cursor()
        results = cur.execute(q)

    return results

query("""
    create table if not exists
    movies (
        movie_id integer primary key autoincrement,
        title text,
        year integer,
        score real
    )
""")
def insert(values, q):
    with connect("example.db") as con:
        cur = con.cursor()
        cur.executemany(q, values)
        con.commit()

values = [
    ("Monty Python Live at the Hollywood Bowl", 1982, 7.9),
    ("Monty Python's The Meaning of Life", 1983, 7.5),
    ("Monty Python's Life of Brian", 1979, 8.0)
]

insert(values, """
    insert into movies
    (title, year, score)
    values
    (?, ?, ?)
""")
def select(q):
    with connect("example.db") as con:
        cur = con.cursor()
        results = cur.execute(q)
        df = DataFrame.from_records(
            data=results.fetchall(),
            columns=[desc[0] for desc in results.description]
        )
        
    return df

movie_data = select("""
    select * from movies
""")
query("""
  create table if not exists
  kids (
      kid_id integer primary key autoincrement,
      dob date,
      intake_date data,
      hair_color text,
      eye_color text,
      first text,
      last text,
      race text,
      height real,
      weight real,
      gender text,
      biological_sex text
  )
""")
def getSheet(sheet_name):
  sheet_id = "1RmkggBBkxytaQGigm4zSVJgBzMjTjGWy58yKExVuMWM"
  url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
  return read_csv(url)

kids = getSheet("Kids")
kid_data_to_insert = kids[["kid_id",
                          "dob",
                          "intake_date",
                          "hair_color",
                          "eye_color",
                          "first",
                          "last",
                          "race",
                          "height",
                          "weight",
                          "gender",
                          "biological_sex"]]
insert(kid_data_to_insert.values, """
    insert into kids
    (kid_id, dob, intake_date, hair_color, eye_color, first, last, race, height, weight, gender, biological_sex)
    values
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""")
insert(kid_data_to_insert.values, """
    insert or ignore into kids
    (kid_id, dob, intake_date, hair_color, eye_color, first, last, race, height, weight, gender, biological_sex)
    values
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""")

Note: These commands can look very different from one DBMS to another.

This one is most commonly written show tables.

select("""
    select * from sqlite_schema
""")

And this one is most commonly written desc movies.

select("""
    pragma table_info(movies)
""")

Links