Tools (e.g., SQL, command line)
Prev: System Design Next: Honors Class
Problems
24.1
What is merging in a version control system? Specifically, describe the limitations of line-based merging and ways in which they can be overcome.
24.2
What are hooks in a version control system? Describe their applications.
24.3
Your manager read a study claiming that a Python program took one-tenth the development time of a C++ program for the same application, and now demands that your team program exclusively in Python.
Tell him about five dangers of doing this.
24.4
Describe the advantages and disadvantages of Python and Java for polymorphism. In particular, how would you implement a polymorphic function in Python?
24.5
Argue that a build system such as make which looks purely at timestamps of source code and derived products when determining when a target needs to be rebuilt can spend more time rebuilding than necessary.
How can you avoid this? Are there any pitfalls to your approach?
24.6
Contrast ANT and Maven.
24.7
Contrast SQL and NoSQL databases.
24.8
What is database normalization? What are its advantages and disadvantages?
24.9
Write SQL commands to create a table appropriate for representing students, together with SQL commands showing how to add, delete, and update students, and how to search for students by GPA with results sorted by GPA.
Then add a table of faculty advisors, describe how to model adding an advisor to each student, and write a query that returns the students who have an advisor with a specific name.
Variant: What is a SQL join? Suppose you have a table of courses and a table of students. How might a SQL join arise naturally in such a database?
24.10
Explain what IP, TCP, and HTTP are. Emphasize the differences between them.
24.11
Describe HTTPS operationally, as well as the ideas underlying it.
24.12
Describe DNS, including both operational and implementation aspects.
Answers
24.1
Merging is the process of integrating concurrent changes made against a common ancestor revision. Plain line-based merging is fast and scalable, but it ignores syntax and semantics, so it can miss conflicts that still leave the program broken. Stronger approaches include syntactic or semantic merging, but in practice teams usually combine line-based merge with compilation and a small test suite to catch the remaining issues.
24.2
Hooks are executable scripts that a version control system runs at specific events such as pre-commit or post-commit. They are commonly used for formatting, linting, tests, policy checks, notifications, and issue-tracker integration. A key caution is that hooks should not silently rewrite a transaction in surprising ways.
24.3
The core objections are that one case study is anecdotal, team switching costs are real, some workloads are a bad fit for scripting languages, choosing one language for every task is usually the wrong abstraction, and language ecosystems evolve in ways that can create future migration risk. Tool choice should follow the job, not a single headline result.
24.4
Java gives stronger compile-time guarantees for polymorphism through interfaces and class hierarchies. Python is more flexible, but that flexibility shifts checks to runtime. In Python, the usual approaches are either interface-like mixins / base classes or plain duck typing, where a function simply calls the required methods and relies on the object exposing the expected interface.
24.5
Timestamp-only rebuild logic can overbuild when a source file changes without changing semantics, for example after reformatting or comment edits. One mitigation is to rebuild into a temporary artifact and replace the old artifact only if the contents differ. The catch is that this still has cost, and environment or toolchain changes can invalidate the assumption that unchanged sources imply unchanged outputs.
24.6
ANT is more imperative and script-like: you explicitly describe tasks and how to run them. Maven is more declarative and convention-driven: you declare project structure and dependencies, and Maven performs standard lifecycle steps for you. Maven is generally stronger on dependency management, conventions, and repository-backed builds; ANT is lower-level and more manual.
24.7
SQL databases are relational, schema-oriented, and typically stronger on joins, declarative querying, and ACID transactions. NoSQL databases usually trade some of that structure for simpler schema evolution, easier horizontal scaling, and data models such as documents or key-value stores. The right choice depends on whether consistency and relational querying or flexibility and scale are the primary concern.
24.8
Normalization is the process of decomposing relational data to reduce redundancy and update anomalies while preserving information. Its main benefits are cleaner data integrity and less duplication. Its main drawback is performance: recovering application-level records often requires extra joins.
24.9
A reasonable design uses a students table with a stable primary key and fields such as name, birthday, GPA, and graduation year, plus an advisors table keyed separately. The student row then stores an advisor_id foreign key. Basic INSERT, DELETE, UPDATE, and SELECT ... ORDER BY operations cover the CRUD portion, and the advisor-name query is a straightforward join between students and advisors.
24.10
IP is the low-level packet-delivery protocol across networks. TCP sits on top of IP and provides reliable ordered end-to-end transport with retransmission, flow control, and multiplexing via ports. HTTP sits on top of TCP and defines the application-level request / response protocol used by the web and many APIs. They operate at different layers.
24.11
HTTPS is HTTP carried over a secure channel, typically TLS. The security story has two main parts: key exchange and identity. Public-key cryptography is used to establish shared secrets, symmetric cryptography is then used for efficient bulk encryption, and certificates let the client verify that it is talking to the intended server rather than an impostor.
24.12
DNS is the distributed naming system that maps domain names to IP addresses. Operationally, clients query DNS servers, which answer from local data, cache, or by forwarding / recursively resolving the request up the hierarchy. Implementation-wise, DNS must handle enormous scale, continuous updates, replication, redundancy, and aggressive caching while keeping lookup latency low and the system highly available.