kpQwcfgd

· 6 years ago · Oct 20, 2019, 07:38 PM
1# Some Code Golf
2
3### The Problem
4I'm taking a course taught in C++. We were recently given an extra-credit problem of taking a string containing the characters `([{}])` (and possibly others) and verifying that the brackets are properly balanced - all opening braces are closed, in the correct order, and there are no extra braces.
5
6This is actually a relevant problem in practice for language implementers - my Scheme interpreter _has_ to contain a solution for this problem because Scheme's syntax completely depends on the balancing of brackets. In general, this is not a "hard" problem.
7
8Unless you try and do it in as few lines as possible.
9
10### The Rules
11
12My friend and I decided to tackle doing this problem in as few lines as possible. The rules were that semicolon separated statements, belong on two lines, such that
13
14```cpp
15foo = bar;
16return foo;
17```
18
19must be written on two lines. Additionally, opening braces must be the last character on their line, and closing braces must be the first character on their line. The first thing after a closing brace cannot be a statement, unless that statement is an else statement or an else if statement. This is pretty in line with normal style rules, although some might argue about the exception for if/else if. However what is _not_ in line with normal style rules is that lines can be arbitrarily long.
20
21The lines that are counted is everything between (and including) each function signature and its corresponding closing brace. We didn't count header includes, although perhaps we should've, but we seemed set on this early on.
22
23There is a three-line solution (the absolute minimum) under these rules. It's iterative, but it abuses a few tricks to jam the function body down into one line. The main post is not about this solution (although this is the winning solution), but rather about a different, 5 line solution I came up with for fun. I want to talk about that solution because it uses a non-trivial algorithm that I think is pretty neat, and then it uses a ton of cool tricks to cram it down into one line. Both solutions compile without warnings.[^1]
24
25I want to talk about the 3-line solution first as it demonstrates a number of the tricks which are useful in the 5-line solution
26
27### The Winning Solution
28
29Here's the 3-liner:
30```cpp
31#include <string>
32#include <stack>
33using namespace std;
34
35bool isValid(string input) {
36    for (stack<char> st;;) for (bool ret = false;;) for (bool err = false;;) for (auto c = input.begin(); c == input.end() ? ret = true : true; ++c, ret = err) if (ret || (((*c == '(' || *c == '[' || *c == '{') ? st.push(*c), false : true) && ((*c == ')' || *c == ']' || *c == '}') && (st.empty() ? err = true, false : true) && ((st.top() == (*c == ')' ? '(' : *c == ']' ? '[' : *c == '}' ? '{' : '\0')) ? st.pop(), false : true)))) return !err && st.empty();
37}
38```
39
40This single line could be shorter, but I found this version the easiest to manage mentally (as well as physically, as I wrote it with copy+paste). To break this down, let's look at an "un-golfed" version of the same code, and then show the transformations that bring it to this. We'll follow a similar process with the 5-liner.
41
42```cpp
43#include <string>
44#include <stack>
45using namespace std;
46
47char matchingBrace(char brace) {
48    if (brace == ')') return '(';
49    if (brace == ']') return '[';
50    if (brace == '}') return '{';
51    return '\0';
52}
53
54bool isValid(string input) {
55	stack<char> st;
56	for (auto iter = input.begin(); iter != input.end(); ++iter) {
57        switch (char c = *iter) {
58        case '(':
59        case '[':
60        case '{':
61            st.push(c);
62            break;
63        case ')':
64        case ']':
65        case '}':
66            if (st.empty()) return false;
67            if (st.top() != matchingBrace(c)) return false;
68            st.pop();
69        }
70	}
71	return st.empty();
72}
73```
74
75It's fairly evident what's going on here. We iterate through the whole string. We use a stack to manage the braces. If we ever see a closing brace and the stack is empty, the string is definitely invalid. And if we see a closing brace that doesn't match the opening brace, the string is also invalid. Otherwise we can remove the opening brace from the stack and keep going. When we reach the end of the string, the string is valid iff (if and only if) the stack is empty.
76
77This solution is long.
78
79The first way to save a bunch of lines is to inline `matchingBrace` into its callsite. This is easy with some ternary operators.
80
81```cpp
82bool isValid(string input) {
83	stack<char> st;
84	for (auto iter = input.begin(); iter != input.end(); ++iter) {
85        switch (char c = *iter) {
86        case '(': case '[': case '{':
87            st.push(c);
88            break;
89        case ')': case ']': case '}':
90            if (st.empty()) return false;
91            if (st.top() != (c == ')' ? '(' : 
92                             c == ']' ? '[' : 
93                             c == '}' ? '{' : '\0'))
94                return false;
95            st.pop();
96        }
97	}
98	return st.empty();
99}
100```
101
102Until we're done, for readability, I'll split up things which could be on one line into several lines. Once we're getting close, I'll note the actual line counts.
103
104Our next target is that pesky `break;`. It's semicolon-delimited, which means this has to take up a line. Switches like this one are easily translated into an if-else if. Notably, it's not an if-else, because of the implicit default case which does nothing.
105
106```cpp
107bool isValid(string input) {
108	stack<char> st;
109	for (auto c = input.begin(); c != input.end(); ++c) {
110        if (*c == '(' || *c == '[' || *c == '{')
111            st.push(*c);
112        else if (*c == ')' || *c == ']' || *c == '}') {
113            if (st.empty()) return false;
114            if (st.top() != (*c == ')' ? '(' : 
115                             *c == ']' ? '[' : 
116                             *c == '}' ? '{' : '\0')) 
117                return false;
118            st.pop();
119        }
120	}
121	return st.empty();
122}
123```
124
125Note that since we lost the opportunity to create a variable for the current char, we have to dereference the iterator everywhere instead.
126
127Next, since the bodies of the two `if` statements in the second case are the same, we can combine their conditions.
128
129```cpp
130bool isValid(string input) {
131	stack<char> st;
132	for (auto c = input.begin(); c != input.end(); ++c) {
133        if (*c == '(' || *c == '[' || *c == '{')
134            st.push(*c);
135        else if (*c == ')' || *c == ']' || *c == '}') {
136            if (st.empty() || (st.top() != (*c == ')' ? '(' : 
137                                            *c == ']' ? '[' : 
138                                            *c == '}' ? '{' : '\0')))
139                return false;
140            st.pop();
141        }
142	}
143	return st.empty();
144}
145```
146
147This is pretty good, but there are a bunch of braces we can't get rid of. There are a few things we'd really like to do here, but they seemingly are mutually exclusive:
1481) Get the body of the `else if` down to one line instead of two
1492) Merge the body of the `if` with the `else if`
1503) The body of the resulting `if` should only be one line so we can get rid of the braces on the `for` loop.
151
152These seem mutually exclusive, but they are not. In fact, doing any one of these without doing the other two is probably somewhat difficult. But we can do them together in one fell swoop, albiet it's a nasty one.
153
154The first key observation is that **expressions are statements**. What does this mean? In programming language semantics, there are 3 key things that the programmer writes: 
1551) control flow, like `if`, `else`, `for`,  `while`, and functions
1562) expressions, like `c == '('`
1573) statements, like `return c == '('`.
158
159Statements typically *include* expressions, and in fact `if`, `else`, and similar control flow are actually statements which typically contain several expressions. For example, the 'general case' of a return statement is `return <expression>;`, and the general case of an if statement is `if (<expression>) { <statements> }`. Note that the _body_ of the `if` contains _statements_, but the _condition_ of the `if` contains an _expression_.
160
161Why is this important? Together, the two `if` branches inside our for loop contain _three_ statements:
1621) `st.push(*c);`
1632) `return false;`
1643) `st.pop();`
165
166But the first and third of these statements are just _expressions_ - they are function calls with side-effects. We're asking C++ to call some function for us, and throw away the result (both of these functions are void, but that makes no difference), because the function changes the stack for us. But function calls are expressions - and the conditional of an `if` statement requires an expression. Similarly, the body of an `if` statement requires a statement, and we have a perfectly good return statement to feed it.
167
168But `st.push` and `st.pop` return void, and our `if` needs a boolean. Not only that, but we need to insure that the `if` only fires in the correct cases.
169
170To solve the first problem, we use an oft-ignored feature of C++: `,` is an operator. `<exp1>, <exp2>` evaluates `<exp1>`, then throws away the result, evaluates `<exp2>`, and returns that. This lets us get in our `push` and `pop` calls, and then return whatever boolean we need to make sure the `if` doesn't fire. What boolean would that be? Well, remember that `&&` and `||` do "short-circuiting" - if the left-hand operand of `&&` is `false`, then the right-hand operand is never evaluated. Similarly if the left-hand operand of `||` is `true`. We want to _stop_ the `if` from firing, rather than force it to fire, so we use `&&` and set it up to receive `false` when we're done. Then we just have to set it up to receive `true` when we want to carry on to the next thing.
171
172We have to decide if we're in a push/pop case before we check if we should return, otherwise we might decide we should return, and then decide we're not in a push/pop case, and we're using `&&` to combine these, so we would not return. That means we'll have to check for the `pop` case, rather than checking for the `return` case and then popping otherwise. This is easy enough - we just have to invert our character test. We check that the top of the stack is *correct* instead of checking if it's *wrong*. We can just throw an `!` on the whole thing and be done with it.
173
174(side node: `,` has the lowest precedence of all the operators, which means if the thing on either side of it is a valid expression, it will _always_ parse so that _everything_ to the left is the left operand, and _everything_ to the right is the right operand. So `true ? 'a' : st.pop(), 'b'` gets parsed the same way as `(true ? 'a' : st.pop()), 'b'`, instead of `true ? 'a' : (st.pop(), 'b')`)
175
176```cpp
177bool isValid(string input) {
178	stack<char> st;
179	for (auto c = input.begin(); c != input.end(); ++c)
180        if (((*c == '(' || *c == '[' || *c == '{') ? st.push(*c), false : true)
181            && ((*c == ')' || *c == ']' || *c == '}')
182                && !(st.empty() || (st.top() != (*c == ')' ? '(' : 
183                                                 *c == ']' ? '[' : 
184                                                 *c == '}' ? '{' : '\0'))))
185                    ? st.pop(), false : true)
186            return false;
187	return st.empty();
188}
189```
190
191(this is now down to 5 lines)
192
193Since the `if` only contains one statement, we can drop the braces. The whole `if` + `return` is itself a single statement, so we can also drop the braces on the `for` loop.
194
195The next goal is ambitious: we have two return statements. Can we merge them?
196
197The immediate answer seems like it should be "no." The second return is outside the loop. If it were inside the loop, surely we risk returning when we don't want to. Even worse, surely we risk _never returning_ (this would be in the case of an infinite loop).
198
199But we are clever, and we have tricks! If we remove the exit condition of the `for` loop (that's the `c != input.end()`), we get an infinite loop. Then we just have to arrange for the `if` to fire if (1) the loop _would have_ exited before this iteration, or (2) if it would've fired anyway in its current form. Case (1) will require an extra boolean to track if we should stop, and we can set it in the increment section of the `for` loop. A first attempt at this probably looks something like this:
200
201```cpp
202bool isValid(string input) {
203	stack<char> st;
204	bool ret = false; // should we return?
205	for (auto c = input.begin();; ++c, ret = c == input.end())
206        if (ret || (((*c == '(' || *c == '[' || *c == '{') ? st.push(*c), false : true)
207                && ((*c == ')' || *c == ']' || *c == '}')
208                    && !(st.empty() || (st.top() != (*c == ')' ? '(' : 
209                                                     *c == ']' ? '[' : 
210                                                     *c == '}' ? '{' : '\0'))))
211                    ? st.pop(), false : true))
212            return st.empty();
213}
214```
215
216However we'll notice a problem very quickly with this code: on the input `")"`, on the first iteration of the loop, `ret` is false, `*c` isn't an opening brace, `*c` _is_ a closing brace, and `st.empty()` is true! That means suddenly, we'll decide to return. But previously, we would've decided to `return false;`, and now we're deciding to `return st.empty()`. But the whole reason we're returning is because the stack was empty when it _shouldn't have been_, so we're going to return the wrong value.
217
218To fix this, we introduce yet another boolean to mark whether we're returning because of an error in the string or because we reached the end:
219
220```cpp
221bool isValid(string input) {
222    stack<char> st;
223    bool ret = false;
224    bool err = false; // error in the string?
225    for (auto c = input.begin(); c == input.end() ? ret = true : true; ++c, ret = err)
226        if (ret || (((*c == '(' || *c == '[' || *c == '{') ? st.push(*c), false : true)
227                    && ((*c == ')' || *c == ']' || *c == '}')
228                        && (st.empty() ? err = true, false : true)
229                        && ((st.top() == (*c == ')' ? '(' :
230                                          *c == ']' ? '[' :
231                                          *c == '}' ? '{' : '\0'))
232                            ? st.pop(), false : true))))
233            return !err && st.empty();
234}
235```
236
237(this has gone up to 6 lines, because of the added `bool`s. But I promise we can get rid of them!)
238
239Firstly, the loop is bit more complicated. There's a ternary in the condition section, and what is this `ret = true` business? In C++, **assignments are expressions**, and they evaluate to whatever was assigned. Here, the condition always evaluates to `true`, but evaluating the condition might set `ret`. So we effectively have an infinite loop, and we use `ret` to break out of it as before. This fixes _another_ bug in the previous version of the code. Can you figure out what it is? [^2]
240
241There is now a ternary for the first `st.empty()` check.  In this case, we assign `err` to `true`, and then evaluate to `false`. Evaluating to false short-circuits out of the `if`, since we don't want to try and test the stack top when the stack is empty (this can cause segfaults). After setting `err`, it would be safe to finish iterating through the string. When we get to the return, `err` would always be true and so we would return false. But we're not counting characters here, so we can update `ret = err` at every iteration of the loop so that we'll return quickly if this happens. This also avoids a risk of segfaults - but it turns out that it works either way.
242
243At this point we're pretty much done. "But there are still three variable declarations!" you cry. Indeed, there are. But here comes a _very_ neat trick. The `for` loop will _always_ return once execution reaches it, since it's an infinite loop and the only way out is returning. The trick now is to get that `for` loop into the same statement as the variable declarations.
244
245In C++, there are control flow constructs that let you declare variables inside them, before executing statements they contain. These are `if`, `for`, `switch`, and `while` (and functions, if you're adventurous with your thinking). `if (bool ret = false) <statement>` is perfectly valid. However, we need to be careful, because if the condition evaluates to `false`, or `0`, or `NULL`, the if won't fire. Additionally, variables declared in an `if` _must_ be able to be converted to a boolean in order to actually check the condition, which means for any type we could do this to, we have to actually make sure the `if` will fire.
246
247Instead, we can use the initializer of a `for` loop: `for (bool ret = false;;)` has the same effect, except the statement inside the "loop" will always execute. Here we rely on the fact that the inner statement always returns, otherwise this loop would be infinite.
248
249And now it all crashes down to one line (three lines by the rules):
250
251```cpp
252bool isValid(string input) {
253    for (stack<char> st;;)
254    for (bool ret = false;;)
255    for (bool err = false;;)
256    for (auto c = input.begin(); c == input.end() ? ret = true : true; ++c, ret = err)
257        if (ret || (((*c == '(' || *c == '[' || *c == '{') ? st.push(*c), false : true)
258                    && ((*c == ')' || *c == ']' || *c == '}')
259                        && (st.empty() ? err = true, false : true)
260                        && ((st.top() == (*c == ')' ? '(' :
261                                          *c == ']' ? '[' :
262                                          *c == '}' ? '{' : '\0'))
263                            ? st.pop(), false : true))))
264            return !err && st.empty();
265}
266```
267into
268```cpp
269bool isValid(string input) {
270    for (stack<char> st;;) for (bool ret = false;;) for (bool err = false;;) for (auto c = input.begin(); c == input.end() ? ret = true : true; ++c, ret = err) if (ret || (((*c == '(' || *c == '[' || *c == '{') ? st.push(*c), false : true) && ((*c == ')' || *c == ']' || *c == '}') && (st.empty() ? err = true, false : true) && ((st.top() == (*c == ')' ? '(' : *c == ']' ? '[' : *c == '}' ? '{' : '\0')) ? st.pop(), false : true)))) return !err && st.empty();
271}
272```
273
274And there we have it!
275
276### The 5 line Solution
277
278Well that was quite a lot, wasn't it? The 3-liner is pretty much interesting enough to be a post on it's own, but this 5 line solution has some really neat tricks that I think are worth showing. If you're still engaged, buckle in.
279
280**Note:** This 5-liner could be written in 3 lines, but it would get rid of the absolute coolest (and hardest to understand) trick that it (ab)uses. It would also require using _default parameters_, which would effectively change the API of the function. And if a user were to notice this and try and pass values for the extra parameters, the function would almost certainly segfault. And it's not possible to defend against this. Because of all that, I'm going to 'show off' the full 5-line version instead of the 3-line version. But I encourage an interested reader to try and figure out how to implement that final transformation.
281
282I'll write this part tomorrow.
283
284[^1] When I started writing this, the 3 line solution actually produced one warning. In the course of re-engineering the transformation from the original code to the compacted version, I noticed a way to get rid of this warning through careful management of the booleans involved in the last step.
285
286[^2] In the previous version, an empty string as input would make it to the second iteration of the loop. This would execute `++c`, moving `c` past `input.end()`. After that, the loop is pretty much undefined behavior. It seems like typically it causes segfaults, rather than incorrectly returning false somewhere.