· 6 years ago · Jun 16, 2019, 12:44 AM
## Background

NodeMCU firmware is an implementation currently based on the Lua 5.1.4 core
("**Lua51**"). This paper describes the rebaselining of the firmware implementation
onto the current Lua 5.3.5 core ("**Lua53**"). This is a second iteration in the light
of the experience gained by doing the LFS work and the ROTables optimisation, and
replaces my previous aborted port ("**Pilot53**") for reasons discussed below.

## General Objectives

My goals in doing this Lua53 port are:

- The Lua implementation should be functionally as complete as possible. Features
should only be omitted where there are clear ESP-specific reasons, especially where
this follows precedent set in our current Lua51. An example here is the omission of
the `os` library for target builds, as the majority of these OS services aren't
available on an embedded ESP device, and this functional niche is largely replaced
by the `node` library.

- This Lua53 implementation adopts a minimum change strategy with respect to the
Lua source code base where practical. An example here is that Lua51 replaces the
`strXXX()` routines by the equivalent `c_strXXX()`. In Lua53, linker directives are
used to achieve the same end without changing the source.

- The Lua53 source directory is a common source base that can compile to the host
`luac.cross` and both the ESP8266 and ESP32 targets. Any target-specific variations
are handled through conditional compilation (`#ifdef` and `#define` macros).

- The Lua51 and Lua53 code bases will be maintained in separate sub-directories
(`app/lua` and `app/lua-53` respectively) sitting within the current `dev` and
`master` branches rather than in separate Git branches. The optional make parameter
`LUA=5.3` will select a build with Lua based on `app/lua-53`, thus generating a Lua
5.3 firmware image. At some time in the future we will swap the default to Lua53,
and move the 5.1 tree into frozen support.

- Lua53 supports three mutually exclusive build targets which correspond to the
defines `LUA_USE_HOST`, `LUA_USE_ESP8266` and `LUA_USE_ESP32`; the last two also
define `LUA_USE_ESP`. `LUA_USE_ESP` is defined for a target `lua.c` firmware build
and `LUA_USE_HOST` for a host-based `luac.cross` build. As with Lua51, the
`luac.cross` build also includes a runtime execution environment that can be
invoked with the `-e` option.

- Where practical, the C application API for modules is common across the Lua51
and Lua53 build environments. This means that the entire set of module libraries
can be shared between the two build variants. The caveat here is that Lua 5.3
introduces some core functional and C API changes, but where practical we implement
Lua compatibility modes to preserve the API. Where there are changes to the C
modules library, these implementation differences will, where practical, be hidden
from the modules at a source level by a wrapper NodeMCU macro API (as with the
`LROT_*` macros), or in a few specific cases the current modules will be updated with
`#ifdef LUA_VERSION_53` logic to handle the coding variants within a common source.

- Lua53 fully supports both the NodeMCU LFS and ROTables patches. Note that any
LFS images must be recompiled with the Lua53 `luac.cross` version as the VM code
format has changed between the two versions. Also note that the ROTable
implementation has changed to reflect the VM changes, but this is hidden at a
source level in the `LROT` macros.

- One notable architectural feature is that Lua 5.3 now implements **Integer** and
**Floating point** as runtime subtypes of the Lua `number` type. There is therefore
no advantage in having separate Integer and Floating point build variants. Lua53
therefore drops the `LUA_NUMBER_INTEGRAL` option. However we have replaced this
with the option to use 32 or 64 bit numeric values, with floating point numbers
being stored as single or double precision respectively. Hence a 32-bit
integer-only application will have similar memory use and runtime performance to
existing Lua51 Integer builds.

- The main driver for implementation changes is to move constant data from RAM into
Flash ROM, and thus to free up as much of the limited RAM as possible for
application use.

- The other class of change is to avoid material runtime performance impacts of
flash-based code execution. (For example, 'hot' byte constant accesses are recoded
to use word access + byte extract; this adds one instruction per access but avoids
a non-aligned software exception.)

- Many of the `eLua` features that we use in NodeMCU have now been incorporated
into core Lua 5.3. We have rewritten others, such as the ROTables implementation, so
this new Lua code base drops any residual `eLua` code.

- Currently no documentation exists for the Lua51 core API changes introduced to
support the NodeMCU implementation. This Lua53 version includes a `manual.md`
supplement to the core Lua 5.3 `manual.html` which documents all API changes for
module developers.

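The relationship between the three target defines listed in the objectives above can be illustrated with a minimal preprocessor sketch. The fallback to `LUA_USE_HOST` and the `demo_build_target_name()` helper are assumptions made so that the fragment compiles on its own; they are not taken from the Lua53 sources.

```c
/* Assumption for this standalone sketch only: default to a host build when
   no target has been selected on the compiler command line. */
#if !defined(LUA_USE_HOST) && !defined(LUA_USE_ESP8266) && !defined(LUA_USE_ESP32)
#define LUA_USE_HOST
#endif

/* Either ESP target implies the generic LUA_USE_ESP define. */
#if defined(LUA_USE_ESP8266) || defined(LUA_USE_ESP32)
#define LUA_USE_ESP
#endif

/* The three targets are mutually exclusive. */
#if (defined(LUA_USE_HOST) && defined(LUA_USE_ESP)) || \
    (defined(LUA_USE_ESP8266) && defined(LUA_USE_ESP32))
#error "LUA_USE_HOST, LUA_USE_ESP8266 and LUA_USE_ESP32 are mutually exclusive"
#endif

/* Hypothetical helper showing how target-specific code can be selected. */
static const char *demo_build_target_name(void) {
#if defined(LUA_USE_ESP8266)
  return "esp8266";
#elif defined(LUA_USE_ESP32)
  return "esp32";
#else
  return "host";
#endif
}
```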
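The "word access + byte extract" recoding mentioned in the objectives above can be sketched roughly as follows. The helper name `demo_read_byte()` and the sample data are hypothetical, and the sketch assumes a little-endian target (as on the ESP's Xtensa core) and a byte that lives inside a word-aligned object.

```c
#include <stdint.h>

/* Hypothetical helper: fetch one byte using only an aligned 32-bit load. */
static inline uint8_t demo_read_byte(const uint8_t *p) {
  uintptr_t addr = (uintptr_t)p;
  /* Round the address down to the containing 32-bit word. */
  const uint32_t *word = (const uint32_t *)(addr & ~(uintptr_t)3);
  /* Select the byte lane within that word (little-endian assumed). */
  unsigned shift = (unsigned)(addr & 3u) * 8u;
  return (uint8_t)(*word >> shift);
}

/* Sample constant data stored as words, so every load is aligned. */
static const uint32_t demo_words[2] = { 0x44332211u, 0x88776655u };
```

The extra shift is the small per-access overhead referred to above; in exchange, the load itself is always word-aligned.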
## Detailed Implementation Notes

### TString types

The LFS implementation uses two types of TString resources: the standard Lua
**RAM** strings that are bound at runtime into the `strt` string table; and
LFS-based **RO** strings that are bound during LFS compile into the LFS `ROstrt`
string table. Note that the third Pilot53 statically compilable RO TString type has
been dropped, as implementing this would create runtime impacts that exceed any
savings gained.

Whilst running Lua applications make heavy use of TStrings, the Lua VM itself makes
little use of TStrings, except for two broad classes of TString constants:

- Metatable fields such as `__index`. The (largely internal) Lua meta API calls
which pass string constants such as `__index` now take the equivalent `enum TMS`
integer constant, e.g. `TM_INDEX`. These can be resolved via the `G(L)->tmname` vector
with only one extra Xtensa instruction overhead compared to statically compiled RO
TStrings.
```
lauxlib.c: luaL_getmetafield(L, arg, "__name")
lauxlib.c: lua_setfield(L, -2, "__name")
lauxlib.c: luaL_getmetafield(L, idx, "__name")
ltm.c: luaH_getshortstr(mt, luaS_new(L, "__name"))

lauxlib.c: lua_setfield(L, -2, "__gc")
loadlib.c: lua_setfield(L, -2, "__gc")

lauxlib.c: luaL_callmeta(L, idx, "__tostring")
lua.c: luaL_callmeta(L, 1, "__tostring")

lbaselib.c: luaL_getmetafield(L, 1, "__metatable")
lbaselib.c: luaL_getmetafield(L, 1, "__metatable")

lbaselib.c: pairsmeta(L, "__pairs", 0, luaB_next)

lbaselib.c: pairsmeta(L, "__ipairs", 1, ipairsaux)

ldblib.c: lua_setfield(L, -2, "__mode")

loadlib.c: lua_setfield(L, -2, "__index")
lstrlib.c: lua_setfield(L, -2, "__index")
ltablib.c: checkfield(L, "__index", ++n)

ltablib.c: checkfield(L, "__newindex", ++n)

ltablib.c: checkfield(L, "__len", ++n)
```
- **Note**. We need to add the following to `G(L)->tmname`: `"__metatable"`, `"__name"`, `"__tostring"`, `"__ipairs"`, `"__pairs"`.

- The in-built Lua compiler generates keyword TStrings when compiling Lua source
at runtime. These are bound at runtime using `luaL_loadstring()` calls, but the
Lua53 VM also introduces a lookaside cache to avoid the runtime cost of hashing
the string and doing the `strt` lookup for this type of constant. Clearly this
binding overhead only happens during source code compilation.

- A new `dummy_comp_strings.lua` has been added to `lua_examples/lfs` which also
includes all such TStrings used by the Lua code parser. Compiling this into an LFS
image instead of `dummy_strings.lua` will move these from the `RAM strt` to
`ROstrt` and therefore reduce the peak heap requirements for code compilation. Note
that using this variant has no benefit for applications that don't use on-ESP
compilation (e.g. pure LFS ones).

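As an illustration of the first class above, resolving a metatable event through an enum-indexed name vector can be sketched as below. The `demo_*` names are hypothetical stand-ins: the enum mirrors only the first four entries of Lua's `enum TMS`, and the real `G(L)->tmname` vector holds interned TStrings rather than C strings.

```c
#include <stddef.h>

/* Hypothetical, shortened stand-in for Lua's `enum TMS`. */
typedef enum {
  DEMO_TM_INDEX,
  DEMO_TM_NEWINDEX,
  DEMO_TM_GC,
  DEMO_TM_MODE,
  DEMO_TM_N            /* number of events in this sketch */
} demo_tms;

/* Stand-in for G(L)->tmname: one name per event, in enum order. */
static const char *const demo_tmname[DEMO_TM_N] = {
  "__index", "__newindex", "__gc", "__mode"
};

/* Resolve an event to its name with a single indexed load. */
static const char *demo_event_name(demo_tms event) {
  return ((unsigned)event < (unsigned)DEMO_TM_N) ? demo_tmname[event] : NULL;
}
```

A caller passes `DEMO_TM_GC` rather than the literal `"__gc"`, so no string needs to be hashed or looked up at the call site.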
### ROTables

ROTables still use the `luaR_entry` vector used in our Lua51 implementation.
However ROTables in Lua53 also have a `ROTable` variant of the standard `Table`
header record. This enables some simplification of the Table / ROTable
implementation compared to Lua51:

- ROTables are no longer a separate Lua type. An (RW) Table is a Lua collectable
object, so its `next` field is used to link it into the GC object hierarchy. Hence
the `next` field points to another (word-aligned) GC object or `NULL`. ROTables are
not collectable and hence the `next` field is not used for GC. All ROTables have their
`next` field set to `(GCObject *) 1` and hence `lua_isrotable()` evaluates to
`((size_t)(h->next)&1)` rather than a type selection or an `IN_RODATA_AREA()`
address range check.

- Now that all tables have a header record including a valid `flags` field, the
`fasttm()` optimisations also work for ROTables and the ROTable / Table variant
handling is largely hidden internally within `ltable.c`. (There is no separate
`lrotable.c` file.)

- The `ROTable` structure variant drops unused fields to save space, but again
this is handled internally within `ltable.c`.

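The `next`-field tagging described in the first point above can be sketched with a cut-down header record. The `demo_*` types and objects are hypothetical; only the low-bit test mirrors the `((size_t)(h->next)&1)` expression quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down table header; only `next` matters here. */
typedef struct demo_table {
  struct demo_table *next;  /* GC link for RW tables, tag value 1 for ROTables */
} demo_table;

#define DEMO_ROTABLE_TAG ((demo_table *)1)

/* Real GC objects are word-aligned, so bit 0 of a genuine link is always 0. */
static int demo_isrotable(const demo_table *h) {
  return (int)((uintptr_t)h->next & 1u);
}

/* Two RW tables chained through `next`, and one read-only table. */
static demo_table demo_rw_tail = { NULL };
static demo_table demo_rw_head = { &demo_rw_tail };
static const demo_table demo_ro = { DEMO_ROTABLE_TAG };
```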
The same functionality and limitations apply to ROTables as with 5.1 to minimise
migration impact for C module libraries:

- ROTables can only have string keys and a limited set of Lua value types
(Numeric, Light CFunc, Light UserData, ROTable, Nil). In Lua 5.3 `Integer` and
`Float` are separate numeric subtypes, so `LROT_NUMENTRY()` takes an integer value.
The new `LROT_FLOATENTRY()` is used for non-integer values. This isn't a
migration issue since none of the modules use floating point constants in declared
ROTables, and the only one currently used is `math.pi`. For 5.1 builds,
`LROT_FLOATENTRY()` is a synonym of `LROT_NUMENTRY()`.

- The same ordering limitations apply: `luaR_entry` vectors can be unordered
except for any metavalues which must be ordered at the start of the vector. The
ROTable lookaside cache effectively removes the overhead of vector scanning.

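To make the integer / float entry distinction concrete, here is a minimal sketch of an entry vector with a linear key scan. Everything named `demo_*` or `DEMO_*` is hypothetical and only loosely modelled on the `luaR_entry` vector and the `LROT_NUMENTRY()` / `LROT_FLOATENTRY()` macros; the real macro signatures and record layout are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

typedef enum { DEMO_INT, DEMO_FLOAT, DEMO_NIL } demo_kind;

/* Hypothetical entry record: a string key plus a tagged numeric value. */
typedef struct {
  const char *key;
  demo_kind kind;
  union { long i; double f; } v;
} demo_entry;

#define DEMO_NUMENTRY(k, n)   { (k), DEMO_INT,   { .i = (n) } }
#define DEMO_FLOATENTRY(k, x) { (k), DEMO_FLOAT, { .f = (x) } }
#define DEMO_END              { NULL, DEMO_NIL,  { .i = 0 } }

/* Unordered entries; any metavalues would have to come first. */
static const demo_entry demo_math[] = {
  DEMO_FLOATENTRY("pi", 3.14159265358979),
  DEMO_NUMENTRY("maxinteger", 2147483647L),
  DEMO_END
};

/* Linear scan; this is the cost that a lookaside cache would hide. */
static const demo_entry *demo_find(const demo_entry *t, const char *key) {
  for (; t->key != NULL; t++)
    if (strcmp(t->key, key) == 0) return t;
  return NULL;
}
```

In this sketch a lookup such as `demo_find(demo_math, "pi")` returns the float-tagged entry, while `"maxinteger"` resolves to an integer-tagged one.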
### Proto Structures

Standard Lua 5.3 contains a new peephole optimisation relating to closures. The
Proto structure contains an RW field pointing to the last closure created, and the GC
now adopts a lazy approach to recovering these closures. When a new closure is
created, then if the old one exists _and the upvals are the same_ then it is reused
instead of creating a new one. This effectively treats
```Lua
 for i = 1, n do
   -- a new closure is created on every iteration
   func1(a, function(a, i) --[[ body ]] end)
 end
```
more like
```Lua
 -- the closure is created once and then reused
 local f2 = function(a, i) --[[ body ]] end
 for i = 1, n do
   func1(a, f2)
 end
```
that is, the higher cost closure creation is done once rather than `n` times. This
reduces runtime at the cost of some RAM overhead. However, it has two drawbacks
for NodeMCU:

- LFS relies on Protos being RO and this RW `cache` field breaks this assumption.

- Closures persist past their useful lifetime, and this delays their GC. Memory
constrained NodeMCU applications rely on the fact that dead closed upvals can be
GCed once the closure is complete. This optimisation changes this behaviour. Not
good.

So Lua53 **_removes_** this optimisation.

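For context, the kind of reuse test that this optimisation performs can be sketched as follows. This is a simplified illustration with hypothetical `demo_*` types, not the code from the Lua sources; its purpose is to show why the prototype needs a writable `cache` field and why the cached closure stays reachable.

```c
#include <stddef.h>

#define DEMO_MAXUPVAL 4

typedef struct demo_closure {
  int nupvalues;
  const void *upvals[DEMO_MAXUPVAL];  /* identities of the captured variables */
} demo_closure;

typedef struct demo_proto {
  demo_closure *cache;  /* last closure built from this prototype: must be RW */
} demo_proto;

/* Return the cached closure if it captured exactly the same upvals, else NULL. */
static demo_closure *demo_getcached(const demo_proto *p,
                                    const void *const *wanted, int n) {
  demo_closure *c = p->cache;
  if (c == NULL || c->nupvalues != n) return NULL;
  for (int i = 0; i < n; i++)
    if (c->upvals[i] != wanted[i]) return NULL;
  return c;
}

/* Sample data: one closure capturing demo_a, cached on its prototype. */
static int demo_a, demo_b;
static demo_closure demo_c = { 1, { &demo_a } };
static demo_proto demo_p = { &demo_c };
static const void *const demo_same[] = { &demo_a };
static const void *const demo_other[] = { &demo_b };
```

While `demo_p.cache` still points at `demo_c`, the closure and everything it captured remain reachable, which is the delayed-GC effect described above.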
### Garbage collection

Lua51 includes the eLua emergency GC, plus the various EGC tuning parameters that
seem to be rarely used. The default setting (which most users use) is
`node.egc.ALWAYS` which triggers a full GC before every memory allocation so the VM
spends maybe 90% of its time doing full GC sweeps!

Lua53 has adopted the eLua EGC but without the EGC tuning parameters. I have raised
a separate issue to discuss this, but I recommend that we only extend the EGC with
the functional equivalent of the `ON_MEM_LIMIT` setting with a negative parameter,
that is, only trigger the EGC when less than a preset amount of free heap is left.
The runtime then spends far less time in the GC and code will run perhaps 5× faster.

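The difference between the two policies can be sketched as a simple decision function. The names below are hypothetical and the sketch assumes the free heap size is available to the allocator; it is not the EGC code itself.

```c
#include <stddef.h>

/* Hypothetical policy selector for this sketch. */
typedef enum { DEMO_EGC_ALWAYS, DEMO_EGC_ON_LOW_HEAP } demo_egc_mode;

/* Decide, before an allocation, whether a full GC should run first. */
static int demo_egc_needed(demo_egc_mode mode, size_t free_heap, size_t min_free) {
  if (mode == DEMO_EGC_ALWAYS)
    return 1;                   /* full GC before every allocation */
  return free_heap < min_free;  /* GC only once the heap is nearly exhausted */
}
```

With plenty of free heap the low-heap policy skips the collection entirely, which is where the expected speed-up comes from.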
### Build variants and includes

As with Lua51, Lua53 heavily customises the `linit.c`, `lua.c` and `luac.c` files
because of the demands of an embedded runtime. Given the amount of change, these
will be pretty much stripped of functionally dead code.

... To be continued