Every company is seeing its volume of data grow at an exponential rate, and there seems to be no way to reduce the quantity of data we rely on every day. At the same time, we have to extract value from this data in order to optimise, improve and accelerate our business and the way we work. To do so, large quantities of data must be stored, processed and enriched, which calls for dedicated solutions. In concrete terms, large databases, distributed databases, big data clusters and other resource-intensive workloads require servers with high-performance storage devices, designed to deliver read/write operations at optimal speeds.
3
4At OVHcloud, we love pragmatic solutions. In that spirit, some months ago, we started to offer GPUs in our Public Cloud, i.e. providing virtual machines with GPUs. But GPU virtualisation is not currently able to offer the level of performance we demand, so we chose to link the GPUs directly to the virtual machines, avoiding the virtualisation layer. KVM – our Public Cloud’s hypervisor – uses libvirt, which has a PCI passthrough feature that turned out to be exactly what we needed for this purpose.
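To give a rough idea of what this looks like at the libvirt level, here is a minimal sketch using the libvirt Python bindings. The domain name and PCI address are made up, and in our Public Cloud the equivalent work is orchestrated by OpenStack Nova rather than by hand-written scripts:

import libvirt

# Hypothetical PCI address of the device to pass through (as reported by lspci on the hypervisor).
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000042")  # hypothetical KVM domain name

# Attach the physical device to the running guest and persist it in its configuration.
dom.attachDeviceFlags(
    HOSTDEV_XML,
    libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG,
)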
5
6In order to provide the best storage performance, we worked with a number of our customers on a PoC that used the same PCI Passthrough feature to incorporate the fastest storage device into our Public Cloud instances: NVMe cards with 1.8TB of space.
7
8When it comes to storage and customer data, we have to be sure that when a customer deletes and releases an instance, we properly clean the device before allocating it to another instance. In this case, we patched OpenStack Nova in order to conduct a full erase of the device. In a nutshell, when an IOPS instance is released by a customer, it’s pushed to quarantine, where internal tools will run the required erase actions on the device. Once it’s done and checked, the device and the instance slot are pushed back in Nova as “available”.
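As an illustration of what such a cleaning step can look like, here is a minimal wipe-and-verify sketch. It is not the actual patch we applied to Nova, the device path is hypothetical, and a production erase would rely on the device's own secure-erase capabilities rather than a simple read-back check:

import subprocess

def wipe_nvme(device: str = "/dev/nvme0n1") -> None:
    """Discard every block on the device, then check that a sample reads back as zeroes."""
    # blkdiscard issues a discard/TRIM of the whole namespace.
    subprocess.run(["blkdiscard", device], check=True)

    # Read the first 16 MiB back and make sure nothing is left.
    with open(device, "rb") as disk:
        sample = disk.read(16 * 1024 * 1024)
    if any(sample):
        raise RuntimeError(f"{device} still contains data after the erase")

wipe_nvme()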
9
10Let’s jump into some concrete examples and take the time to appreciate the awesome speed of these new instances! We’ll use the biggest instance model and run an I/O bench on a RAID 0. This way, we will see what the limits are when we aim for the fastest storage solution on a simple Public Cloud instance.
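For reference, this kind of benchmark can be scripted around fio. The sketch below assumes a software RAID 0 array already assembled as /dev/md127 (a hypothetical device name, and the test is destructive for whatever is stored on it); it simply sums the write IOPS reported in fio's JSON output:

import json
import subprocess

FIO_CMD = [
    "fio", "--name=randwrite", "--filename=/dev/md127",  # hypothetical RAID 0 array; its data will be overwritten!
    "--rw=randwrite", "--bs=4k", "--iodepth=32", "--numjobs=8",
    "--ioengine=libaio", "--direct=1", "--runtime=60", "--time_based",
    "--group_reporting", "--output-format=json",
]

result = subprocess.run(FIO_CMD, capture_output=True, text=True, check=True)
report = json.loads(result.stdout)

total_write_iops = sum(job["write"]["iops"] for job in report["jobs"])
print(f"Aggregate random-write IOPS: {total_write_iops:,.0f}")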
11
12
1,710K IOPS on the write operation… Imagine what you could do with such a solution for your databases, or other highly-intensive, transactional use cases.
14
Of course, we're presenting an optimal scenario for this example. RAID 0 is inherently risky, as any failure on one of the NVMe devices can corrupt your data, so you absolutely must back up your critical data. With that caveat in mind, this level of performance opens up a lot of new possibilities, and we're 100% sure that your databases will love these instances! You can find more details about them on our Public Cloud website.
One of OVHcloud's successes is our ability to develop and promote innovation, in both IT and industrial practices. For two decades, we have placed innovation at the centre of our strategy; it's part of our DNA. We are constantly researching and developing new technologies to optimise the performance of our services.
19
20We manufacture our own servers, we build our own datacentres and we maintain strong, long-term relationships with other technological partners, with a clear objective: to deliver the most innovative solutions with the best price/performance ratio. The most obvious example, and one closely connected to our development, is the idea of using water to cool our servers.
We began to use water cooling at an industrial scale in 2003, even though it went against the conventional wisdom at the time. This technology has allowed us to consistently increase server performance while reducing energy consumption in our datacentres. The challenge was not only to find the right balance between pressure, flow rate, temperature and pipe diameter, but above all to manufacture the solution at mass scale.
22
The environmental challenges associated with digital services are of vital concern too, especially for our datacentres. We are very aware of our own environmental impact, and we constantly strive to reduce it. After powering the servers themselves, cooling them is usually the most electricity-intensive activity for hosting providers.
24
Water cooling, combined with an outside air-cooling system, allows us to greatly optimise the Power Usage Effectiveness (PUE) of our datacentres. And lower electricity consumption means lower costs for us and our customers, as well as a reduced impact on the environment.
26Since 2003, we have been developing our own water-cooling system and deploying it in our new datacentres. Our historic customers were the first to benefit from a groundbreaking and highly-effective process on an industrial scale.
Our first generations of water-block technology were designed by our teams and manufactured externally. These water blocks had an optimal performance of 60W at a water temperature of 30°C.
28
29The first generation of water cooling used very simple water blocks, with two copper convex ends crimped together:
30
31Water block consisting of 2 copper convex ends crimped together
32
33On the next iteration of our water blocks, we added some changes to improve reliability and reduce costs:
34
 - The crimping technology was replaced by brazing
 - Stainless steel push-in fittings replaced the brass-plated ones
 - A cross was added to the cover, to better fix the water block onto the chip
38
39
40In order to facilitate pipe fitting and improve sealing, we again changed the water block technology by introducing compression fittings, which are more reliable and easier to connect.
41
42
Unlike other providers, we maintain complete control of our value chain. This means we can offer cutting-edge solutions at a very competitive price, which is very important to our customers.
44
45
46During this period, we still designed our water blocks internally and manufactured them externally. The optimal performance for this generation of water blocks is 60W with a water temperature of 30°C.
47
48Our water blocks continued to evolve. We replaced the copper convex end base plate with a simple plate. The cross on the cap was replaced by a cross inside the water block. This allowed us to further reduce the cost of the water blocks without impacting performance.
49
50
51We always try to be innovative and forge our own path. Since 2013, we have continuously rethought our water-cooling technology, improving performance, operation and cost. This constant innovation allows us to keep up with our customers’ ever-growing demands for increased computing and data-storage capacities, and the ever-increasing amount of heat generated.
52
53The water blocks created in this period were completely different from earlier generations. We replaced welding by screw-based tightening, and the convex top plates were replaced with plates with integrated water inlets and outlets.
54
55At this stage, we began to make water block variations, for example adapting them to the smaller GPU form-factors.
56
Here is a side-by-side comparison of the standard CPU water block and the more compact GPU water block.
58
59
60We also developed several special water block designs, adapted for specific constraints.
61
62
63
64A good example is the water block we developed for the high density IBM Power 8 CPUs. In the next picture you can see this special water block’s cover and base plates.
65
66
67The previous paragraphs described our water block technology in 2014. Since then we have come a long way. We have used new technologies like 3D printing, and made some fundamental changes to the design.
68
69In the coming weeks we will publish the next posts in our water-cooling series. These will tell the story of how our water blocks have evolved since 2015. We will focus on the current generation and give you a sneak peek of the improvements to come.
70
71Keep an eye out for our next post to learn more about OVHcloud water cooling!
72
In our previous articles, we explained why we had to move 800,000 databases from one datacentre to another, 300 kilometres away. So here we are… My team and I did it! It was a real brain-burner, so I hope our story will help you tackle the kind of huge technical projects we all love to play with.
76
77The rules
78
79 - To reduce latency, a database must be migrated at the same time as the website using it.
80 - Because databases are spread across all available MySQL servers, the granularity of a migration has to be the database, not the MySQL instance. In other words, we cannot migrate a whole MySQL server. We must only move a part of it.
81 - As the link between a website and its database is not necessarily referenced, a website in Gravelines should be able to contact a database in Paris (for example), and vice-versa.
82 - To contact its database, a website uses a host name, a user name and a password. We want the migration to be transparent, so nobody should have to change any of those elements to contact their new database.
83 - Databases’ platforms change between Paris and Gravelines, as shown below.
84
85To sum up, this is what we have before a web cluster migration.
86And this is what we want after the migration.
87
88
89A few more things…
90
 - Obviously, we had to keep in mind one of the most important things when dealing with databases: consistency. For each database, we had to define a consistency point. Before this point on the timeline, reads and writes were made in Paris. After it, reads and writes were made in Gravelines.
 - We believe in transparency and reversibility. These are both key parts of our SMART cloud. That's why we wanted to give you access to this consistency point, as a dump on your OVHcloud Control Panel. For every database migrated, we chose to give you access to the dump for one month.
 - Migrating 800K databases in about 60 nights meant we had to be very fast and scalable. Our record was on the 1st of July 2019, when we successfully migrated 13,502 databases in 1 hour, 13 minutes and 31 seconds.
94 - If you’re used to being on duty, you know that your attention and efficiency are lower during the night. Repeating the migration process about 60 times in one year would amplify that, so we wanted everything to be as automated and as simple as possible. As we will see later, to launch the database migrations, we just had to run one command on a single host:
Now that you know the rules, it's time to start the game!
96
97
98Level 1
99
100The first level is always an easy one, where you discover how the game works through a kind of tutorial! So, let’s begin with a small database migration. This is how we do it:
101
1. At the source (Paris)
103
 Set read-only mode. We absolutely need to avoid writes during the migration, to avoid the famous split-brain. The easiest way to do this is to put the database in read-only mode. In most cases, websites only need to read their databases, but some also need to write to them, and those will therefore be broken. That's not a problem, because the website is currently being migrated and is closed. We still block write access, in case the database is used by another host that isn't affected by the night's migration.
 Dump the database and put the dump somewhere. We chose to store the dumps on OVHcloud's Public Cloud Storage (PCS), as we already use this solution to store 36 million dumps per month. Adding 800,000 dumps in one year is no problem for this awesome platform!
106
2. At the destination (Gravelines)
108
109 Retrieve the dump and import it.
110 Create the user and permissions, with write access.
111
3. Switch to the new database
113
 At this point, the website is still calling the database in Paris. So that the website (whether it's hosted in Paris or Gravelines) can contact the new database, we update the DNS so that the name points to the Gravelines MySQL instance rather than the Paris one.
115 The read access to the Paris database is also removed.
116 Finally, we’ll update our information system, so you can retrieve the dump from PCS via your Control Panel. This update also allows us to redirect all actions available from the Control Panel (e.g. change a password, create a dump…) to the new database at Gravelines.
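Put end to end, the three steps above boil down to something like the sketch below. Host names, the user and the PCS upload helper are hypothetical, credentials are omitted, and the production tooling obviously adds checks, retries and reporting:

import subprocess

DB = "customer_db"                                     # hypothetical database name
SRC, DST = "mysql.p19.example", "mysql.gra.example"    # hypothetical source and destination hosts

# 1. At the source: block writes, then dump the database.
revoke = f"REVOKE INSERT, UPDATE, DELETE, CREATE, DROP, ALTER ON {DB}.* FROM 'app_user'@'%';"
subprocess.run(["mysql", "-h", SRC, "-e", revoke], check=True)
with open(f"{DB}.sql", "wb") as dump:
    subprocess.run(["mysqldump", "-h", SRC, "--single-transaction", DB], stdout=dump, check=True)
# upload_to_pcs(f"{DB}.sql")                           # hypothetical helper pushing the dump to PCS

# 2. At the destination: import the dump, then recreate the user with write access.
subprocess.run(["mysql", "-h", DST, "-e", f"CREATE DATABASE IF NOT EXISTS {DB}"], check=True)
with open(f"{DB}.sql", "rb") as dump:
    subprocess.run(["mysql", "-h", DST, DB], stdin=dump, check=True)
subprocess.run(["mysql", "-h", DST, "-e", f"GRANT ALL PRIVILEGES ON {DB}.* TO 'app_user'@'%';"], check=True)

# 3. Switch: update the DNS record so the database host name now points to Gravelines,
#    then remove read access on the Paris side and update the information system.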
117
118
119Level 2: “Decentralised state machine”
120
121To deliver the proof of concept for the migration, we first ran all those steps manually and sequentially. The natural way to automate this is to write a script to do the same thing, but faster. That’s a centralised method, but such methods experience bottlenecks sooner or later, and imply a single point of failure.
122
123To prevent this, and fulfil our scalability requirements, we have to be decentralised. We must imagine the migration of a single database as a state machine. Here is a simplified version of the state graph of a database migration, as described above:
124
125
Using this state machine, we can have those three big steps performed by different machines, to parallelise the workload:
127
128 The source
129 The destination
130 The one updating the DNS
131
132These three hosts can perform their tasks in an independent and decentralised way. All they have to do is watch the state graph to see if they have something to do, and if so, update it and perform the tasks.
133
134
135The brain of the migration: CloudDB
136
We love the ‘eat your own food’ concept! It's the best quality control ever, and, along with the feedback you give us, our primary source of feature requests. So it's no surprise that we used our own CloudDB product to store the state graphs of the database migrations.
138
139Technically, a state graph is a row in a table. The simplified structure of this table looks like this:
140
141
142Except for the dump_url, all the fields are filled before the migration begins. In other words, we know where the databases are, and where they will be.
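The screenshot of that table is not reproduced here, but based on the columns mentioned in this post, a row can be pictured roughly as follows (the field names are our reconstruction, not the exact CloudDB schema):

from dataclasses import dataclass
from typing import Optional

@dataclass
class MigrationState:
    database_name: str              # the database to migrate
    source: str                     # MySQL host in Paris that currently holds it
    destination: str                # MySQL host in Gravelines that will receive it
    status: str                     # e.g. "pending", "to_dump", "dumping", "to_import", "importing", "to_switch_dns", "done"
    dump_url: Optional[str] = None  # filled once the dump has been pushed to Public Cloud Storage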
143
144We’ve beaten all the challenges of this level. Now it’s time to beat the final monster!
145
146
147Level 3: Migrate 800K databases
148
149Now that we know how to migrate a single database in a decentralised way, let’s fill the CloudDB with all the databases we want to migrate! This is what the migration now looks like:
150In Paris
151
About once a minute*, each of the 780 database servers asks the CloudDB whether it has something to dump. The source and status columns of the table are used to retrieve this information:
153
154
155If so, they execute their tasks and update the CloudDB about what they’re doing. When they’re done, they pass the baton for this migration to Gravelines:
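The query behind that polling step is not shown here, but it boils down to something like the sketch below, reusing the illustrative table and column names from the previous section (so this is not the exact production schema). The import workers in Gravelines run the symmetric version, filtering on the destination column instead:

import random
import socket
import time

def dump_worker(clouddb_conn):
    """Loop run by each Paris database server: dump the databases assigned to it."""
    me = socket.gethostname()
    while True:
        with clouddb_conn.cursor() as cur:
            cur.execute(
                "SELECT database_name FROM migration_state "
                "WHERE source = %s AND status = 'to_dump'",
                (me,),
            )
            for (database_name,) in cur.fetchall():
                cur.execute(
                    "UPDATE migration_state SET status = 'dumping' WHERE database_name = %s",
                    (database_name,),
                )
                dump_url = dump_and_upload(database_name)   # hypothetical helper: mysqldump + upload to PCS
                cur.execute(
                    "UPDATE migration_state SET status = 'to_import', dump_url = %s "
                    "WHERE database_name = %s",
                    (dump_url, database_name),
                )
        clouddb_conn.commit()
        time.sleep(60 + random.uniform(-15, 15))   # randomised interval, so the CloudDB is not flooded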
156
157
158
159
160In Gravelines
161
162At the same time, 300 kilometres away, hundreds of database servers are also asking the CloudDB if they have something to import. As in Paris, they query the CloudDB about once a minute*. The destination and status columns of the table are used to retrieve this information:
163
164
If so, they execute their tasks and update the CloudDB about what they're doing. When they're done, they pass the baton to a third robot, which will change the DNS records for this database migration:
166
167
168(*) To avoid flooding the CloudDB, we use a random frequency to query the database with the state graphs. This way, the connections are globally distributed over time.
169
170
171
172DNS update
173
174The robot responsible for the DNS update is the third player in the migration process, and works in the same way as the dump and import robots described above.
175Not that easy…
176
177Of course, the actual game was more complex. This was a simplified version of the migration, with some steps missing or not detailed enough, such as:
178
179 Preventing writes to the source database
180 Updating the IS (among others) so you can see the dump in the Control Panel
181 Setting the password on the destination (the same as the one on the source) without knowing it
182 And many others
183
184But now that you’ve got the concept for the main steps, you can imagine how we handled the others.
185Cheat code: Iterate!
186
187Do you know the law of truly large numbers? It states that with a large enough number of samples, any unlikely thing is likely to be observed.
188
189This is one of the first lessons you learn when you host 1.2 million databases. Every day, we face the many improbable things that can happen with databases, so we knew that, despite the tests we conducted, we would encounter difficulties, weird cases, and improbable bottlenecks.
190
191But there’s a cheat code to beat that boss: iterate!
192
193 Begin the migration
194 Face a problem
195 Fix it definitively (not just for the particular case that failed, but also for all similar cases across the whole platform)
196 Then try again, faster!
197
198This method is possible thanks to two things:
199
200 The magic command
201 The big red button
202
203The magic command
204
205As mentioned above, to launch the database migrations, we had to run one command on a single host:
206
207
208This magic command has one parameter: the number of parallel migrations you want to do. We used 400 for this parameter.
209
210
211This means that 400 databases are being dumped or imported at the same time – no more, no less.
212
213The migrate-p19 command is a scheduler. It updates the CloudDB every 10 seconds, so we always have those 400 migrations performed in parallel:
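As a rough illustration (this is not the real migrate-p19 code, and the table, column and status names follow the hypothetical schema used earlier in this post), the scheduler loop can be pictured like this:

import time

TARGET_PARALLELISM = 400   # the value we used for the migration

def scheduler(clouddb_conn, parallelism=TARGET_PARALLELISM):
    """Keep `parallelism` migrations in flight, refreshing the CloudDB every 10 seconds."""
    while True:
        with clouddb_conn.cursor() as cur:
            cur.execute(
                "SELECT COUNT(*) FROM migration_state "
                "WHERE status IN ('to_dump', 'dumping', 'to_import', 'importing')"
            )
            (in_flight,) = cur.fetchone()
            free_slots = max(0, parallelism - in_flight)
            if free_slots:
                # Release pending databases; the dump workers will pick them up.
                cur.execute(
                    "UPDATE migration_state SET status = 'to_dump' "
                    "WHERE status = 'pending' LIMIT %s",
                    (free_slots,),
                )
        clouddb_conn.commit()
        time.sleep(10)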
214
215
216Pause the game: The big red button
217
In every machine, it is mandatory to have a big red button to press when something's wrong. To interrupt the migration, for one reason or another, we just have to kill the migrate-p19 script. When we do this, the ongoing migrations complete gracefully, after which no new ones are launched.
219
The interruption can also be automatic: if things go really badly, the scheduler will see that too many operations are in an error state and decide to stop the migration.
221
222Our database migration adventure will continue on this blog. Indeed, the world is not perfect, and we’ve had to handle some interesting cases…
223
224To be continued…
227The OVHcloud strategy to promote and protect innovation
228
229During the OVHcloud Summit last October, we announced the recent filing of 50 patent families.
230
231These patent applications obviously concern our “hardware” innovations (you know that we manufacture our servers, racks, cooling systems…) but also software patents, because contrary to popular belief it is possible to patent certain software (under certain conditions, but this is not the subject of this article).
232
233So, obviously, you are wondering why OVHcloud decided to launch this patent program when open source has been in OVHcloud’s DNA since its creation.
234
235That’s a very good question indeed!
236
The purpose of this article is to explain why a company like ours cannot avoid filing patents, and why patents and open innovation are not incompatible.
238Why patent filing is essential
239
240There are multiple studies listing the reasons why companies decide to file patents (anti-aggression tool, communication tool, talent recruitment tool, financial valuation tool, competition blocking tool…).
241“PME, pensez PI” – DGCIS – Pierre Breesé and Yann de Kermadec
242
243Patenting motives, technology strategies, and open innovation
244Marcus Holgersson and Ove Granstrand
245
Of all these reasons, the main ones are:
247Protecting ourselves
248
249There are 2 major threats to a company like ours:
250
251 The giants who can afford to file hundreds of patents a year to attack any company trying to compete with them
252 Patent Trolls, which are companies whose only business is to buy patents and sue companies to force them to pay them licence fees.
253
254Patenting is relatively expensive, and it is not usually a priority for companies when they are in the launch phase.
255
256Moreover, as long as they remain small, they can hope not to attract the attention of big competitors or patent trolls.
257
When we started to grow and decided to open a subsidiary in the US (the territory, by definition, of patent trolls and of GAFAM with their thousands of patents), we realised that crossing our fingers in the hope of not being noticed was clearly not the right strategy. So we rolled up our sleeves: we drafted a patent program and an associated reward policy, trained our employees on intellectual property, and quickly started to see the benefits, with nearly 50 patent filings in 18 months! And trust me, it's only the beginning!
259Maintaining our freedom to operate
260
Trade secrecy is an interesting form of protection, and is often preferred by companies because it is much less expensive than filing patents (or at least it appears to be, since effectively managing secrecy can be very expensive too).
262
But it should be noted that if a third party filed a patent (even in good faith) for an invention that another company had kept secret for several years, the latter would become a counterfeiter if it continued to use its own invention. Frustrating, no?
264Enabling us to participate in patent pools and other open-innovation communities
265
266It is well known that there is strength in numbers!
267
268It was therefore natural that companies began to join forces to innovate.
269
Either they wish to work together (co-development, cross-licensing…), in which case it is preferable to file patents upstream to allow for free discussion afterwards;

Or they decide to join forces against attacks by patent trolls and other aggressive companies, and pool their patents to serve as a bargaining chip in the event of an attack on one of the group's members.
273Thanking our employees
274
275Because we believe that our first value is our employees, and that it is thanks to them that we innovate every day, we have set up an attractive reward system and we celebrate them all together when their invention is patented.
276
277We have also created Innovation Awards to reward employees of certain projects that do not meet the criteria for patentability, but which are nevertheless essential to our innovation.
278When the patent promotes open innovation
279
280Many recent studies have shown that paradoxically, the patent system promotes open innovation by stimulating inter-company collaboration in research and development.
281Technology partnerships
282
As we saw above, patents allow companies to work more peacefully on joint projects, without fear of their previous know-how being stolen.
284
285Today, OVHcloud is committed to opening up more technological partnerships with other companies, universities and research laboratories.
286Software and open-source patents
287
288In software, it must be understood that patent and copyright protection do not have the same purpose.
289
Copyright only protects the form (i.e. source code, object code, documentation, etc.), while a patent protects the process or the method, regardless of the language used.
291
Two software programs producing strictly similar effects, but with different forms, do not infringe upon each other in terms of copyright, whereas a patent protecting the process prohibits its reuse regardless of the form used.
295
296But why file a patent and then give access to the sources?
297
298 To prevent a third party from deciding to replicate the functionality of open-source software (in a different form) and distribute it under proprietary license.
299 To prevent a third party from filing a broad patent on a process before we have had the opportunity to distribute the application in open source.
300 To focus community efforts around a method. Indeed, since the code is open, the whole community can use it, correct it, optimize it, and thus innovation goes faster and further. As the concept remains protected by the patent, this avoids the multiplication of methods for the same purpose and the dispersion of innovative resources.
301
302The “Economy of Peace”
303
When Tesla allowed its patents to be used without payment of licence fees, it did not give them up. As Musk put it: “Tesla will not initiate patent lawsuits against anyone who, in good faith, wants to use our technology”.
305
Tesla considered at the time that there was more to be gained (for the sake of saving the planet) by having the community work on its technology than by keeping it to itself. But the patents still exist, and if a company does not act in good faith (Tesla is probably targeting patent trolls here), Tesla reserves the right to take action against it.
307
308This new way of thinking and acting is in line with what the author Thierry Couzet calls the “economy of peace” which he opposes to the “economy of predation”.
309
310This is also what we at OVHcloud think and that’s why we advocate a SMART Cloud – a reversible, open and interoperable cloud through open innovation.
311
312Don’t worry, OVHcloud has not forgotten its values and intends to participate as much as possible in open innovation and promote this “economy of peace”.
315In our tour of OVHcloud’s Internal Databases Infrastructure, we showed you what our infrastructure looks like. This post will be about SQL replication, why it matters, and how we use it at OVHcloud.
316
317Nodes are not enough: they have to work together. To do that, we rely on a feature called replication to keep data in sync across all cluster nodes. With the replication feature enabled, all changes made on the primary node are propagated and applied to the rest of the cluster, so that the same data is stored on every node. When working with replication, there are certain trade-offs you have to be aware of. As there are several types of replication, we should first have a look at how replication works.
318Asynchronous replication
319
320The first one is called asynchronous replication. With this replication type, write queries are acknowledged as soon as the primary has executed them and stored the result on disk. The application can continue its work as soon as it receives this acknowledgment. In parallel, the changes are being propagated and applied to the rest of the cluster, which means that those changes won’t be visible on the replicas until they are fully propagated and applied. This also means that if you lose the primary before changes have been propagated, not-yet-propagated data will be lost.
321Semi Synchronous replication
322
323The second one is called semi-synchronous replication. With this one, write queries are acknowledged as soon as the changes have been propagated to replicas. Propagated, but not necessarily applied: changes might not yet be visible on the replicas. But if the primary node is lost, replicas have all the data they need to apply the changes.
324Synchronous replication
325
326The last one is synchronous replication. The name is self-explanatory, but let us walk through how it works. In this mode, write queries are only acknowledged when changes have been propagated AND applied to the replicas. Obviously, this is the most secure way to replicate data: no data or progress is lost if there is a primary node failure.
327A real world example
328
329But a real example is worth a thousand technical explanations: Imagine you are running a website selling hand-made items. You have two customers and an item with a single piece remaining. Now imagine that the first customer is buying this last piece, and that the second customer checks availability at the exact same time that the first one is completing his purchase.
330
Buying an item is a write query, as you need to update the stock, so you must perform it on the primary node. Displaying the webpage is a read query, so you decide to perform it on a replica node. What happens for the second customer if they display the webpage at the exact moment the first customer receives their purchase confirmation? Do they see the item as in stock or out of stock?
332
As you will probably have guessed by now, it depends on the type of replication you have chosen for your database. With asynchronous or semi-synchronous replication, the second customer would see the item as still available. With synchronous replication, they would see it as out of stock.
334Database Replication example use case
335
This can be confusing, but the ambiguity exists whatever the mode. The real difference is that, in synchronous mode, the first customer's purchase takes longer to complete: it is not complete until all the replicas have applied the change. So the difference is not about when the change ends up being applied; it is about when the first customer sees their purchase as completed.
337
Before going all-in on synchronous replication, you must take a crucial point into account: latency. Indeed, waiting for all the replicas to receive and apply the changes takes time. Latency affects your website's or application's responsiveness, and thus potentially your revenue. Multiple studies show that higher latency on purchase operations directly translates into fewer completed purchases, which you probably don't want.
339
You might now see where I'm going with this: asynchronous replication operations take less time to complete, and thus make your applications more responsive, but they allow undesirable outcomes, like items appearing to be in stock when they are not.
341
As you can see, you choose either throughput and responsiveness, or security. There is no right answer; it mostly depends on your use case. Fortunately, some database management systems (DBMSs), such as PostgreSQL, allow you to define the level of security you want for a given query. This means you could use synchronous replication when a customer makes a purchase of at least $1,000, and asynchronous replication otherwise.
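In PostgreSQL, for instance, this per-query choice can be made with the synchronous_commit setting, which can be changed for a single transaction. Here is a minimal sketch using psycopg2; the connection string, table and threshold are made up, and the cluster is assumed to have synchronous standbys configured (synchronous_standby_names):

import psycopg2

conn = psycopg2.connect("dbname=shop")   # hypothetical connection string

def record_purchase(item_id, amount):
    with conn:
        with conn.cursor() as cur:
            if amount >= 1000:
                # Wait until the standbys have applied the commit: slower, but the change
                # is immediately visible (and durable) on the replicas.
                cur.execute("SET LOCAL synchronous_commit = remote_apply")
            else:
                # Only wait for the local flush: faster, effectively asynchronous replication.
                cur.execute("SET LOCAL synchronous_commit = local")
            cur.execute(
                "UPDATE stock SET quantity = quantity - 1 WHERE item_id = %s",
                (item_id,),
            )

record_purchase(item_id=42, amount=1500)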
343And which method does OVHcloud use?
344
At OVHcloud, we manage several mission-critical databases: from banking transactions, to the DNS parameters for all our domain names, to our Public and Private Cloud Control Panels, to information pushed by our APIs, and so on. We opted for asynchronous replication for all our databases. We cope with its disadvantages by reducing latency as much as possible, making it negligible. Moreover, our developers are experienced, and thus familiar with this trade-off, so they are able to make the best design decisions for the applications they are building. So, folks, be aware of this trade-off, and think about what you really want from your application before you configure your DBMS.
348
349OVHcloud’s internal databases infrastructure
350By Wilfried Roset / 2019-10-30 / Databases, Infrastructure
351
352Today, most applications rely directly or indirectly on databases. I would even take a bet and say that a large portion of those are relational databases. At OVHcloud, we rely on several dozens of clusters hosting hundreds of databases to power thousands of applications. Most of those databases power our API, host billing information and customer details.
353Internal database architecture
354
355As part of the team responsible for this infrastructure, I can say it is a huge responsibility to maintain such a critical part of the beating heart of OVHcloud.
356
In this new series of blog posts, we will take a closer look at OVHcloud's internal relational database infrastructure. This first post is about the infrastructure of the internal databases. At OVHcloud, we use three major DBMSs (database management systems): PostgreSQL, MariaDB and MySQL, all of them relying on the same cluster architecture.
358
359But first, what exactly is a cluster? A cluster is a group of nodes (physical or virtual) working together to provide a SQL service.
360
At OVHcloud, we have an open-source and “do it yourself” culture. This allows us to control our costs and, more importantly, to master the technologies we rely on.
362
363That’s why during the last 2 years we designed, deployed, improved and ran failure-proof cluster topologies, then industrialised them. To satisfy our reliability, performance and functional requirements, we decided on a common topology for all these clusters. Let’s find out what it looks like!
364
Each cluster is composed of three nodes, each with its own role: primary, replica and backup.
366
The primary node handles read-write workloads, while the replica(s) only handle read-only queries. When the primary node fails, we promote a replica node to become the new primary. Because, in the vast majority of cases, databases handle far more read-only than read-write queries, replica nodes can be added to scale the cluster's read-only capacity. This is called horizontal scaling. Our last node is dedicated to backup operations; backups are incredibly important, and we will talk a bit more about them later.
368Internal database cluster architecture
369
Because every node in the cluster can be promoted to primary, they all need to be able to handle the same workload. They must therefore have exactly the same resources (CPU, RAM, disk, network…). This is particularly important when we need to promote a replica, because it will then have to take over the primary's workload; a primary and replicas that are not equally sized could be disastrous for your workload. With our clusters up and running, we can start querying them. Each cluster can host one or more databases, depending on several factors, such as infrastructure cost and workload type (business-critical or not, transactional or analytical…).
371
372Thus, a single cluster can host from only one big database to tens of smaller ones. In this context, small and big are not only defined by the quantity of data but also by the expected frequency of queries. For this reason, we carefully tailor each cluster to provision them accordingly. When a database grows and the cluster is no longer appropriately sized, we migrate the database to a new cluster.
373Internal database monitoring
374
375Aside from production we have another smaller environment that fulfils two needs. This is our development environment. We use it to test our backups and to provide our developers with a testing environment. We will get back to this matter in just a few lines.
376
Now let us talk about backups. As I mentioned earlier, backups are a critical part of enterprise-grade databases. To avoid having to maintain a different process for each DBMS flavour, we designed a generic backup process that we apply to all of them.
378
379This allowed us to automate it more efficiently and abstract the complexity behind different software.
380
As you have probably guessed by now, backups are performed by the backup node. This node is part of the cluster, and data is synchronously replicated to it, but it does not receive any queries. When a backup is performed, the DBMS process is stopped, a snapshot of the filesystem is taken, and the snapshot is sent to a storage server outside the cluster for archival and resiliency. We use ZFS for this purpose, because of its robustness and because its incremental snapshots reduce the bandwidth and storage costs associated with snapshot archival.
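To picture that backup step, here is a minimal stop-snapshot-ship sketch. The dataset, service and host names are hypothetical, and our real tooling uses incremental sends, retention rules and integrity checks:

import subprocess
from datetime import datetime, timezone

DATASET = "tank/postgresql"              # hypothetical ZFS dataset holding the DBMS data
ARCHIVE = "backup-archive.example"       # hypothetical storage server outside the cluster

def backup():
    snapshot = f"{DATASET}@{datetime.now(timezone.utc):%Y%m%d-%H%M%S}"

    # Stop the DBMS so the on-disk files are consistent, take the snapshot, restart the DBMS.
    subprocess.run(["systemctl", "stop", "postgresql"], check=True)
    try:
        subprocess.run(["zfs", "snapshot", snapshot], check=True)
    finally:
        subprocess.run(["systemctl", "start", "postgresql"], check=True)

    # Ship the snapshot to the archive server (a full send here; incremental in practice).
    send = subprocess.Popen(["zfs", "send", snapshot], stdout=subprocess.PIPE)
    subprocess.run(["ssh", ARCHIVE, "zfs", "receive", "backups/postgresql"],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()

backup()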
382
383But the main reason for having a separate backup node is the following: the cluster is not affected in any way by the backup. Indeed, backing up a full database can have a very visible impact on production (locks, CPU and RAM consumption etc…), and we don’t want that on production nodes.
384
But backups are useless if they can't be restored. Therefore, every day, we restore the latest snapshot of each cluster on a separate, dedicated node. This kills two birds with one stone: it verifies that we are able to restore our backups, and the freshly restored data gives our development teams an almost up-to-date development environment.
386Internal database cluster backup and Dev use
387
To summarise: our database clusters are modular, but follow a common topology. Clusters can host a variable number of databases, depending on their expected workloads. Each of these databases scales horizontally for read-only operations, by offering separate connections for read-only and read-write operations. Furthermore, backup nodes are used to take regular backups without impacting the production databases. Internally, these backups are then restored on separate nodes, as a fresh development environment.
389
390This completes our tour of OVHcloud’s Internal Database Infrastructure and you are now all set for the next post which will be about replication. Stay tuned!
393IOT: Pushing data to OVHcloud metrics timeseries from Arduino
394
395Last spring, I built a wood oven in my garden. I’ve wanted to have one for years, and I finally decided to make it. To use it, I make a big fire inside for two hours, remove all the embers, and then it’s ready for cooking. The oven accumulates the heat during the fire and then releases it.
396
397Once the embers are removed, I have to prioritise the dishes I want to cook as the temperature drops:
398
399 Pizza: 280°C
400 Bread: 250°C
401 Rice pudding: 180°C
402 Meringues: 100°C
403
404I built a first version of a thermometer with an Arduino, to be able to check the temperature. This thermometer, made of a thermocouple (i.e. a sensor that measures high temperatures), displays the inside temperature on a little LCD screen.
405
The next step was to anticipate when to stuff each dish into the oven. Watching the temperature drop for hours was not a good idea: I needed the heat diagram of my oven! A heat diagram is just a chart of the temperature over a given period of time. But writing down the temperature on paper every ten minutes… wait… it would last more than 30 hours.
407
Please, let me sleep!
409
410This needs some automation. Fortunately, OVHcloud has the solution: Metrics Data Platform: https://www.ovh.com/fr/data-platforms/metrics/
411The Hardware
412
413The aim of the project is to plug a sensor onto an Arduino that will send data to OVHcloud Metrics Data Platform (https://www.ovh.com/fr/data-platforms/metrics/) via the network. Basically, the Arduino will use the local wifi network to push temperature data to OVHcloud servers.
414
415Do you know ESP8266? It’s a low-cost (less than 2€!) wifi microchip with full TCP/IP stack and microcontroller capability.
416ESP8266 functional diagram
417Implementation: Wemos
418
419ESP8266 is not quite so easy to use on its own:
420
421 Must be powered at 3.3V (not too much, or it will burn)
422 No USB
423
424That’s why it is better to use a solution that implements ESP8266 for us. Here is the Wemos!
425
426 Powered at 5V (6V is still ok)
427 USB for serial communication (for debugging)
428 Can be programmed via USB
429 Can be programmed with Arduino IDE
430 Costs less than 3€
431
432Prepare your Arduino IDE
433Install the integrated development environment
434
435First of all you need to install Arduino IDE. It’s free, and available for any platform (Mac, Windows, Linux). Go to https://www.arduino.cc/en/main/software and download the version corresponding to your platform. At the time of writing, the current version is 1.8.10.
436Additional configuration for ESP8266
437
438When you install the Arduino IDE, it will only be capable of programming official Arduinos. Let’s add the firmware and libraries for ESP8266…
439
440Start Arduino and open the “Preferences” window (File > Preferences).
441
442Enter https://arduino.esp8266.com/stable/package_esp8266com_index.json into the “Additional Board Manager URLs” field. You can add multiple URLs, separating them with commas.
443
444Now open “Boards Manager” from the Tools > Board menu and install the esp8266 platform (don’t forget to select your ESP8266 board from the Tools > Board menu after installation).
445
446You are now ready!
447Order a Metrics Data Platform
448
449Go to the OVHcloud Metrics Data Platform website: https://www.ovh.com/fr/data-platforms/metrics/. Click on the free trial, and finalise your order. If you don’t have an account, just create one. With this trial you will have 12 metrics (i.e. 12 sets of records). In this example, we will only use one.
450Retrieve your token
451
452Go to the OVH Control Panel: https://www.ovh.com/manager/cloud/#/. On the left-hand panel, you should have Metrics and a new service inside.
453
454In the “Tokens” tab, you can copy the write token. Keep it, as we will need it later.
455
456Note that to configure Grafana, you will need the read token.
457Retrieve the host of the Metrics Data Platform
458
459The host of your Metrics Data Platform is given in your service description. In the “Platforms” tab, copy the opentsdb host. Keep it, as we will need it later.
460Deeper into the program
461
Now let's have a look at an example. Here is a piece of code that will push static data to the OVHcloud Metrics Data Platform. You can use it with your sensor; you just have to code the sensor measurement. When running, the Wemos will:
463
 Try to connect to your wifi network
465 If successful, push data to OVHcloud Metrics Data Platform
466
467The whole source code is available on my github: https://github.com/landru29/ovh_metrics_wemos.
468
469There are six main files:
470
471 ovh_metrics_wemos.ino: the main file
472 wifi.cpp: class that implements the process to connect to wifi via WPS (Wifi Protected Setup)
473 wifi.h: header file for the wifi
474 metrics.cpp: class that sends the metric data to OVHcloud Metrics Data Platform via HTTPS
475 metrics.h: header file for metrics
476 config.h.sample: model to create your configuration file (see below)
477
478Create your configuration file
479
480If you try to compile the program, you will get errors, as some definitions are missing. We need to declare them in a file: config.h.
481
482 Copy config.h.sample into config.h
483 Copy the write token you got in paragraph 5.1 (#define TOKEN “xxxxxx”)
484 Copy the host you got in paragraph 5.2 (#define HOST “xxxxxx”)
485
486Get the fingerprint of the certificate
487
As the Wemos will send its requests over HTTPS, we need the certificate's fingerprint. You will need the host you just grabbed from the "Platforms" tab, and then:
489Linux users
490
491Just run this little script:
492
HOST=opentsdb.gra1.metrics.ovh.net; echo | openssl s_client -showcerts -servername ${HOST} -connect ${HOST}:443 2>/dev/null | openssl x509 -noout -fingerprint -sha1 -inform pem | sed -e "s/.*=//g" | sed -e "s/\:/ /g"
494
495Copy the result in your config.h (#define FINGERPRINT "xx xx ..").
496MAC users
497
498Just run this little script:
499
HOST=opentsdb.gra1.metrics.ovh.net; echo | openssl s_client -showcerts -servername ${HOST} -connect ${HOST}:443 2>/dev/null | openssl x509 -noout -fingerprint -sha1 -inform pem | sed -e "s/.*=//g" | sed -e "s/\:/ /g"
501
502Copy the result in your config.h (#define FINGERPRINT "xx xx ..").
503Windows users
504
505In your browser, go to https://opentsdb.gra1.metrics.ovh.net. Click on the lock next to the URL to display the fingerprint of the certificate. Replace all ‘:’ with one space.
506Compile the project and upload it to the Wemos
507
508 Open the .ino file in the Arduino IDE (you should have six tabs in the project)
 Plug the Wemos into your computer
510 Select the port from Tools > Port
511 On the top-left side, click on the arrow to upload the program
512 Once uploaded, you can open the serial monitor: Tools > Serial Monitor
513
514Right now, the program should fail, as the Wemos will not be able to connect to your wifi network.
515Run the program
516
As we've already seen, the first run fails. This is because you need to launch a WPS connection; depending on your internet modem, you will need to trigger a WPS transaction. This could be a physical button on the modem, or a software action on its administration console (https://en.wikipedia.org/wiki/Wi-Fi_Protected_Setup).
518
519When the process is launched on the modem side, you have something like 30 seconds to power the Wemos.
520
521 Plug in your Wemos via USB => the program is running
522 Select the port from Tools > Port (it may have changed)
523 Open the serial monitor: Tools > Serial Monitor
524
525Now you can follow the process.
526Wifi connection
527
528In the serial monitor (adjust the bit rate to 9600), you should get:
529
Try to connect

WPS config start
Trying to connect to <your modem> with saved config ...|SUCCESS
IP address: 192.168.xx.xx
535
536If the wifi connection was successful, the serial console should display a local IP address (192.168.xx.xx), otherwise, it failed. Try again by triggering WPS on your modem and restarting the Wemos (unplug it and plug it back in).
537Sending data to OVHcloud Metrics Data Platform
538
539Now the Wemos is POSTing a request on the OVHcloud server. The serial console shows you the JSON it will send:
540
------------------------------------------------
POST opentsdb.gra1.metrics.ovh.net/api/put
[{"metric": "universe","value":42,"tags":{}}]
------------------------------------------------
beginResult: 0
http: 204
response: xxxx
548
549If beginResult is negative, connection to the OVHcloud server failed. It could mean that the FINGERPRINT is wrong.
550
551If http is not 2xx (it should be 204), the server could not process your request. It may mean that the TOKEN is wrong.
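If you want to rule out a wiring or firmware issue, you can reproduce the same POST from your computer with a few lines of Python. The payload matches the JSON shown above; the authentication scheme used here (HTTP Basic, with the write token as the password) is an assumption on our side, so check the Metrics Data Platform documentation if it is rejected:

import requests

HOST = "opentsdb.gra1.metrics.ovh.net"   # the host copied from the "Platforms" tab
WRITE_TOKEN = "xxxxxx"                    # the write token copied from the "Tokens" tab

response = requests.post(
    f"https://{HOST}/api/put",
    json=[{"metric": "universe", "value": 42, "tags": {}}],
    auth=("metrics", WRITE_TOKEN),        # assumption: Basic auth with the token as password
    timeout=10,
)
print(response.status_code)               # expect 204, just like the Wemos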
552
553You got a 204? Great! It’s a success. Let’s check that on Grafana…
554Configure Grafana
555
556Go to OVHcloud Grafana: https://grafana.metrics.ovh.net/login. Log in with your OVHcloud account.
557Configure a data source
558
559Click on “Add data source”.
560
561 Name: choose one
562 Type: OpenTSDB
563 URL: https://<host you got from your manager (see below)>
564 Access: direct
565 Check “Basic Auth”
566 User: metrics
567 Password: <Read token from your manager (see below)>
568
569Click on the “Add” button…
570
571… and save it.
572Create your first chart
573
574Go back to https://grafana.metrics.ovh.net/ and click on “New Dashboard”.
575
576Click on “Graph”.
577
578Click on “Panel title”, then “Edit”.
579
580Select your metric in the “metric name” field. The software must suggest the name universe (the name specified in the Arduino program). If it doesn’t, this means the metrics were not correctly sent by the Wemos. Close the “edit” panel (click the cross on the right) and save your configuration (top-left of the window).
581Result analysis
582Temperature rise
583
584The first result to analyse is the temperature rise. The sensor was lying on the bricks of the oven. The yellow chart is the oven temperature, and the green chart is the ambient temperature.
585
 Between 11:05 and 11:10, there is a plateau at about 85°C. This seems to be the moisture in the oven drying out.
587 Then there’s a temperature drop, so I added some more wood to the oven (i.e. introduced cold stuff).
 At about 11:20, the slope becomes shallower, and I have no idea why. Was the fire not strong enough? Was there moisture deeper in the bricks?
589
Temperature drop
591
At this point, I moved all the embers to the back of the oven and put the sensor where the fire had been burning. That's why the chart begins at 400°C.
593
 The temperature drop seems to follow something like F(t) = A/t
 At about 15:40, I changed the power supply from a phone charger plugged into the 230V mains to a car battery with a voltage regulator (which turned out to be pretty poor)
596 The ambient temperature is quite high between 15:00 and 17:00. It was a sunny day, so the sun was directly heating the circuit.
599Dealing with small files with OpenStack Swift (part 1)
600
601OpenStack Swift is a distributed storage system that is easy to scale horizontally, using standard servers and disks.
602
603We are using it at OVHcloud for internal needs, and as a service for our customers.
604Swift at OVHcloud
605
606By design, it is rather easy to use, but you still need to think about your workload when designing a Swift cluster. In this post I’ll explain how data is stored in a Swift cluster, and why small objects are a concern.
607How does Swift store files?
608
609The nodes responsible for storing data in a Swift cluster are the “object servers”. To select the object servers that will hold a specific object, Swift relies on consistent hashing:
610
In practice, when an object is uploaded, an MD5 checksum is computed, based on the object name. A number of bits are extracted from that checksum, which gives us the "partition" number.
612
613The partition number enables you to look at the “ring”, to see which server and disk should store that particular object. The “ring” is a mapping between a partition number, and the object servers that should store objects belonging to that partition.
614
Let's take a look at an example. In this case, we will use only 2 bits of the MD5 checksum (far too few in practice, but much easier to draw, as there are only 4 partitions).
616
When a file is uploaded, from its name and other elements, we get an MD5 checksum: 72acded3acd45e4c8b6ed680854b8ab1. If we take the 2 most significant bits, we get partition 1.
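You can reproduce this mapping with a few lines of Python. This is a simplified illustration: real Swift hashes the full /account/container/object path together with a per-cluster hash suffix, and uses a much larger partition power:

import hashlib

def partition_for(name: str, part_power: int = 2) -> int:
    """Return the partition for `name`, using the top `part_power` bits of its 128-bit MD5."""
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) >> (128 - part_power)

# The checksum from the example above falls into partition 1 when keeping 2 bits.
print(int("72acded3acd45e4c8b6ed680854b8ab1", 16) >> 126)   # -> 1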
618
619From the object ring, we get the list of servers that should store copies of the object.
620
621With a recommended Swift setup, you would store three identical copies of the object. For a single upload, we create three actual files, on three different servers.
622Swift policies
623
624We’ve just seen how the most common Swift policy is to store identical copies of an object.
625
626That may be a little costly for some use cases, and Swift also supports “erasure coding” policies.
627
628Let’s compare them now.
629
630The “replica policy” which we just described. You can choose how many copies of objects you want to store.
631Replica in Swift
632
633The “erasure coding” policy type
634Erasure coding in Swift
635
636The object is split into fragments, with added redundant pieces to enable object reconstruction, if a disk containing a fragment fails.
637
At OVHcloud, we use a 12+3 policy (12 pieces from the object and 3 computed pieces)
639
640This mode is more space efficient than replication, but it also creates more files on the infrastructure. In our configuration, we would create 15 files on the infrastructure, vs 3 files with a standard “replication” policy.
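A quick back-of-the-envelope calculation shows how fast this adds up. The object and disk counts below are hypothetical, chosen only to land in the order of magnitude discussed in the next section:

objects = 6_000_000_000     # hypothetical number of objects in the cluster
disks = 1_300               # hypothetical number of drives

files_replica = objects * 3          # 3 identical copies per object
files_ec = objects * (12 + 3)        # 15 fragments per object with a 12+3 policy

print(f"replica policy : {files_replica / disks / 1e6:.0f} M files per drive")
print(f"erasure coding : {files_ec / disks / 1e6:.0f} M files per drive")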
641Why is this a problem?
642
643On clusters where we have a combination of both an erasure coding policy, and a median object size of 32k, we would end up with over 70 million files *per drive*.
644
645On a server with 36 disks, that’s 2.5 billion files.
646
647The Swift cluster needs to regularly list these files to:
648
649 Serve the object to customers
650 Detect bit rot
651 Reconstruct an object if a fragment has been lost because of a disk failure
652
653Usually, listing files on a hard drive is pretty quick, thank’s to Linux’s directory cache. However, on some clusters we noticed the time to list files was increasing, and a lot of the hard drive’s IO capacity was used to read directory contents: there were too many files, and the system was unable to cache the directory contents. Wasting a lot of IO for this meant that the cluster response time was getting slower, and reconstruction tasks (rebuilding fragments lost because of disk failures) were lagging.
654
655In the next post we’ll see how we addressed this.
658Web hosting – How do our databases work?
659
660In our series of articles about migrating web hosting infrastructure from Paris to Gravelines (the first of which is here), we neglected to mention the databases. This is however a very important part, with 800,000 databases to be migrated!
661
In this new series of posts, we will tell you about our experience with web hosting databases.
663How to handle 800k databases?
664
665Our previous posts explained why we have to migrate databases along with their websites. You also know that we set up a network bridge for customers using non-referenced databases.
666
667Yet migrating databases also brings challenges. We will describe the most important ones. But first, a little reminder of our different architectures.
668
669OVH offers two types of databases:
670
671 Shared databases (which we call SharedSQL)
672
673 Private databases (which we call, you guessed it, PrivateSQL)
674
675What is a SharedSQL?
676
As the name suggests, it is a database on a shared server, which contains many other databases and shares its hardware resources, including RAM.
678
679Of course, not everyone has access to everyone’s data! Each database is defined with a single user able to access it.
680
This technique allows us to run up to 2,500 databases on a single server. This is much cheaper than offering a physical server per customer, and it is how we can offer low-cost access to several databases within our hosting offers.
682What is a PrivateSQL?
683
684Unlike SharedSQL, PrivateSQL are databases with guaranteed resources, including RAM. In PrivateSQL, users also have many more rights and greater customisation capabilities.
685
Does each user have their own server? Not really! Several years ago, we used Docker technology to containerise our databases; we already discussed it in this post: https://www.ovh.com/en/blog/docker-administration-databases-a-flying-ideas/. With a PrivateSQL, it's not just the database space that is private: the RAM assigned to the service is also guaranteed. This means that, no matter the circumstances, performance is consistent.
687Seven differences!
688
689When considering migration, we had to study the difference in architectures between Paris and Gravelines.
690
Regarding PrivateSQL, it was simple: there was no difference, as we had already aligned the architectures when we containerised them. Migrating them would be a piece of cake: we just had to move a container from point A to point B.
692
At least on paper; in real life, it was not so simple, but we will talk about that next time.
694
695Here is a simplified diagram of a PrivateSQL
696PrivateSQL
697
698In contrast, SharedSQL databases were not really aligned. When we set up the Gravelines datacentre in 2016, we took advantage of Docker’s flexibility to review our database technology, and therefore adapt our former solution.
699
700Small comparative schema of a SharedSQL in Paris and Gravelines:
701
702SharedSQL at Paris P19 and Gravelines:
703SharedSQL
704
705On the one hand (in Paris), we have servers with a single database management system (MySQL) hosting 2500 databases on one machine.
706
On the other (in Gravelines), we have added a level: Docker containers, each running the same database system (MySQL) and hosting up to 250 databases. On each machine, we host 10 containers.
708
This new system greatly simplifies maintenance, reducing the impact on database performance when maintenance occurs.
710
711Here is what it provides:
 - Number of clients impacted by an incident on a MySQL instance. Paris: 2,500. Gravelines: 250. When MySQL runs in a container, the impact is lower than when it runs directly on the server.
 - Total number of customers per server. Equivalent. In Paris, the 2,500 databases run on the same MySQL process. In Gravelines, they share 10 processes, each in its own container, but the number per physical machine remains the same.
 - Ease of server migration in the event of a hardware failure. Docker containers give us easy migration and disaster recovery between two machines: we can quickly move a container from one server to another without data loss.
 - Startup speed. Starting a container takes about the same time as starting a process.
 - Version upgrades. The transition from one version to another is simplified by the Docker image system: we just need to change the container's image to the desired version and restart the container.
 - Rollback in case of a problem during an update. We just have to restart the container on its previous image.
 - Robust and long-lasting implementation. Running databases under Docker is not simple (see https://www.ovh.com/blog/docker-administrating-databases-a-crazy-idea/), but we have a few years' experience with it.
720
721Of course, running a database in a Docker container seemed like a crazy idea. But after many years of production, this technology simplifies everyday life in the Gravelines datacentre.
722
723So when we had to choose between replicating the Paris infrastructure, or changing infrastructure during migration, we chose to change technology.
724
725And all of a sudden we have another migration challenge.
726
727In our next article, we will detail how to move a huge database in a few minutes, while changing a large part of the software and the infrastructure behind it!
728<|endoftext|>
729<|startoftext|>
730The ins and outs of IPMI
731
What is IPMI? What’s the purpose of IPMI? Why should I care about IPMI? These are all fair questions. In the hosting provider world, IPMI (Intelligent Platform Management Interface) is thrown around almost as much as “SDDC” (Software Defined Data Center) or “IaaS” (Infrastructure as a Service), but what does it mean, and why should you care?
733
IPMI was created in a cooperative partnership between Intel, Dell, Hewlett Packard and NEC. Since its creation, it has become an industry-standard hardware solution that allows server admins to monitor hardware status, log server data, and access a server without needing physical access to it. Accessing a server through IPMI grants you access to the system's BIOS; this allows you to install or reinstall your own operating system, fix any network misconfigurations, or re-enable SSH or RDP access using KVM (Keyboard, Video, Mouse) access to the server.
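
To give a concrete idea of what this looks like in practice, here is a hedged sketch using the open-source ipmitool client; the IP address and credentials are placeholders, and OVHcloud customers will usually go through the OVHcloud Manager instead.

# Query the power state of the server through its BMC
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'secret' chassis power status
# List hardware sensors (temperatures, fans, voltages)
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'secret' sensor list
# Read the system event log
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'secret' sel list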
735
736By utilizing OVHcloud® infrastructure you will be able to use IPMI and have access to your server’s BIOS. This enables you to be an effective server administrator and troubleshoot any issues you may have with your server as well as install any operating system compatible with your server’s components.
737
738At OVHcloud it’s important to us that our customers have the freedom and flexibility to innovate solutions to any challenge or problem they see before them; utilizing IPMI is one way we can give our customers such freedom.
739
740To learn more about how to access IPMI from your OVHcloud Manager and how to install an operating system utilizing IPMI please consult the following guides that take you step by step through the process: Getting Started with IPMI, How to Install an OS with IPMI.
741<|endoftext|>
742<|startoftext|>
743Simplify your research experiments with Kubernetes
744
745Abstract
746
As a researcher, I need to conduct experiments to validate my hypotheses. In Computer Science, it is well known that practitioners tend to run experiments in different environments (at the hardware level: x86/arm/…, CPU frequency, available memory; at the software level: operating system, library versions). The problem with these differing environments is the difficulty of accurately reproducing an experiment as it was presented in a research article.
748
In this post we present a way of conducting reproducible experiments using Kubernetes-as-a-service, a managed platform for distributed computation, along with other tools (Argo, MinIO) that take advantage of the platform.
751
The article is organised as follows: we first recall the context and the problem faced by a researcher who needs to conduct experiments. Then we explain how to solve the problem with Kubernetes, and why we did not choose other solutions (e.g. HPC software). Finally, we give some tips for improving the setup.
753Introduction
754
When I started my PhD, I read a bunch of articles related to the field I’m working on, i.e. AutoML. From this research, I realised how important it is to conduct experiments well in order to make them credible and verifiable. I started asking my colleagues how they carried out their experiments, and there was a common pattern: develop your solution, look at other solutions that address the same problem, run each solution 30 times (if it is stochastic) with equivalent resources, and compare your results to the other solutions with statistical tests: Wilcoxon-Mann-Whitney when comparing two algorithms, or the Friedman test otherwise. As this is not the main topic of this article, I will not discuss statistical tests in detail.
756
As an experienced DevOps engineer, I had one question about automation: how do I find out how to reproduce an experiment, especially one from another solution? Guess the answer: meticulously read the paper, or hope to find a repository with all the information.
758
Either you are lucky and the source code is available, or only pseudo-code is provided in the publication, in which case you need to re-implement the solution to be able to test and compare it. Even when source code is available, the whole environment is often missing (e.g. the exact versions of the packages, the Python version itself, the JDK version, etc.). Not having the right information affects performance and may bias experiments. For example, new versions of packages, languages and so on usually come with better optimisations that an implementation can benefit from. And sometimes it is simply hard to find out which versions the practitioners used.
760
The other problem is the complexity of setting up a cluster with HPC software (e.g. Slurm, Torque). Indeed, managing such a solution requires technical knowledge: configuring the network, verifying that each node has the dependencies required by the runs, checking that the nodes have the same library versions, etc. These technical steps consume researchers' time and take them away from their main job. Moreover, researchers usually extract the results manually: they retrieve the different files (through FTP or NFS), then perform statistical tests and save them by hand. Consequently, the workflow to perform an experiment is relatively costly and fragile.
762
From my point of view, this raises one big problem: an experiment cannot really be reproduced in the field of Computer Science.
764Solution
765
OVH offers Kubernetes-as-a-service, a managed cluster platform where you do not have to worry about how to configure the cluster (adding nodes, configuring the network, and so on), so I started to investigate how I could run experiments in a similar way to HPC solutions. Argo Workflows came out of the box: this tool lets you define a workflow of steps to run on your Kubernetes cluster, where each step is confined in a container (loosely called an image). A container allows you to run a program in a specific software environment (language version, libraries, third parties), in addition to limiting the resources (CPU time, memory) used by the program.
767
This maps directly onto our big problem: making sure an experiment can be reproduced is equivalent to running a workflow composed of steps in a specific environment.
769Simplify your research experiments with Kubernetes: architecture
770Use case: Evaluate an AutoML solution
771
The use case from our research is measuring the convergence of a Bayesian optimisation method (SMAC) on an AutoML problem.
773
For this use case, we defined the Argo workflow in the following YAML file.
775Set up the infrastructure
776
First we will set up a Kubernetes cluster, then we will install the services on it, and lastly we will run an experiment.
778Kubernetes cluster
779
780Installing a Kubernetes cluster with OVH is child’s play. Connect to the OVH Control Panel, go to Public Cloud > Managed Kubernetes Service, then Create a Kubernetes cluster and follow the steps depending on your needs.
781
782Once the cluster is created:
783
 Take the upgrade policy into consideration. If you are a researcher and your experiment takes some time to run, you want to avoid an update shutting down your infrastructure while your runs are in progress. To avoid this situation, it is better to choose “Minimum unavailability” or “Do not update”.
 Download the kubeconfig file; we will use it later with kubectl to connect to our cluster.
 Add at least one node to your cluster.
787
Once the cluster is ready, you will need kubectl, the tool that lets you manage your cluster.
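
As a sketch (the download method and version below are only examples, not an OVH requirement), installing kubectl and pointing it at the downloaded kubeconfig file can look like this:

# Install kubectl on a Linux amd64 workstation
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.15.3/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/kubectl
# Use the kubeconfig file downloaded from the OVH Control Panel
export KUBECONFIG=$PWD/kubeconfig.yml
kubectl get nodes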
789
790If everything has been properly set up, you should get something like this:
791
792kubectl top nodes
793NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
794node01 64m 3% 594Mi 11%
795
796Installation of Argo
797
798As we mentioned before, Argo allows us to run a workflow composed of steps. To install the client and the service on the cluster, we were inspired by this tutorial.
799
800First we download and install Argo (client):
801
802curl -sSL -o /usr/local/bin/argo https://github.com/argoproj/argo/releases/download/v2.3.0/argo-linux-amd64
803chmod +x /usr/local/bin/argo
804
805Then the controller and UI on our cluster:
806
807kubectl create ns argo
808kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.3.0/manifests/install.yaml
809
810Configure the service account:
811
812kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
813
814Then, with the client try a simple hello-world workflow to confirm the stack is working (Status: Succeeded):
815
816argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
817Name: hello-world-2lx9d
818Namespace: default
819ServiceAccount: default
820Status: Succeeded
821Created: Tue Aug 13 16:51:32 +0200 (24 seconds ago)
822Started: Tue Aug 13 16:51:32 +0200 (24 seconds ago)
823Finished: Tue Aug 13 16:51:56 +0200 (now)
824Duration: 24 seconds
825
826STEP PODNAME DURATION MESSAGE
827 ✔ hello-world-2lx9d hello-world-2lx9d 23s
828
829You can also access the UI dashboard through http://localhost:8001:
830
831kubectl port-forward -n argo service/argo-ui 8001:80
832
833Configure an Artifact repository (MinIO)
834
Artifact is a term used by Argo for an archive containing the files returned by a step. In our case, we will use this feature to return the final results and to share intermediate results between steps.
836
To get Artifacts working, we need an object storage. If you already have one, you can skip the installation part, but you will still need to configure it.
838
As specified in the tutorial, we used MinIO. Here is the manifest to install it (minio-argo-artifact.install.yml):
840
841apiVersion: v1
842kind: PersistentVolumeClaim
843metadata:
844 # This name uniquely identifies the PVC. Will be used in deployment below.
845 name: minio-pv-claim
846 labels:
847 app: minio-storage-claim
848spec:
849 # Read more about access modes here: https://kubernetes.io/docs/user-guide/persistent-volumes/#access-modes
850 accessModes:
851 - ReadWriteOnce
852 resources:
853 # This is the request for storage. Should be available in the cluster.
854 requests:
 storage: 10Gi
856 # Uncomment and add storageClass specific to your requirements below. Read more https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
857 #storageClassName:
858---
859apiVersion: extensions/v1beta1
860kind: Deployment
861metadata:
862 # This name uniquely identifies the Deployment
863 name: minio-deployment
864spec:
865 strategy:
866 type: Recreate
867 template:
868 metadata:
869 labels:
870 # Label is used as selector in the service.
871 app: minio
872 spec:
873 # Refer to the PVC created earlier
874 volumes:
875 - name: storage
876 persistentVolumeClaim:
877 # Name of the PVC created earlier
878 claimName: minio-pv-claim
879 containers:
880 - name: minio
881 # Pulls the default MinIO image from Docker Hub
882 image: minio/minio
883 args:
884 - server
885 - /storage
886 env:
887 # MinIO access key and secret key
888 - name: MINIO_ACCESS_KEY
889 value: "TemporaryAccessKey"
890 - name: MINIO_SECRET_KEY
891 value: "TemporarySecretKey"
892 ports:
893 - containerPort: 9000
894 # Mount the volume into the pod
895 volumeMounts:
896 - name: storage # must match the volume name, above
897 mountPath: "/storage"
898---
899apiVersion: v1
900kind: Service
901metadata:
902 name: minio-service
903spec:
904 ports:
905 - port: 9000
906 targetPort: 9000
907 protocol: TCP
908 selector:
909 app: minio
910
911Note: Please edit the following key/values:
912
 spec > resources > requests > storage: 10Gi corresponds to the 10 GB of storage requested by MinIO from the cluster
914 TemporaryAccessKey
915 TemporarySecretKey
916
917kubectl create ns minio
918kubectl apply -n minio -f minio-argo-artifact.install.yml
919
920Note: alternatively, you can install MinIO with Helm.
921
922Now we need to configure Argo in order to use our object storage MinIO:
923
924kubectl edit cm -n argo workflow-controller-configmap
925...
926data:
927 config: |
928 artifactRepository:
929 s3:
930 bucket: my-bucket
931 endpoint: minio-service.minio:9000
932 insecure: true
933 # accessKeySecret and secretKeySecret are secret selectors.
934 # It references the k8s secret named 'argo-artifacts'
935 # which was created during the minio helm install. The keys,
936 # 'accesskey' and 'secretkey', inside that secret are where the
937 # actual minio credentials are stored.
938 accessKeySecret:
939 name: argo-artifacts
940 key: accesskey
941 secretKeySecret:
942 name: argo-artifacts
943 key: secretkey
944
945Add credentials:
946
947kubectl create secret generic argo-artifacts --from-literal=accesskey="TemporaryAccessKey" --from-literal=secretkey="TemporarySecretKey"
948
949Note: Use the correct credentials you specified above
950
Create the bucket my-bucket with Read and Write rights by connecting to the web interface at http://localhost:9000:
952
953kubectl port-forward -n minio service/minio-service 9000
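
Alternatively, if you prefer the command line to the web interface, you can create the bucket with the MinIO client mc (a sketch assuming the port-forward above is running and using the credentials defined earlier):

# Register the port-forwarded MinIO endpoint under the alias "myminio"
mc config host add myminio http://localhost:9000 TemporaryAccessKey TemporarySecretKey
# Create the bucket expected by the Argo configuration
mc mb myminio/my-bucket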
954
955Check that Argo is able to use Artifact with the object storage:
956
957argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/artifact-passing.yaml
958Name: artifact-passing-qzgxj
959Namespace: default
960ServiceAccount: default
961Status: Succeeded
962Created: Wed Aug 14 15:36:03 +0200 (13 seconds ago)
963Started: Wed Aug 14 15:36:03 +0200 (13 seconds ago)
964Finished: Wed Aug 14 15:36:16 +0200 (now)
965Duration: 13 seconds
966
967STEP PODNAME DURATION MESSAGE
968 ✔ artifact-passing-qzgxj
969 ├---✔ generate-artifact artifact-passing-qzgxj-4183565942 5s
970 └---✔ consume-artifact artifact-passing-qzgxj-3706021078 7s
971
Note: if you are stuck with a ContainerCreating message, there is a good chance that Argo cannot access MinIO (e.g. bad credentials).
973Install a private registry
974
975Now that we have a way to run a workflow, we want each step to represent a specific software environment (i.e., an image). We defined this environment in a Dockerfile.
976
Because each step can run on a different node in our cluster, the image needs to be stored somewhere; in the case of Docker, this means we need a registry (a private one, here).
978
979You can get a private registry in different ways:
980
981 Docker Hub
982 Gitlab.com
983 OVH – tutorial
984 Harbor: allows you to have your own registry on your Kubernetes cluster
985
In our case, we used the OVH private registry.
987
988# First we clone the repository
989git clone git@gitlab.com:automl/automl-smac-vanilla.git
990cd automl-smac-vanilla
991
992# We build the image locally
993docker build -t asv-environment:latest .
994
995# We push the image to our private registry
996docker login REGISTRY_SERVER -u REGISTRY_USERNAME
997docker tag asv-environment:latest REGISTRY_IMAGE_PATH:latest
998docker push REGISTRY_IMAGE_PATH:latest
999
1000Allow our cluster to pull images from the registry:
1001
1002kubectl create secret docker-registry docker-credentials --docker-server=REGISTRY_SERVER --docker-username=REGISTRY_USERNAME --docker-password=REGISTRY_PWD
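
Note that the pods also need to reference this secret to pull from the registry. One common way to do this (an assumption on our side, not a step taken from the original tutorial) is to attach the secret to the default service account as an imagePullSecret:

kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "docker-credentials"}]}'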
1003
1004Try our experiment on the infrastructure
1005
1006git clone git@gitlab.com:automl/automl-smac-vanilla.git
1007cd automl-smac-vanilla
1008
1009argo submit --watch misc/workflow-argo -p image=REGISTRY_IMAGE_PATH:latest -p git_ref=master -p dataset=iris
1010Name: automl-benchmark-xlbbg
1011Namespace: default
1012ServiceAccount: default
1013Status: Succeeded
1014Created: Tue Aug 20 12:25:40 +0000 (13 minutes ago)
1015Started: Tue Aug 20 12:25:40 +0000 (13 minutes ago)
1016Finished: Tue Aug 20 12:39:29 +0000 (now)
1017Duration: 13 minutes 49 seconds
1018Parameters:
1019 image: m1uuklj3.gra5.container-registry.ovh.net/automl/asv-environment:latest
1020 dataset: iris
1021 git_ref: master
1022 cutoff_time: 300
1023 number_of_evaluations: 100
1024 train_size_ratio: 0.75
1025 number_of_candidates_per_group: 10
1026
1027STEP PODNAME DURATION MESSAGE
1028 ✔ automl-benchmark-xlbbg
1029 ├---✔ pre-run automl-benchmark-xlbbg-692822110 2m
1030 ├-·-✔ run(0:42) automl-benchmark-xlbbg-1485809288 11m
1031 | └-✔ run(1:24) automl-benchmark-xlbbg-2740257143 9m
1032 ├---✔ merge automl-benchmark-xlbbg-232293281 9s
1033 └---✔ plot automl-benchmark-xlbbg-1373531915 10s
1034
1035Note:
1036
 Here we only have 2 parallel runs; you can have many more by adding entries to the withItems list. In our case, the list corresponds to the seeds.
 run(1:24) corresponds to run 1 with seed 24
1039 We limit the resources per run by using requests and limits, see also Managing Compute Resources.
1040
1041Then we just retrieve the results through the MinIO web user interface http://localhost:9000 (you can also do that with the client).
1042
1043The results are located in a directory with the same name as the argo workflow name, in our example it is my-bucket > automl-benchmark-xlbbg.
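
For example, with the mc client (a sketch assuming the myminio alias configured earlier and a running port-forward; the workflow name will differ for each run):

# List and download the artifacts produced by the workflow
mc ls myminio/my-bucket/automl-benchmark-xlbbg/
mc cp --recursive myminio/my-bucket/automl-benchmark-xlbbg/ ./results/
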
1044Limitation to our solution
1045
The solution cannot run the parallel steps on multiple nodes. This limitation comes from the way we merge the results of the parallel steps in the merge step: we use volumeClaimTemplates, i.e. we mount a volume, and this cannot be shared between different nodes. The problem can be solved in two ways:
1047
 Using parallel artifacts and aggregating them; however, this is an ongoing issue with Argo
 Directly implementing, in the code of your run, a way to store the results on accessible storage (the MinIO SDK, for example)
1050
The first way is preferable, as it means you do not have to change and customise your code for a specific storage system.
1052Hints to improve the solution
1053
If you are interested in going further with your setup, you should take a look at the following topics:
1055
 Controlling access: in order to confine users to different spaces (for security reasons, or to control resources).
 Exploring Argo selectors and Kubernetes selectors: in case your cluster is composed of nodes with different hardware and an experiment requires specific hardware (e.g. a specific CPU or GPU).
 Configuring a distributed MinIO: it ensures that your data is replicated on multiple nodes and stays available if a node fails.
 Monitoring your cluster.
1060
1061Conclusion
1062
1063Without needing in-depth technical knowledge, we have shown that we can easily set up a complex cluster to perform research experiments and make sure they can be reproduced.
1064Related links
1065
1066 Automating Research Workflows at BlackRock
1067 The State of HPC Containers
1068 Kubernetes Meets High-Performance Computing
1069<|endoftext|>
1070<|startoftext|>
1071Academics and OVH: an AI-centric collaboration
1072
At OVH, our innovation process stimulates internal collaboration. We have seen this with frugal innovation, but also externally with SMEs, schools and laboratories. One example of collaboration with academics is a thesis launched this year on Artificial Intelligence, focusing on the field of Automated Machine Learning.
1074AI at OVHcloud
1075
1076Collaborative research at OVH through a thesis
1077
CIFRE stands, in French, for Convention Industrielle de Formation par la Recherche, literally Industrial Convention for Training through Research. The aim is to encourage research collaboration between private and public entities. For companies and PhD students, CIFRE is a means of training through research and acquiring strong scientific expertise. For academics, it is a means of supervising a new PhD and applying research results to an economic case. The ANRT association manages CIFRE theses.
1079
OVH and ORKAD, a research team specialising in combinatorial optimisation and knowledge extraction, launched a collaboration through a CIFRE thesis.
1081
1082AutoML, as an example of an AI thesis
1083
1084The thesis is entitled “MO-AutoML: a multiobjective framework to automatically configure Machine Learning pipelines“. MO stands for Multi-Objective and AutoML for Automated Machine Learning.
1085
1086Machine Learning is a field of artificial intelligence used for a wide scope of applications like health prediction, shape recognition in embedded systems (e.g., autonomous car), marketing strategy selection, anomaly detection (e.g., temperature in a datacentre). Machine Learning algorithms are very efficient at exploiting data and extracting knowledge used to support decisions.
1087
The main problem with these algorithms is the technical challenge involved in selecting and tuning them for good performance. That is why the field of AutoML has emerged: to tackle this challenge by automatically selecting and optimising the ML algorithm. AutoML also aims to automatically solve other problems related to Machine Learning, such as data formatting, explaining the results (e.g. feature importance), industrialising models, and so on.
1089
Another problem with current AutoML solutions is that they are mainly single-objective. However, it can be very useful to take into account several metrics measuring the quality of the model, in addition to exogenous metrics, and let the user select the model that best addresses the underlying problem.
1091
1092This thesis aims to advance the issues mentioned above, thus facilitating and improving the use of AutoML.
1093
1094OVH and AI
1095
The AutoML thesis will certainly have multiple consequences for OVH. Our work on Machine Learning has already allowed us to launch Prescience in our Labs. Prescience is a distributed and scalable platform that allows users to build, deploy and query ML models.
1097
1098As a result of strong collaboration with private partner NVIDIA, OVH provides the NVIDIA GPU Cloud (NGC) software platform as a European exclusive. The purpose of this partnership is to facilitate access to artificial intelligence by allowing users to run their processing, through NGC, on NVIDIA products hosted on the OVH infrastructure.
1099<|endoftext|>
1100<|startoftext|>
1101Introducing DepC: the OVH platform for computing QoS
1102
1103At OVH, our first mission as a cloud service provider is to provide products with high quality of service (QoS). Whether they are dedicated servers, cloud servers or hosted websites, our customers expect our solutions to be very high quality. And that’s exactly what our teams strive to offer you on a daily basis!
1104
1105It’s a difficult mission. In many cases, the quality of service may depend on our infrastructure, but also on the solution’s intended use. And identifying the origin of any degradation can require advanced diagnoses.
1106
1107So, how do we quantify this quality of service? How do we understand the quality delivered for every product, every day, as precisely as possible?
1108
The first step was to look for existing tools, but we quickly realised that no solution met our needs. Based on this observation, we decided to develop our own solution for computing QoS: DepC. Originally built for the WebHosting team, this platform quickly spread throughout OVH. It is now used internally by dozens of teams.
1110
1111We first built DepC to calculate the QoS of our infrastructure. But over the years, we’ve discovered that the tool can be used to calculate the QoS of any complex system, including both infrastructures and services.
1112
1113In order to be as transparent as possible, we also decided to explain and prove our calculation methods. That’s why we chose to make DepC open source. You can find it on Github.
1114
Before you dive into the code, let's take a look at how DepC works, and how we use it to calculate the quality of our products.
1117What is QoS?
1118
Above all, it is important to define exactly what we want to calculate. QoS describes a system's state of health. It can be a service (e.g. customer support, waiting time at the checkout…), the operation of a product (e.g. the lifecycle of a dishwasher), or a complex system (e.g. a website's infrastructure).
1120
1121This state of health is very subjective and will vary for each case, but it will generally be based on the likelihood of a user benefiting from the service in good conditions. In our case, good conditions mean both service availability (i.e. the infrastructure works) and its general condition (i.e. the infrastructure responds correctly).
1122
QoS is expressed as a percentage, starting from 100% when the service is perfectly delivered, then decreasing little by little in the event of failure. This percentage must be associated with a period: a month, a day, an hour, etc. A service can therefore have a 99.995% QoS for the current day, whereas it was 100% the day before.
1124
1125Other concepts are also important:
1126
1127 SLA (Service Level Agreement): not to be confused with QoS, this is a contract between the customer and the supplier, indicating the quality of service expected. This contract can possibly include the penalties granted to the customer in the event of failure to meet the objectives.
1128 SLO (Service Level Objective): this refers to the goal that a service provider wants to achieve in terms of QoS.
 SLI (Service Level Indicator): this is a measure (ping response time, HTTP status code, network latency…) used to judge the quality of a service. SLIs are at the heart of DepC, since they allow us to transform raw data into QoS.
1130
1131The goals
1132
DepC was originally built for the WebHosting team. With 5 million websites spread across more than 14,000 servers, the infrastructure required to run them (described in this article), together with constant changes, made it difficult to calculate the quality of service in real time for each of our customers. Furthermore, to identify a problem in the past, we also needed to know how to reconstruct the QoS to reflect the state of the infrastructure at that time.
1134
1135Our ambition was to show the evolution of QoS day by day for all customers, and identify the causes of any degradation in the quality of service.
1136
1137But how could we measure the state of health of each of our customers’ websites? Our first idea was to query them one by one, analyse the answer’s HTTP code, and deduce the health of the website based on that. Unfortunately this scenario proved to be difficult to implement for several reasons:
1138
1139 The WebHosting team manages millions of websites, so scaling would have been very difficult.
1140 We are not the only guarantors of the proper functioning of websites. This also depends on the customer, who can (deliberately or not) generate errors that would be interpreted as false positives.
1141 Even if we had solved the previous difficulties and the QoS of the websites could be calculated, it would have been impossible to identify the root causes in case of failure.
1142
1143We had to find another solution…
1144Graph of dependencies
1145
1146Based on this observation, we decided to work around the problem: if it is impossible to directly calculate the QoS of our customers’ websites, we will calculate it indirectly, based on their dependencies.
1147
1148To understand this, we must keep in mind how our infrastructure works. Without going into too many details, be aware that each website works through a set of servers communicating together. As an example, here are two dependencies you inherit when you order a web hosting solution from OVH:
1149
1150 The source code of your websites is hosted on storage servers (called filerz).
1151 The databases used by the website are also hosted on database servers.
1152
1153If one of these servers suffers a failure, the availability of the website will inevitably be impacted, thereby degrading the client’s QoS.
1155
1156The diagram above shows that the malfunction of a database server automatically impacts all databases it contains, and by domino effect, all websites using these databases.
1157
1158This example is deliberately simplified, as our customers’ dependencies are, of course, far more numerous (web servers, mail servers, load balancers, etc.), even without considering all the security measures put in place to reduce these risks of failure.
1159
1160For those who have taken some computer courses, these dependencies are very similar to a graph. So we chose to use a graph-oriented database: Neo4j. In addition to very good performance, the query language, Cypher, and the development platform are real assets.
1161
1162However, the creation of the dependency tree (the nodes and their relations) does not require us to know Neo4j, because we have developed a daemon that allows us to transform JSON messages into nodes on the graph. DepC provides an API, so that each team can add new elements to its dependency tree without having to learn Cypher.
1163
1164The principle is:
1165
 DepC users send a JSON message into a Kafka datastream. This message indicates the new nodes to be created, as well as their relationships (a website node connected to a filerz node, for example). All nodes and relationships carry temporal information, which lets us keep track of infrastructure changes over time.
1167 DepC analyses these messages and then updates the dependency graph in real time.
1168
1169DepC principle
1170
1171Since DepC is available on Github, the documentation for this part is available in this guide.
1172QoS Calculation
1173
The DepC platform offers APIs for storing and querying a dependency tree. This may sound trivial, but keeping a view of an infrastructure over time is already a complex task. It is so useful that some teams use only this part of the platform, treating DepC as the equivalent of their CMDB (an inventory of their technical assets).
1175
But the value of DepC goes further: most of our users calculate the quality of service of their nodes, and DepC offers two methods to suit different cases:
1177
1178 The node represents an element monitored by one or more probes
1179 The targeted node is not a monitored element
1180
1181Monitored nodes
1182
1183A monitored node can be, for example, a server, a service or a piece of network equipment. Its main characteristic is that a probe sends measurements to a time-series database.
1184
1185Here we find the concept of SLI that we saw above: DepC analyses the raw data sent by the probes in order to transform them into the QoS.
1186
1187The principle is very simple:
1188
1189 Users declare indicators in DepC, defining the query to get the data from the time-series database, as well as the threshold that implies a QoS degradation for this node.
 DepC runs this query for all the nodes selected by the user, then analyses each result in order to degrade the QoS whenever the threshold is exceeded. We then get the QoS of a given node. Note that this process is performed every night, thanks to the task scheduling tool Airflow.
1191
1192Technically, DepC time-series analysis is simply a matter of transforming a time-sorted list of values into a time-sorted list of booleans.
1193
1194The calculation is then very simple: the “true” value will increase the QoS, while the “false” value will lower it. For example, out of a total of 100 points, with 95 points below the threshold (so true), the QoS will be 95% (DepC starts this calculation every night; the number of datapoints is actually much higher).
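
As a toy illustration of the principle (this is not DepC code), computing such a percentage from a file of raw datapoints could look like this, with 500 ms as a hypothetical threshold:

# Count the datapoints below the threshold and derive a QoS percentage
awk '{ total++; if ($1 < 500) ok++ } END { printf "QoS: %.3f%%\n", 100 * ok / total }' response_times.txt
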
1195
1196Note that to complete this part, DepC currently supports the OpenTSDB and Warp10 time-series databases. Other time-series databases will be added soon (InfluxDB, Prometheus…).
1197Unmonitored Nodes
1198
1199Some nodes represent non-probe-monitored items. In such cases, their QoS will be calculated based on the QoS of their parents in the dependency tree.
1200
1201Imagine, for example, a node representing a “client”, and linked to several monitored nodes of type “server”. We have no data to analyse for this client. On the other hand, for the “server” nodes, we can calculate their QoS, thanks to the monitored nodes. We then aggregate these QoS figures to get that of the “client” node.
1202
1203To achieve this, DepC calculates the QoS of the monitored nodes, thus retrieving a list of Booleans. Then, the Boolean operation AND is applied between these different lists (by dependency) in order to obtain a unique list of Booleans. This list is then used to calculate the QoS of our unmonitored node.
1204
1205The calculation is then carried out in the same way as the monitored nodes, by considering the number of “true” occurrences in relation to the total number of points.
1206
1207For this example, we only used a Boolean operator. However, DepC provides several types of Boolean operations for different applications:
1208
1209 AND: all the dependencies must work for the service to be rendered.
1210 OR: one dependency is enough to render the service.
1211 RATIO (N): it is necessary for N% of the dependencies to work for the service to be rendered.
1212 ATLEAST (N): regardless of the number of dependencies, the service is rendered if at least N dependencies function.
1213
1214We won’t delve too deeply into the internal functioning that allows us to calculate QoS on a large scale. But if it interests you, I invite you to watch the conference we gave at the FOSDEM 2019 in the Python devroom. Video and slides are available at this address.
1215Conclusion
1216
DepC is already used by dozens of teams at OVH. The chosen architecture allows us to offer QoS visualisation either through DepC's own built-in web interface, or by exporting the display to Grafana.
1218
1219The platform perfectly fulfils its initial goal of reporting: we can now visualise the quality of service we offer our customers, day after day, and also zoom in on the dependency tree to discover the root causes of any possible failure.
1220
1221Our roadmap for the next few months is very busy: always calculating more QoS, calculating the QoS of these nodes according to that of other teams, and displaying it all in a simple and understandable way for our customers…
1222
Our goal is to become the standard solution for QoS calculation at OVH. The tool has been in production for several months, and we receive thousands of new nodes per day. Our database currently contains more than 10 million of them, and this is just the beginning.
1224
1225And of course, if you want to test or deploy DepC at home, do not hesitate. It is open source, and we remain at your disposal if you have questions or ideas for improvement!
1226Links
1227
1228 Github: https://github.com/ovh/depc
1229 Documentation: https://ovh.github.io/depc/
1230 Presentation FOSDEM 2019 (EN): https://fosdem.org/2019/schedule/event/python_compute_qos_of_your_infrastructure/
1231 Presentation PyconFR 2018 (FR): https://pyvideo.org/pycon-fr-2018/calculer-la-qos-de-vos-infrastructures-avec-asyncio.html
1232
1233<|endoftext|>
1234<|startoftext|>
1235
1236Web hosting: How to migrate 3 Million web sites?
1237By Vincent Cassé / 2019-07-12 / Infrastructure, Migration, Web Hosting
1238
In previous articles, we saw the operational and technical constraints of the project to migrate our Paris data centre.
1240
If you have not followed everything about our technical constraints, I invite you to reread the article presenting the infrastructure of our web hosting. We built our migration scenarios taking these constraints seriously.
1242
To carry out this high-risk project, we considered several scenarios, each with its own set of operational difficulties and risks. Let's take some time to look at the migration scenarios we studied, and then explain how we selected the best one.
1244Scenarios
1245
Our main concern in all the migration scenarios was to avoid the split-brain problem. This problem happens when the system simultaneously receives data writes at both the source and the destination of the migration.
1247
Let's take an example: an e-commerce site being migrated is available at both the source and the destination at the same time. If a customer of this site places an order and that information arrives on the destination infrastructure, but when they pay for the order the request arrives on the source infrastructure, the website cannot link the payment to the order. This is what we call a split brain.
1249
To solve this type of problem, it would be necessary to reconcile the two databases, which is only possible when you control the data model, and thus generally the source code of the site.
1251
As a web hosting provider, we have no control over our customers' source code, and at our scale we could not even imagine having to solve the problems we would encounter. We therefore could not consider any scenario involving split brain.
1253Migrating sites independently
1254
Our first idea was to migrate websites independently of each other. This is also the solution we recommended to customers wishing to benefit from Gravelines quickly, before we embarked on the global migration.
1256
1257Here is what our customers usually had to do:
1258
 Identify all the databases used in their source code
 Configure the destination account
 Put the site into maintenance mode in the Paris data centre to avoid split brain
 Migrate all the data: the source code files as well as the databases
 Configure the new database credentials in the site's source code
 Check that the website works as intended
 Modify their DNS zone to redirect the website to the IP address of the new Gravelines cluster
 Re-open the site and wait for the DNS propagation to complete.
1267
We considered industrialising this technique to carry it out on behalf of our customers, but we faced several technical problems:
1269
 Reliably identifying all the databases used is complex. It is possible to search for all database names in our customers' source code, but it is a very long operation, and it is only reliable if the source code does not change in the meantime.
 It also requires dealing with many special cases: binary files executed via CGI, source code obfuscation, or even database names stored in… a database. This technique could not give us 100% reliability for our migration.
 Files can be migrated using two techniques:
 In file mode, the migration script traverses the file tree and copies the files one by one.
 In block mode, the migration script takes the data from the hard disk and transfers it bit by bit to the destination hard disk, without considering the tree structure.
1275
Both methods allow you to copy the data reliably, but the intended use cases are very different (see the toy comparison after this list).

With block mode, you can only copy an entire disk or partition. If data from several websites sits on the same partition, only file mode allows you to migrate a website's data individually.
1279
Moving the data in file mode is very slow if the number of files to traverse is large, as is the case for many PHP frameworks that perform caching. So we risked being unable to migrate some sites.
1281
 Modifying the source code is a perilous operation that we do not allow ourselves to carry out, because the impact on the website can be significant. In addition, it requires having exhaustively identified all uses of databases…
 A number of our customers do not host their DNS zone with us. We are then unable to change the IP address of their website without their intervention, which means we must keep the same IP address if we want the migration to be reliable.
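
To make the file mode / block mode distinction concrete, here is a toy comparison (device names, paths and hosts are purely illustrative):

# File mode: walk the tree and copy files one by one (slow when there are millions of small files)
rsync -a /mnt/source-filerz/ destination:/mnt/filerz/
# Block mode: copy the device bit by bit, regardless of how many files it contains
dd if=/dev/sdb bs=4M status=progress | ssh destination "dd of=/dev/sdb bs=4M"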
1284
We therefore rejected this scenario. Although it would have worked for the vast majority of websites, the small percentage of sites that would have been impacted still represents a large number of websites. We would have had to repair them manually, and our team would have spent all its time doing so.
1286IP over Truck Carriers
1287
The Internet is based on the IP protocol for addressing machines across the network. The protocol does not depend on the physical medium over which messages are exchanged, and many can be used: optical links, electrical links, wireless links; even carrier pigeons, as described in a humorous standard published on April 1, 1990!
1289
This April Fools' joke inspired us, even though we are not experts in pigeons. Indeed, even if the latency (the time it takes for a message to travel from point A to point B) is high, the bandwidth (the amount of information sent divided by the travel time) is potentially huge: a USB key contains a lot of data! For some large transfers, physically moving the data is a reasonable way to increase the bandwidth.
1291
1292So we thought about the option of simply moving the infrastructure from Paris to Gravelines. This has advantages:
1293
1294 No impact on websites. We just have to turn the infrastructure back on in another data center and redirect traffic to it;
1295 It allows you to empty the data center very quickly.
1296
1297But it also poses some challenges:
1298
 How do we reduce the website downtime between shutting down the machines in Paris, loading, transporting, unloading and powering them back on? The downtime would be of the order of several days.
 What do we do in the event of an accident during the journey? A dropped server, a road accident…
 How do we make sure the data is not altered during transport by truck vibrations?
 How do we integrate infrastructure that does not meet the industrialisation standards in force in Gravelines?
1303
None of these points was a blocker, but they raised interesting challenges. We therefore kept this scenario, although not as our first choice, because of the physical risks and the long period of website unavailability during the operation.
1305Scalable Migrations
1306
Unable to migrate either the entire data centre at once or the websites independently, we looked at how to migrate our infrastructure assets progressively.
1308
1309So we took a step back and looked at the levels of granularity of our infrastructure, that is, the elements that link websites to each other and prevent a site-by-site migration:
1310
 IP addresses: since we did not control all the DNS zones, we considered that our customers' website IP addresses could not be modified. This means we must migrate all the websites using the same IP address at once.
 The filerz: since a file-mode migration of the data on a filerz is not possible because of the large number of files, we must perform a block-mode migration, and thus migrate all the customers on the same filerz simultaneously.
 Databases: all the databases used by the same website must be migrated at the same time to keep the site running, and the database identifiers must not change. These databases can potentially be used by two different hosting plans, including on different clusters; those hosting plans must be migrated at the same time.
1314
Taking all these assumptions together, one conclusion follows: to respect them, we would have to migrate all the sites at once because of the interdependencies.
1316
We were stuck. To move forward, we had to review each assumption and look for solutions to work around these problems.
1318Breaking dependencies on databases
1319
1320One of the most complex constraints is to reliably migrate databases together with websites.
1321
1322Could we imagine a 95% reliable migration, taking into account only the databases provided with hosting (that is, leaving aside atypical cases that are found only by analyzing the source code of websites)?
1323
1324On paper it wouldn’t work, as the atypical websites would be impacted, since the databases would no longer be available.
1325
We thus needed to play with database availability: if we could keep a database accessible even when it had not been migrated yet, we could remove this constraint, and the atypical cases would continue to work.
1327
This is technically possible if we open a network tunnel between our data centre in Gravelines and the one in Paris. With this tunnel, a website using a database that is not referenced in its hosting plan would continue to work by fetching the data from Paris.
1329
This is not the ideal solution: adding a network tunnel means adding 10 ms of latency, and on some CMSs that perform dozens of SQL queries sequentially, this latency quickly becomes visible. But by limiting this effect to non-referenced databases only, we could relax this strong constraint: all the sites would continue to operate. Some sites might experience some slowness, but for our typical web hosting use cases the repercussions are minimal.
1331Bypassing IP Address Dependency
1332
Behind a single IP address sit several hundred thousand websites. Migrating all the filerz and databases at once would therefore involve very significant downtime.
1334
However, the question can be asked differently: an IP address serves several sites, so how can we route the incoming traffic on that IP to the data centre hosting the right website? This is a load-balancing concern, and we already had a load balancer that adapts according to the requested website: the predictor.
1336
It is possible to define within the predictor where a website really is, in order to redirect the traffic there. The simplest solution would have been to add a new predictor upstream of our infrastructure, but chaining load balancers is not a good idea: it makes the path to the website more complex and adds a new critical element to the infrastructure.
1338
And in the end, nothing prevents us from using the load balancers in Paris or Gravelines to perform this traffic redirection.
1340
1341We selected the predictor of our new Gravelines clusters. We added the list of websites and their status: migrated or not migrated. Thus, for migrated sites, the traffic is distributed locally. Otherwise, the traffic is redirected to a load balancer of the cluster in Paris.
1342
We already knew how to migrate an IP address between our data centres. We could therefore prepare these new predictors and then migrate the IP addresses of a whole cluster transparently, without causing any downtime for customers.
1344
IP addresses were no longer a blocking point. With this constraint removed, we could now migrate customers filerz by filerz. Could we do even better?
1346Migrate the filerz as a block
1347
To do better, we would need to decouple the clients from each filerz. How could we do that?
1349
Migrating an entire filerz takes time: we need to move several TB of data across our network, which can take dozens of hours. To avoid split brain, we also need to avoid writes at the source during the copy.
1351
But our Storage team knows how to handle this type of situation, which is rather common for them. They begin by making a first copy of all the data, without shutting down the source service. Once this first copy is made, subsequent copies only need to synchronise the differences written since the previous one. After several successive copies, the copy time becomes very short.
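
To illustrate the principle (the actual tooling used by our Storage team is not detailed here; this sketch assumes a ZFS-backed filerz, with hypothetical pool and host names):

# First full copy, while the source filerz stays in service
zfs snapshot tank/filerz-01@pass1
zfs send tank/filerz-01@pass1 | ssh gravelines "zfs receive tank/filerz-01"
# Later passes only transfer the blocks written since the previous snapshot
zfs snapshot tank/filerz-01@pass2
zfs send -i tank/filerz-01@pass1 tank/filerz-01@pass2 | ssh gravelines "zfs receive tank/filerz-01"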
1353
1354At that moment, it is possible to cut the service for a few hours, at night, to perform the migration (filerz and the associated databases) without risking split brain.
1355
1356We now have all the elements to realize a new migration scenario!
1357
1358Come on, let’s do it again.
1359Our final scenario
1360
1361After reading the precedent sections, you probably already have a good idea of how we migrate clusters. Here are the main steps:
13621 / Construction of a new cluster in Gravelines, up to the standards of this new datacenter but including as many filerz and databases as the previous one.
13632 / Building a network link between the new and the old datacenter.
1364
13653 / Migration of IP traffic on the new cluster with redirection to Paris for non-migrated sites.
1366
13674 / Copying data without breaking websites.
1368
13695 / At night: cutting off the sites of the first filerz, migrating its data and associated databases.
1370
6 / At night: cutting off the sites on the second filerz, migrating its data and the associated databases.
1372
13737 / Closing the source cluster.
1374
1375
1376
Even after we have migrated all the filerz of a cluster, the network link to the Paris data centre needs to be kept up until all the databases have been migrated, at the very end of the migration. This link must therefore be long-lasting and monitored for several months.
1378
This scenario was validated in July 2018. Making it operational took us five months of adapting our systems, as well as multiple dry runs on a test cluster deployed specifically to verify that everything worked as intended. The scenario looked good on paper, but we had to solve many problems at every step of the process (we will go deeper into the technical details in future blog posts).
1380
Every step in this scenario involved dozens of synchronised operations between the different teams. We had to put in place very precise tracking of our operations to ensure that everything went off without a hitch (we will also talk about this in another post).
1382
1383Now you know our migration process. This article, though long, is necessary to understand the architectural choices, and the surprises that we encountered during our migration.
1384
1385In future posts, we will focus on specific points of our migration and the technical challenges we have encountered.
1386
1387See you soon for the rest of our adventures!
1388<|endoftext|>
1389<|startoftext|>
1390
1391MyBinder and OVH partnership
1392By Mael Le Gal / 2019-06-24 / Devops, Docker, Open Source, OVH Managed Kubernetes, Public Cloud
1393
Last month, OVH and the Binder team partnered to support the growth of the BinderHub ecosystem around the world.
1395OVH loves Binder and the Jupyter project
1396
1397With approximately 100,000 weekly users of the mybinder.org public deployment and 3,000 unique git repositories hosting Binder badges, the need for more resources and computing time was felt.
1398
1399Today, we are thrilled to announce that OVH is now part of the world-wide federation of BinderHubs powering mybinder.org. All traffic to mybinder.org is now split between two BinderHubs – one run by the Binder team, and another run on OVH infrastructure.
1400
1401So for those who don’t already know mybinder.org, here’s a summary.
1402What is Jupyter?
1403
Jupyter is an awesome open-source project that allows users to create, visualise and edit interactive notebooks. It supports many popular programming languages, such as Python, R and Scala, as well as presentation features such as Markdown, code snippets and chart visualisation…
1405
1406Example of a local Jupyter Notebook reading a notebook inside the OVH GitHub repository prescience client.
1407
1408The main use case is the ability to share your work with tons of people, who can try, use and edit the work directly from their web browser.
1409
1410Many researchers and professors are now able to work remotely on the same projects, without any infrastructure or environment issues. It’s a major improvement for communities.
1411
1412Here is for example a notebook (Github project) allowing you to use Machine Learning, from dataset ingestion to classification:
1413jupyter machine learning notebook example
1414
1415Example of a Machine Learning Jupyter Notebook
1416What is JupyterHub?
1417
1418JupyterHub is an even more awesome open-source project bringing the multi-user feature for Jupyter notebooks. With several pluggable authentication mechanisms (ex: PAM, OAuth), it allows Jupyter notebooks to be spawned on the fly from a centralised infrastructure. Users can then easily share their notebooks and access rights with each other. That makes JupyterHub perfect for companies, classrooms and research labs.
1419What is BinderHub?
1420
1421Finally, BinderHub is the cherry on the cake: it allows users to turn any Git repository (such as GitHub) into a collection of interactive Jupyter notebooks with only one click.
1422
1423Landing page of the binder project
1424
1425The Binder instance deployed by OVH can be accessed here.
1426
1427 Just choose a publicly accessible git repository (better if it already contains some Jupyter notebooks).
1428 Copy the URL of a chosen repository into the correct binder field.
1429 Click the launch button.
1430 If it is the first time that binder sees the repository you provide, you will see compilation logs appear. Your repository is being analysed and prepared for the start of a related Jupyter notebook.
1431 Once the compilation is complete you will be automatically redirected to your dedicated instance.
1432 You can then start interacting and hacking inside the notebook.
1433 On the initial binder page you will see a link to share your repository with others.
1434
1435How does it work?
1436Tools used by BinderHub
1437
1438BinderHub connects several services together to provide on-the-fly creation and registry of Docker images. It uses the following tools:
1439
1440 A cloud provider such as OVH.
1441 Kubernetes to manage resources on the cloud
1442 Helm to configure and control Kubernetes.
1443 Docker to use containers that standardise computing environments.
1444 A BinderHub UI that users can access to specify Git repos they want built.
1445 BinderHub to generate Docker images using the URL of a Git repository.
1446 A Docker registry that hosts container images.
1447 JupyterHub to deploy temporary containers for users.
1448
1449What happens when a user clicks a Binder link?
1450
1451After a user clicks a Binder link, the following chain of events happens:
1452
1453 BinderHub resolves the link to the repository.
1454 BinderHub determines whether a Docker image already exists for the repository at the latest reference (git commit hash, branch, or tag).
 If the image doesn’t exist, BinderHub creates a build pod that uses repo2docker (see the local example below) to:
1456 Fetch the repository associated with the link.
1457 Build a Docker container image containing the environment specified in configuration files in the repository.
1458 Push that image to a Docker registry, and send the registry information to the BinderHub for future reference.
 BinderHub sends the Docker image information to JupyterHub.
1460 JupyterHub creates a Kubernetes pod for the user that serves the built Docker image for the repository.
1461 JupyterHub monitors the user’s pod for activity, and destroys it after a short period of inactivity.
1462
1463A diagram of the BinderHub architecture
1464MyBinder Architecture
How did we deploy it?
1466Powered by OVH Kubernetes
1467
One great thing about the Binder project is that it is completely cloud agnostic: you just need a Kubernetes cluster to deploy it on.
1469
1470Kubernetes is one of the best choices to make when it comes to scalability on a micro-services architecture stack. The managed Kubernetes solution is powered by OVH’s Public Cloud instances. With OVH Load Balancers and integrated additional disks, you can host all types of workloads, with total reversibility.
1471
1472To this end, we used 2 services in the OVH Public Cloud:
1473
1474 A Kubernetes Cluster today consuming 6 nodes of C2-15 VM instances (it will grow in the future)
1475 A Docker Registry
1476
1477We also ordered a specific domain name so that our binder stack could be publicly accessible from anywhere.
1478Installation of HELM on our new cluster
1479
Once the automatic installation of our Kubernetes cluster was complete, we downloaded the administration YAML file that allows us to manage our cluster and run kubectl commands on it.
1481
kubectl is the official and most popular tool used to administer a Kubernetes cluster. More information about how to install it can be found here.
1483
The automatic deployment of the full Binder stack is already prepared in the form of a Helm package. Helm is a package manager for Kubernetes, and it needs a client part (helm) and a server part (tiller) to work.
1485
All the information about installing helm and tiller can be found here.
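
For reference, a Helm 2-era installation typically looked like the sketch below; the version and the RBAC setup shown here are a common pattern rather than the exact commands we ran.

# Install the helm v2 client (the version is only an example)
curl -fsSL https://get.helm.sh/helm-v2.14.3-linux-amd64.tar.gz | tar xz
sudo mv linux-amd64/helm /usr/local/bin/helm
# Give tiller a service account, then deploy it on the cluster
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller
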
1487Configuration of our Helm deployment
1488
1489With tiller installed on our cluster, everything was ready to automate the deployment of binder in our OVH infrastructure.
1490
1491The configuration of the helm deployment is pretty straightforward and all the steps have been described by the Binder team here.
1492Integration into the binderhub CD/CI process
1493
The Binder team already had a Travis workflow in place to automate their test and deployment processes. Everything is transparent, and they expose all their configurations (except secrets) on their GitHub project. We just had to integrate with their current workflow and push our specific configuration to their repository.
1495
We then waited for the next run of their Travis workflow, and it worked.
1497
From this moment onward, the OVH stack for Binder was up and running, publicly accessible to anyone at this address: https://ovh.mybinder.org/.
1499What comes next?
1500
OVH will continue engaging with the open-source data community, and keep building a strong relationship with the Jupyter foundation and, more generally, the Python community.
1502
This first collaborative experience with such a data-driven open-source organisation helped us to establish the best team organisation and management to ensure that both OVH and the community achieve their goals in the best way possible.
1504
Working with open source is very different from working in industry, as it requires a different mindset: very human-centric, where everyone has different objectives, priorities, timelines and points of view that should all be considered.
1506Special Thanks
1507
We are grateful to the Binder, Jupyter and QuantStack teams for their help, the OVH K8s team for OVH Managed Kubernetes and the OVH Managed Private Registry, and the OVH MLS team for their support. You rock, people!
1509<|endoftext|>
1510<|startoftext|>
1511
1512Linux Kernel Vulnerabilities Affecting The Selective ACK Component
1513By OVH / 2019-06-18 / OVH News, Security
1514
On June 18th 2019 at 7pm CEST, four vulnerabilities affecting the TCP stack of the Linux kernel were disclosed. These vulnerabilities rely on an integer overflow in the Linux kernel, which can lead to a kernel panic, on the one hand, and on the algorithmic complexity of the SACK implementation, leading to CPU resource exhaustion, on the other. In both cases, the impact is limited to service availability.
Who is vulnerable?
1517
 All Linux OSes running kernel 2.6.29 or above (released in March 2009)
 FreeBSD 12 using the RACK TCP stack. Fortunately, this is not the default stack. You can run the following command to check whether your system is using the RACK implementation or not:
1520 # sysctl net.inet.tcp.cc.algorithm
 If you expose a TCP service on the internet (web service, ssh, rcp, etc.), your system is potentially affected, since the attack only needs an established TCP connection to succeed.
 If your service is behind a firewall, or iptables/pfSense is set up to open the service only to trusted IPs, you are safe.
1523
How to fix?
1525
There are three different ways to do this; you only need to pick ONE of them.
15271. Update your kernel
1528
The major Linux distros have already released the fix:
1530
1531 Linux version 4.4.182 or above
1532 Linux version 4.9.182 or above
1533 Linux version 4.14.127 or above
1534 Linux version 4.19.52 or above
1535 Linux version 5.1.11 or above.
 Note that the Linux 3.16 branch has not yet been announced as patched (2019-06-18).
1537
In any case, have a look at your distro's website (Ubuntu, RedHat, SuSE, etc.) for more details, as your vendor might have backported the patch to its own kernel version.
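
To check which kernel version your system is currently running (and therefore whether you already run a patched release), you can simply use:

# uname -r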
15392. Firewall mitigation
1540
The attack is based on a small MSS option, which is very unusual in standard network communication. Consequently, you can drop all packets with a small MSS option in order to protect your system from the attack. For instance, the iptables rules may look like this:
1542
# iptables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP -m comment --comment "Drop small MSS packets to prevent TCP SACK attacks. (ipv4)"
# ip6tables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP -m comment --comment "Drop small MSS packets to prevent TCP SACK attacks. (ipv6)"
1545
15463. Disable SACK (not recommended)
1547
Since the root cause of the vulnerability is an issue in the SACK handling, you can also disable SACK in your kernel. Keep in mind that SACK is an optimisation of the TCP protocol that improves the performance of packet acknowledgement. Disabling this feature may have a huge impact on your performance, and we don't recommend this mitigation.
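
If you nevertheless choose this route, SACK can be disabled at runtime through the standard net.ipv4.tcp_sack kernel parameter, and made persistent in /etc/sysctl.conf until your kernel is patched. For instance:

# sysctl -w net.ipv4.tcp_sack=0
# echo "net.ipv4.tcp_sack = 0" >> /etc/sysctl.conf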
Is the exploit public?
1550
As far as we know (2019-06-18), no exploit has been made public yet, but this is probably a matter of hours or days.
1552Short Technical Explanations
1553CVE-2019-11477
1554
An integer overflow on a 16-bit counter (tcp_skb_pcount) may occur in the kernel, leading to a BUG_ON (not strictly a panic, but it will leave your system in an unstable, and potentially unusable, state).
1556
By decreasing the MSS option to a small value, an attacker can force your system to send a lot of packets to a malicious remote IP under their control. The SACK feature allows the malicious remote IP to acknowledge only a few of the packets that have been sent.
1558
Your kernel keeps a list of the non-acknowledged packets, which increments a 16-bit counter (tcp_skb_pcount) that eventually overflows and triggers a comparison error, resulting in a BUG_ON.
1560CVE-2019-11478
1561
Using the same scenario as above, an attacker could fragment the Linux kernel's socket buffer (SKB) by acknowledging only a few packets. The data structure then becomes fragmented, which decreases performance and can cause the kernel to consume more CPU.
1563CVE-2019-11479
1564
By decreasing the MSS option to the smallest acceptable value (48 bytes), an attacker could slow down (freeze) the system. Since only 8 bytes remain for user data, the server may struggle to reply to the requests sent by the attacker, which may lead to abnormal CPU consumption by the kernel.
1566Identification numbers
1567
These vulnerabilities are referenced by the Common Vulnerabilities and Exposures database as follows:
1569
 CVE-2019-11477: SACK Panic (Linux >= 2.6.29) | CVSS: 8.2
 CVE-2019-11478: SACK Slowness (Linux < 4.15) or Excess Resource Usage (all Linux versions) | CVSS: 5.3
 CVE-2019-11479: Excess Resource Consumption Due to Low MSS Values (all Linux versions) | CVSS: 7.5
 CVE-2019-5599: SACK Slowness (FreeBSD 12 using the RACK TCP Stack) | Low severity
1574
1575External References
1576
1577 https://github.com/Netflix/security-bulletins/blob/master/advisories/third-party/2019-001.md
1578 https://access.redhat.com/security/vulnerabilities/tcpsack
1579 https://aws.amazon.com/fr/security/security-bulletins/AWS-2019-005/
1580 https://bugzilla.suse.com/show_bug.cgi?id=1137586
1581<|endoftext|>
1582<|startoftext|>
1583
1584RAMBleed DRAM
1585By OVH / 2019-06-12 / OVH News, Security
1586
On June 11th, security researchers published a paper titled "RAMBleed: Reading Bits in Memory Without Accessing Them". This paper describes an attack vector against Dynamic Random Access Memory (DRAM) modules that are already susceptible to Rowhammer-style attacks.
1588
Systems using DRAM modules that are mitigated against Rowhammer-style attacks remain protected from RAMBleed.
1590
This vector may affect hardware products, some of which are used by OVH.
1592
1593RAMBleed DRAM
1594
1595This vulnerability has been referenced as:
1596
1597· CVE-2019-0174
1598
According to the researchers, "RAMBleed's rate of reading memory is modest, topping at around 3–4 bits per second. This allows sufficient time for memory scrubbing countermeasures to remove short-lived secret data from the target's memory."
1600
 At this point in time, OVH has not received any information demonstrating that the vector has been exploited on its infrastructure.
1602
 Note that in order to gain access to a secret, the attacker must be authenticated on the operating system, and no other process should be running.
1604
We estimate the risk for our clients to be very low, as this vector is unable to cross virtual machine boundaries.
1606
1607 To remain up to date on any topic regarding OVH’s technical environment, do not hesitate to consult regularly:
1608
1609 OVH Blog
1610 OVH Tasks
1611<|endoftext|>
1612<|startoftext|>
1613
1614Web hosting: how to host 3 million websites?
1615By Vincent Cassé / 2019-06-07 / Infrastructure, Migration, Web Hosting
1616
1617In 2018, we launched one of the largest projects in OVH’s history: migrating the 3 million websites hosted in our Paris datacentre. If you want to discover the reasons behind this titanic project, we took a look at them in this post.
1618
It is now time to explain how we proceeded with this project. We have already talked about our operational constraints, such as ensuring we would have no impact on the websites, even though we do not control the source code, and limiting the downtime. We also had many technical constraints, related to the architecture of our services. We'll take a close look at our different migration scenarios in the next article, but today, we will explain how the infrastructure hosting your websites works.
1620The anatomy of web hosting
1621
1622To work, websites and web applications usually need two things:
1623
1624 The source code, responsible for executing the behaviour of the website
1625 The data used by the source code to customise the experience
1626
1627For a site to be operational, the source code is run on one or more servers whose environment has been configured with the tools related to the programming languages used. PHP currently dominates the market, but the choice is not limited to this language.
1628
1629To remain in operational condition, these servers must be maintained and updated, and their operation must be continuously monitored. This role is usually different to that of developers, who are responsible for creating the source code of the website. If you’re interested in these roles, we’d recommend learning more about system administrators and DevOps, and their specific skills.
1630
1631For a website owner, monitoring the infrastructure and its operations can be very expensive: you need to have a team large enough to maintain a presence 24 hours a day, 365 days a year. With such a team, developers can only rely on one-off service requests when the website owner wants to make changes.
1632
1633It is this level of management that we offer through our web hosting solution. Instead of recruiting a system administrator team for each website, we have recruited a team of technical experts who take care of all the websites we host. The same goes for the data, which is stored in specific servers, and operated by those same teams of experts.
1634
1635This way, each website does not need to utilise the resources of an entire server. With our solutions, we pool resources on our infrastructures, so individual servers can be used to run several websites at once.
1636
1637All of these economies of scale allow us to offer a low-cost web hosting solution (from €1.49/month), while maintaining a high level of quality. We will explain very shortly how we calculate our quality of service, but if you do not want to wait, watch the conference that our developers hosted at the FOSDEM 2019.
1638
1639To reach the required level of quality while managing so many websites, we have adapted our infrastructure.
1640The technical architecture of our web hosting solutions
1641
1642The classic approach to building a web hosting service is to install a server, configure the databases and a source code execution environment, and then place new clients there until the server is full, before moving on to the next server.
1643
1644This is very effective, but there are some disadvantages with this type of architecture:
1645
1646 In the event of breakdown, the recovery can be long. Unless you have real-time system replication, which is very expensive, you have to recover your data from the backup and then migrate it to a new machine before reopening the service to the clients. Although the probability of hardware or software failure is low, this is a common operation on large infrastructures.
1647 To deal with new technologies, it would be preferable to only introduce them on new servers during the first phase, and then take the time to progressively deploy them on existing servers in future phases.
1648 However, in a market as fast-moving as the internet, it becomes difficult to maintain a heterogeneous infrastructure. Technical teams must then adapt, and work with multiple different configurations, increasing the risk of regression or breakdown. And for support teams, it becomes very difficult to memorise all the different configurations and keep track of the ongoing changes.
1649 Each software brick is inherently complex, so to achieve the best performance and quality of service, we have created teams that specialise in each technology: database, PHP environment, storage systems, network… Having all these teams interact on the same server can be difficult, and lead to misunderstandings regarding the availability of websites.
1650 It is difficult to isolate a customer who consumes more resources than the average. If you’re lucky, you won’t be on the same server as this customer, but if you are, the performance of your website may suffer as a result of their resource consumption.
1651
1652As you can see, at our scale, we chose to take a different approach, in the form of a well-known pattern for applications that must deal with the load: an n-tier architecture.
1653N-tier architecture
1654
High-load sites can leverage n-tier architectures to provide more resources for each software brick, by distributing them across several servers. If we go further with our code/data division, we therefore need three servers to run a website:
1656
1657 A server responsible for the execution of the source code
1658 Two frequently-used storage servers: a database server and a file server
1659
1660N-tier architecture
1661File servers: Filerz
1662
1663These are the servers on which we store the files that make up the website (usually the website’s source code). They are equipped with specific hardware to ensure fast data access and durability. We also use a specialist file system called ZFS, which allows us to easily manipulate local and remote backups.
1664
They are managed by a dedicated Storage team, who also provide this type of service to several other teams at OVH (web hosting, emails…). If this is of interest to you, they also offer NAS-HA, which you will find in the public catalogue of our dedicated servers.
1666
1667In case of failure, we keep a stock of parts and servers readily available, to restore service as quickly as possible.
1668Database servers
1669
Database servers are typically used by websites to dynamically host site data. They utilise a database management system (DBMS, which is MySQL in our solutions) that structures the data and makes it accessible through a query language (SQL in our case).
1671
These servers also require specific hardware for storing data, and in particular a large amount of RAM, in order to take advantage of the DBMS cache systems and respond to search queries faster.
1673
1674These machines are managed by a dedicated Database team, who are responsible for this infrastructure. The Database team offers these services to the public through the CloudDB offer, and also handles the private SQL.
1675
In Paris, these servers are hosted on a Private Cloud (SDDC), which allows us to switch virtual machines on the fly from one physical machine to another in case of problems. This helps reduce downtime during maintenance, and recovery times in the event of failure.
1677Web servers
1678
1679These are the servers that receive the requests and execute the website source code. They get their data from the files and databases provided by the previously-described Filerz and database servers. The main requirement for these servers is good CPU resources, which are needed for executing the source code.
1680
Since these web servers are stateless (i.e. they do not store any data locally), it is possible to add more of them and distribute the load across several machines. This allows us to distribute websites across different servers and avoid uneven usage across the infrastructure, as the load is dynamically distributed across all the servers of the farm.
1682
1683If one of the servers goes down, other servers in the farm are available to maintain the traffic. This allows us to use a wide range of the servers available in our inventory, provided they have good CPU capabilities.
1684Load balancing
1685
1686Queries don’t arrive at the intended web server by magic. For this, you need an entry point that sends the requests to the right server. This technology is called ‘load balancing’.
1687
1688Our Paris datacentre features servers whose hardware is fully dedicated to load balancing. In our new Gravelines infrastructure, however, we use a public brick: IPLBs.
1689
Website traffic arrives at a few IP addresses (https://docs.ovh.com/en/hosting/list-addresses-ip-clusters-and-web-hostings/#cluster-025) that we dedicate to our web hosting service. These IP addresses are managed by our load balancers. They are therefore the entry point of our infrastructures. We have also implemented the very best anti-DDoS technology, to protect our clients' websites.
1691
1692These load balancers work perfectly for high volumes of website traffic, spread across multiple web servers, as queries are distributed fairly, via load balancing algorithms. But our requirements are different. We wish to have several different sites on a single server, and change the allocation based on several criteria: the customer’s hosting package (as more premium offers involve fewer websites per server), and the resources required to continuously distribute the load.
1693
1694We also offer solutions where resources are guaranteed, such as Performance Hosting, or even fully dedicated, like Cloud Web.
1695
In fact, the load distribution is very strongly tied to our customers. We have therefore augmented the distribution system with a software brick specific to OVH, named the predictor, which chooses the web server according to the website targeted by the request. The predictors adapt to our infrastructure's metrics, and to the information provided by our information system.
1697Web hosting architecture with Load Balancer and Predictor
1698
All this makes our infrastructure a bit more complex than normal, although we won't go much further into the details, in order to keep things simple and within the scope of this blog post. This should have provided enough of an overview to explain the possible migration scenarios.
1700
1701By adding load balancing, as well as multiple database servers and file storage, this architecture allows us to host an incredibly large number of different websites. But as all infrastructure administrators know, “Shit happens!”. One day or another, failure will happen. It is therefore necessary to know how to react in such cases, in order to minimise the impact.
1702Fault domains
1703
One of the techniques for reducing the impact of failures is to limit their perimeter by creating fault domains. Outside the world of computer science, we see similar concepts in forest management, with the use of empty parcels as firebreaks, or in the building industry, with fire doors.
1705
In our business, it's about dividing the infrastructure into pieces, and distributing customers across different clusters. We therefore divided the Paris infrastructure into 12 identical clusters. In each cluster, we find the load balancer, the web servers and the Filerz. If one of the clusters goes down, "only" 1/12 of the customers with sites hosted in that datacentre are affected.
1707
Database servers are treated separately. Although we don't highlight it as a feature, we allow our customers to share the use of their databases between their hosting solutions when they need to share data. Since the customer isn't able to choose the cluster of their websites, we have separated the databases from the clusters, in order to make them accessible to the entire datacentre.
1709
1710So for the last time, we need to update our infrastructure schema…
1711Architecture with fault domains
1712Infrastructure management
1713
1714This entire infrastructure is managed by our information system, using real-time configuration, which forms the link between the OVH Control Panel, the OVH APIs, and the infrastructure itself.
1715
1716The information system includes an exhaustive representation of the infrastructure, which makes it possible to adapt the delivery of new accounts, manage changes in our offers, and perform technical actions on accounts.
1717
1718For instance, when you create a new database on your hosting package, the information system takes care of selecting the server on which it will be located, to make sure it is created on the same infrastructure, before notifying you of its availability via email or API.
1719
1720Congratulations… you now know a bit more about our architecture! To find out where your own website is running, you can find the names of your database servers, Filerz and clusters linked to your hosting in the OVH Control Panel.
1721Technical constraints for migration
1722
1723This architecture imposes some technical constraints, if the websites hosted on it are to continue working as intended:
1724
1725 All the websites in the same cluster share the same IP address
1726 Database servers and hosting clusters are uncorrelated
1727 In order to migrate a website, you must synchronise its migration with the migration of all its associated elements, i.e. the load balancer, Filerz, and databases
1728 The source code of a website can use a database that is not referenced on its web hosting
 The source code can include references to the infrastructure (absolute links including the Filerz number, the cluster name, the names of the database servers, etc.)
1730
1731You now know all the operational and technical constraints related to the datacentre migration project. In the next article, we will discuss the different migration scenarios we considered, and the one we eventually chose.
1732
1733See you soon!
1734<|endoftext|>
1735<|startoftext|>
1736
1737How to run massive data operations faster than ever, powered by Apache Spark and OVH Analytics Data Compute
1738By Mojtaba Imani / 2019-05-27 / apache, Apache Spark, Big Data, Machine learning, Spark
1739
1740If you’re reading this blog for the first time, welcome to the ongoing data revolution! Just after the industrial revolution came what we call the digital revolution, with millions of people and objects accessing a world wide network – the internet – all of them creating new content, new data.
1741
1742Let’s think about ourselves… We now have smartphones taking pictures and sending texts, sports watches collecting data about our health, Twitter and Instagram accounts generating content, and many other use cases. As a result, data in all its forms is exponentially exploding all over the world.
1743
90% of the total data in the world was generated during the last two years. According to IDC, the amount of data in the world is set to grow from 33 zettabytes in 2018 to 175 zettabytes in 2025. When we do a basic division, this represents approximately 34TB of data per person, including all countries and topologies.
1745Annual size of the global datasphere
1746
1747Impressive, isn’t it?
1748
This opens up a lot of new concepts and usages, but also, of course, new challenges. How do we store this data? How do we keep it secure and private? And last but not least, how do we get value from this data? This new giant datasphere needs to be processed; in other words, it needs to be used to extract value.
1750
1751Potential results and applications are infinite: improving the agricultural field by analysing weather forecasts, understanding customers deeply, researching new vaccines, redefining urban environments by analysing traffic jams… The list goes on.
1752
1753It seems easy at first, but it requires three main elements:
1754
1755 First, we need data. Sometimes these data sources can be heterogeneous (text, audio, video, pictures etc.), and we may need to “clean” them before they can be used efficiently.
 Next, we need compute power. Think about ourselves again: our brains can perform a lot of calculations and operations, but it's impossible to split one task between multiple brains. Ask a friend to do a multiplication with you, and you'll see this for yourself. With computers, though, anything is possible! We are now able to parallelise calculations across multiple computers (i.e. a cluster), allowing us to get the results we want faster than ever.
 Last, we need a framework: a set of tools that allows us to use this data lake and compute power efficiently.
1758
1759Apache Spark & OVH Analytics Data Compute
1760
1761How do we build this? Let’s find out together!
1762Step 1: Find the right framework
1763
1764As you’ll have seen from the title of this post, it’s not a secret that Apache Spark is our preferred tool at OVH.
1765
1766We chose Apache Spark because it is an open-source distributed, general-purpose cluster-computing framework that has the largest open-source community in the world of big data, and it is up to 100 times faster than the previous cluster computing framework, Hadoop MapReduce, thanks to nice features like in-memory processing and lazy evaluation. Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing and machine learning, with an easy-to-use API, and for coding in Spark, you have the option of using different programming languages, including Java, Scala, Python, R and SQL.
1767
1768Other tools, like Apache Flink and Beam, look very promising as well, and will be part of our upcoming services.
1769
1770The different components of Apache Spark are:
1771
1772 Apache Spark Core, which provides in-memory computing, and forms the basis of other components
1773 Spark SQL, which provides structured and semi-structured data abstraction
1774 Spark Streaming, which performs streaming analysis using RDD (Resilient Distributed Datasets) transformation
 MLlib (Machine Learning Library), which is a distributed machine learning framework on top of Spark
1776 GraphX, which is a distributed graph processing framework on top of Spark
1777
1778The Apache Spark architecture principle
1779
1780Before going further, let’s take the time to understand how Apache Spark can be so fast by reviewing its workflow.
1781
1782Here is a sample code in Python, where we will read a file and count the number of lines with the letter ‘a’, and the number of lines with the letter ‘b’.
1783
1784from pyspark import SparkContext
1785
1786logFile = "YOUR_SPARK_HOME/README.md" # Should be some file on your system
1787sc = SparkContext("local", "Simple App")
1788logData = sc.textFile(logFile).cache()
1789
1790numAs = logData.filter(lambda s: 'a' in s).count()
1791numBs = logData.filter(lambda s: 'b' in s).count()
1792
1793print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
1794
1795sc.stop()
1796
1797This code is part of your Spark Application, also known as your Driver Program.
1798
1799Each action (count(), in our example) will trigger jobs. Apache Spark will then split your work into multiple tasks that can be computed separately.
1800
1801Apache Spark stores data in RDD (Resilient Distributed Datasets), which is an immutable distributed collection of objects, and then divides it into different logical partitions, so it can process each part in parallel, in different nodes of the cluster.
1802
1803Task parallelism and in-memory computing are the key to being ultra-fast here. You can go deeper in the official documentation.
1804Step 2: Find the required compute power
1805
We now have the tools, but they need compute power (we are mainly talking about CPU and RAM) to perform such massive operations, and this has to be scalable.
1807
Let's talk about creating a cluster of computers. The old-fashioned way is to buy physical computers and the network equipment to connect them, install the OS and all the required software and packages, install Apache Spark on all the nodes, then configure Spark's standalone cluster management system and connect all the workers to the master node.
1809
Obviously, this isn't the best way. It takes a lot of time and requires skilled engineers to do all the work. Also, assume that you did this difficult job and then finished your big data processing… What are you going to do with the cluster after that? Just leave it there or sell it on the second-hand market? What if you decided to perform some larger-scale processing and needed to add more computers to your cluster? You'd need to do all the software and network installations and configuration for the new nodes.
1811
1812A better way of creating a cluster is to use a Public Cloud provider. This way, you will have your servers deployed very quickly, only pay what you consume, and can delete the cluster after finishing your processing task. You’ll also be able to access your data much more easily than you would with an on-premises solution. It’s not a coincidence that, according to IDC, half of the total data in the world will be stored in the public cloud by 2025 [3].
1813Where is the data stored?
1814
1815But the main problem persists: you still need to install all the software and packages on each of the servers in your virtual cluster, then configure the network and routers, take security measures and configure the firewall, and finally, install and configure the Apache Spark cluster management system. It will take time and be prone to errors, and the longer it takes, the more you will be charged due to having those servers deployed in your cloud account.
1816Step 3: Take a rest, and discover OVH Analytics Data Compute
1817
1818As we’ve just seen, building a cluster can be done manually, but it’s a boring and time-consuming task.
1819
1820At OVH, we solved this problem by introducing a cluster-computing service called Analytics Data Compute, which will create a 100% ready, fully installed and configured Apache Spark cluster on the fly. By using this service, you don’t need to waste your time on server creation, network, firewalls and security configurations on each node of your cluster. You just focus on your tasks, and the compute cluster you need will appear as if by magic!
1821
In fact, there's nothing really magic about it… just automations made by OVH to simplify both our lives and yours. We needed this kind of tool internally for large computations, and then crafted it into a product for you.
1823
1824The concept is quite simple: you launch an Apache Spark job as normal through the command line or API, and a full Apache Spark cluster will be built on the fly, just for your job. Once the processing is done, we delete the cluster and you’re invoiced for the exact resources that were used (on an hourly basis, for now).
1825
This way, we are able to rapidly scale from one to thousands of virtual machines, allowing you to use thousands of CPU cores and thousands of gigabytes of RAM.
1827
To use Analytics Data Compute, you need to download a small, open-source client software package from the OVH repository, called ovh-spark-submit.
1829
1830This client was made with a view to keeping the official spark-submit command line syntax of Apache Spark. Most of the options and syntax are the same, although the OVH version has some more options related to infrastructure and cluster management. So, this way, you simply request to run your code over your data in a cluster of specific nodes, and the tool will create a cluster with the specified number of nodes, install all packages and software (including Spark and its cluster management system), and then configure the network and firewall. After creating the cluster, OVH Analytics Data Compute will run your Spark code over it, return the result to the user, and then delete the whole thing once it’s done. Much more efficient!
1831Let’s get it started… Feel the power!
1832
The good news is that if you are already familiar with the spark-submit command line of Apache Spark, you don't need to learn any new command line tools, as ovh-spark-submit uses almost exactly the same options and commands.
1834
1835Let’s look at an example, where we’ll calculate the famous Pi number’s decimals, first with the original Apache Spark syntax, and then with the ovh-spark-submit client:
1836
1837./spark-submit \
1838 --class org.apache.spark.examples.SparkPi \
1839 --total-executor-cores 20 \
1840 SparkPI.jar 100
1841
1842./ovh-spark-submit \
1843 --class org.apache.spark.examples.SparkPi \
1844 --total-executor-cores 20 \
1845 SparkPI.jar 100
1846
1847You can see that the only difference is “ovh-” at the beginning of the command line, while the rest is the same. And by running the ovh-spark-submit command, you will run the job over a cluster of computers with 20 cores instead of just your local computer. This cluster is fully dedicated to this job, as it will be created after running the command, then deleted once it’s finished.
1848
1849Another example is the popular word-count use case. Let’s assume you want to calculate the number of words in a big text file, using a cluster of 100 cores. The big text file is stored in OpenStack Swift storage (although it could be any online or cloud storage system). The Spark code for this calculation in Java looks like this:
1850
1851JavaRDD<String> lines = spark.read().textFile("swift://textfile.abc/novel.txt").javaRDD();
1852
1853JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
1854JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new Tuple2<>(s, 1));
1855JavaPairRDD<String, Integer> counts = ones.reduceByKey((i1, i2) -> i1 + i2);
1856List<Tuple2<String, Integer>> output = counts.collect();
1857
1858We can select the desired version of Spark as well. For this example, we’ve selected Spark version 2.4.0, and the command line for running this Spark job looks like this:
1859
1860./ovh-spark-submit \
1861 --class JavaWordCount \
1862 --total-executor-cores 100 \
1863 --name wordcount1 \
1864 --version 2.4.0 \
1865 SparkWordCount-fat.jar
1866
1867To create our Spark cluster, we use nodes that have four vCores and 15GB of RAM. Therefore, by running this command, a cluster of 26 servers will be created (one for the master node and 25 for workers), so we will have 25×4=100 vCores and 25×15=375GB of RAM.
1868
1869After running the command line, you will see the progress of creating the cluster and installing all the required software.
1870
1871Once the cluster is created, you can take a look at it with the official Spark dashboard, and check if your cluster has all 25 workers up and running:
1872
1873Also, if you go to the OpenStack Horizon dashboard in your OVH cloud account, you will see all 26 servers:
1874
The Apache Spark job will be executed according to the Java code in the JAR file that we sent to the Spark cluster, and the results will be shown on the screen. The results and the complete log files will also be saved both on the local computer and in the user's Swift storage.
1876
1877Once you’re done, you will see the message that cluster has been deleted, and the addresses of the logs in OpenStack Swift storage and local computer. You can see in the following screenshot that creating a fully installed and configured Spark cluster with 26 servers took less than five minutes.
1878A bit more about OVH Analytics Data Compute
1879
1880If you are curious, here are some additional details about Analytics Data Compute:
1881
1882 Everything is built on the OVH Public Cloud, which means everything is powered by OpenStack.
1883 You can choose the Apache Spark version you want to run, directly in the command line. You can also, of course, run multiple clusters with different versions.
1884 A new dedicated cluster will be created for each request, and will be deleted after finishing the job. This means there are no security or privacy issues caused by having multiple users for a single cluster.
 You have the option of keeping your cluster after finishing the job. If you add the keep-infra option to the command line, the cluster will not be deleted when you're done. You can then send more jobs to that cluster or view more details from the logs (see the sketch after this list).
1886 Your cluster computers are created in your own OVH Public Cloud project, so you have full control of your cluster computers.
1887 Results and output logs will be saved in Swift on your OVH Public Cloud project. Only you will have access to them, and you will also have the full history of all your Spark jobs saved in a folder, organised by date and time of execution.
1888 Input and output of data can be any source or format. There is no vendor lock-in when it comes to storage, so you are not forced to only use OVH cloud storage to store your data, and can use any online or cloud storage platform on the public internet.
1889 You can access your Cluster and Spark dashboards and web UIs via HTTPS.
1890
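For instance, keeping the cluster alive for follow-up jobs could look like the sketch below. The exact spelling of the option is an assumption based on the description above, so check the tool's help output for the authoritative syntax:

./ovh-spark-submit \
 --keep-infra \
 --class org.apache.spark.examples.SparkPi \
 --total-executor-cores 20 \
 SparkPI.jar 100
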
1891Let’s focus on cluster management systems
1892
In Apache Spark clusters, there are independent processes on all cluster nodes called "executors", which are coordinated by the driver program. To allocate cluster resources across applications, the driver program connects to a cluster management system, after which it sends application code and tasks to the executors.
1894
1895There are several options when it comes to cluster management systems, but to keep things fast and simple, we selected the Spark standalone cluster management system. This offers our users the freedom to choose any version of Spark, and also makes cluster installation faster than the other options. If, for example, we had selected Kubernetes as our cluster management system, our users would have been limited to Spark versions 2.3 or above, and cluster installation would have been more time-consuming. Alternatively, if we wanted to deploy a ready-to-use Kubernetes cluster (like OVH Managed Kubernetes), then we would have lost our scalability, because the infrastructure of our Apache Spark cluster would have been inherently limited by the infrastructure of the Kubernetes cluster. But with our current design, users can have an Apache Spark cluster with as many servers as they like, and the freedom to scale easily.
1896Try it yourself!
1897
1898To get started with Analytics Data Compute, you just need to create a cloud account at www.ovh.com, then download the ovh-spark-submit software, and run it as described in the OVH documentation page. Also, if you participate in a short survey on our OVH Labs page, you will receive a voucher, which will let you test Analytics Data Compute first-hand, with 20 euros of free credit.
1899
If you have any questions or would like further explanation, our team is available through our Gitter channel.
1901<|endoftext|>
1902<|startoftext|>
1903
1904Deploying a FaaS platform on OVH Managed Kubernetes using OpenFaaS
1905By Horacio Gonzalez / 2019-05-24 / FaaS, Kubernetes, OpenFaaS, OVH Managed Kubernetes, OVH Platform
1906
1907Several weeks ago, I was taking part in a meetup about Kubernetes, when one of the attendees made a remark that resonated deeply with me…
1908
1909 Hey, Horacio, that Kubernetes thing is rather cool, but what I would have loved to see is a Functions-as-a-Service platform. Most of my apps could be easily done with a database and several serverless functions!
1910
1911It wasn’t the first time I’d got that question…
1912
1913Being, above all, a web developer, I can definitely relate. Kubernetes is a wonderful product – you can install complicated web architectures with a click – but what about the database + some functions model?
1914
1915Well, you can also do it with Kubernetes!
1916
1917That’s the beauty of the rich Kubernetes ecosystem: you can find projects to address many different use cases, from game servers with Agones to FaaS platforms…
There is a Helm chart for that!
1919
1920Saying “You can do it with Kubernetes!” is almost the new “There is an app for that!”, but it doesn’t help a lot of people who are looking for solutions. As the question had come up several times, we decided to prepare a small tutorial on how to deploy and use a FaaS platform on OVH Managed Kubernetes.
1921
We began by testing several FaaS platforms on our Kubernetes offering. Our objective was to find a solution that was:
1923
1924 Easy to deploy (ideally with a simple Helm chart)
 Manageable with both a UI and a CLI, because different customers have different needs
1926 Auto-scalable, in both the upscaling and downscaling senses
1927 Supported by comprehensive documentation
1928
1929We tested lots of platforms, like Kubeless, OpenWhisk, OpenFaaS and Fission, and I must say that all of them performed quite well. In the end though, the one that scored the best in terms of our objectives was OpenFaaS, so we decided to use it as the reference for this blog post.
1930OpenFaaS – a Kubernetes-native FaaS platform
1931OpenFaaS
1932
1933OpenFaaS is an open-source framework for building serverless functions with Docker and Kubernetes. The project is already mature, popular and active, with more than 14k stars on GitHub, hundreds of contributors, and lots of users (both corporate and private).
1934
1935OpenFaaS is very simple to deploy, using a Helm chart (including an operator for CRDs, i.e. kubectl get functions). It has both a CLI and a UI, manages auto-scaling effectively, and its documentation is really comprehensive (with a Slack channel to discuss it, as a nice bonus!).
1936
1937Technically, OpenFaaS is composed of several functional blocks:
1938
 The Function Watchdog, a tiny Golang HTTP server that transforms any Docker image into a serverless function
1940 The API Gateway, which provides an external route into functions and collects metrics
1941 The UI Portal, which creates and invokes functions
1942 The CLI (essentially a REST client for the API Gateway), which can deploy any container as a function
1943
1944Functions can be written in many languages (although I mainly used JavaScript, Go and Python for testing purposes), using handy templates or a simple Dockerfile.
1945OpenFaaS Architecture
1946Deploying OpenFaaS on OVH Managed Kubernetes
1947
1948There are several ways to install OpenFaaS on a Kubernetes cluster. In this post we’re looking at the easiest one: installing with Helm.
1949
1950 If you need information on how to install and use Helm on your OVH Managed Kubernetes cluster, you can follow our tutorial.
1951
1952The official Helm chart for OpenFaas is available on the faas-netes repository.
1953Adding the OpenFaaS Helm chart
1954
1955The OpenFaaS Helm chart isn’t available in Helm’s standard stable repository, so you’ll need to add their repository to your Helm installation:
1956
1957helm repo add openfaas https://openfaas.github.io/faas-netes/
1958helm repo update
1959
1960Creating the namespaces
1961
1962OpenFaaS guidelines recommend creating two namespaces, one for OpenFaaS core services and one for the functions:
1963
1964kubectl apply -f https://raw.githubusercontent.com/openfaas/faas-netes/master/namespaces.yml
1965
1966Generating secrets
1967
1968A FaaS platform that’s open to the internet seems like a bad idea. That’s why we are generating secrets, to enable authentication on the gateway:
1969
1970# generate a random password
1971PASSWORD=$(head -c 12 /dev/urandom | shasum| cut -d' ' -f1)
1972
1973kubectl -n openfaas create secret generic basic-auth \
1974 --from-literal=basic-auth-user=admin \
1975 --from-literal=basic-auth-password="$PASSWORD"
1976
1977 Note: you will need this password later in the tutorial (to access the UI portal, for example). You can view it at any point in the terminal session with echo $PASSWORD.
1978
1979Deploying the Helm chart
1980
1981The Helm chart can be deployed in three modes: LoadBalancer, NodePort and Ingress. For our purposes, the simplest way is simply using our external Load Balancer, so we will deploy it in LoadBalancer, with the --set serviceType=LoadBalancer option.
1982
1983 If you want to better understand the difference between these three modes, you can read our Getting external traffic into Kubernetes – ClusterIp, NodePort, LoadBalancer, and Ingress blog post.
1984
1985Deploy the Helm chart as follows:
1986
1987helm upgrade openfaas --install openfaas/openfaas \
1988 --namespace openfaas \
1989 --set basic_auth=true \
1990 --set functionNamespace=openfaas-fn \
1991 --set serviceType=LoadBalancer
1992
1993As suggested in the install message, you can verify that OpenFaaS has started by running:
1994
1995kubectl --namespace=openfaas get deployments -l "release=openfaas, app=openfaas"
1996
1997If it’s working, you should see a list of the available OpenFaaS deployment objects:
1998
1999$ kubectl --namespace=openfaas get deployments -l "release=openfaas, app=openfaas"
2000NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
2001alertmanager 1 1 1 1 33s
2002faas-idler 1 1 1 1 33s
2003gateway 1 1 1 1 33s
2004nats 1 1 1 1 33s
2005prometheus 1 1 1 1 33s
2006queue-worker 1 1 1 1 33s
2007
2008Install the FaaS CLI and log in to the API Gateway
2009
2010The easiest way to interact with your new OpenFaaS platform is by installing faas-cli, the command line client for OpenFaaS on a Linux or Mac (or in a WSL linux terminal in Windows):
2011
2012curl -sL https://cli.openfaas.com | sh
2013
2014You can now use the CLI to log in to the gateway. The CLI will need the public URL of the OpenFaaS LoadBalancer, which you can get via kubectl:
2015
2016kubectl get svc -n openfaas gateway-external -o wide
2017
2018Export the URL to an OPENFAAS_URL variable:
2019
2020export OPENFAAS_URL=[THE_URL_OF_YOUR_LOADBALANCER]:[THE_EXTERNAL_PORT]
2021
 Note: you will need this URL later in the tutorial (to access the UI portal, for example). You can view it at any point in the terminal session with echo $OPENFAAS_URL.
2023
2024And connect to the gateway:
2025
2026echo -n $PASSWORD | ./faas-cli login -g $OPENFAAS_URL -u admin --password-stdin
2027
2028Now your’re connected to the gateway, and you can send commands to the OpenFaaS platform.
2029
2030By default, there is no function installed on your OpenFaaS platform, as you can verify with the faas-cli list command.
2031
2032In my own deployment (URLs and IP changed for this example), the preceding operations gave:
2033
2034$ kubectl get svc -n openfaas gateway-external -o wide
2035 NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
2036 gateway-external LoadBalancer 10.3.xxx.yyy xxxrt657xx.lb.c1.gra.k8s.ovh.net 8080:30012/TCP 9m10s app=gateway
2037
2038 $ export OPENFAAS_URL=xxxrt657xx.lb.c1.gra.k8s.ovh.net:8080
2039
2040 $ echo -n $PASSWORD | ./faas-cli login -g $OPENFAAS_URL -u admin --password-stdin
2041 Calling the OpenFaaS server to validate the credentials...
2042 WARNING! Communication is not secure, please consider using HTTPS. Letsencrypt.org offers free SSL/TLS certificates.
2043 credentials saved for admin http://xxxrt657xx.lb.c1.gra.k8s.ovh.net:8080
2044
2045$ ./faas-cli version
2046 ___ _____ ____
2047 / _ \ _ __ ___ _ __ | ___|_ _ __ _/ ___|
2048| | | | '_ \ / _ \ '_ \| |_ / _` |/ _` \___ \
2049| |_| | |_) | __/ | | | _| (_| | (_| |___) |
2050 \___/| .__/ \___|_| |_|_| \__,_|\__,_|____/
2051 |_|
2052CLI:
2053 commit: b42d0703b6136cac7b0d06fa2b212c468b0cff92
2054 version: 0.8.11
2055Gateway
2056 uri: http://xxxrt657xx.lb.c1.gra.k8s.ovh.net:8080
2057 version: 0.13.0
2058 sha: fa93655d90d1518b04e7cfca7d7548d7d133a34e
2059 commit: Update test for metrics server
2060Provider
2061 name: faas-netes
2062 orchestration: kubernetes
2063 version: 0.7.5
2064 sha: 4d3671bae8993cf3fde2da9845818a668a009617
2065
$ ./faas-cli list
Function Invocations Replicas
2067
2068Deploying and invoking functions
2069
2070You can easily deploy functions on your OpenFaaS platform using the CLI, with this command: faas-cli up.
2071
2072Let’s try out some sample functions from the OpenFaaS repository:
2073
2074./faas-cli deploy -f https://raw.githubusercontent.com/openfaas/faas/master/stack.yml
2075
2076Running a faas-cli list command now will show the deployed functions:
2077
2078$ ./faas-cli list
2079Function Invocations Replicas
2080base64 0 1
2081echoit 0 1
2082hubstats 0 1
2083markdown 0 1
2084nodeinfo 0 1
2085wordcount 0 1
2086
2087As an example, let’s invoke wordcount (a function that takes the syntax of the unix wc command, giving us the number of lines, words and characters of the input data):
2088
2089echo 'I love when a plan comes together' | ./faas-cli invoke wordcount
2090
2091
2092$ echo 'I love when a plan comes together' | ./faas-cli invoke wordcount
2093 1 7 34
2094
2095Invoking a function without the CLI
2096
You can use the faas-cli describe command to get the public URL of your function, and then call it directly with your favourite HTTP library (or the good old curl):
2098
2099$ ./faas-cli describe wordcount
2100Name: wordcount
2101Status: Ready
2102Replicas: 1
2103Available replicas: 1
2104Invocations: 1
2105Image: functions/alpine:latest
2106Function process:
2107URL: http://xxxxx657xx.lb.c1.gra.k8s.ovh.net:8080/function/wordcount
2108Async URL: http://xxxxx657xx.lb.c1.gra.k8s.ovh.net:8080/async-function/wordcount
2109Labels: faas_function : wordcount
2110Annotations: prometheus.io.scrape : false
2111
2112$ curl -X POST --data-binary "I love when a plan comes together" "http://xxxxx657xx.lb.c1.gra.k8s.ovh.net:8080/function/wordcount"
2113 0 7 33
2114
2115Containers everywhere…
2116
2117The most attractive part of a FaaS platform is being able to deploy your own functions.
In OpenFaaS, you can write these functions in many languages, not just the usual suspects (JavaScript, Python, Go, etc.). This is because in OpenFaaS, you can deploy basically any container as a function, although this does mean you need to package your functions as containers in order to deploy them.
2119
That also means that in order to create your own functions, you need to have Docker installed on your workstation, and you will need to push the images to a Docker registry (either the official one or a private one).
2121
2122If you need a private registry, you can install one on your OVH Managed Kubernetes cluster. For this tutorial we are choosing to deploy our image on the official Docker registry.
2123Writing our first function
2124
For our first example, we are going to create and deploy a hello world function in JavaScript, using NodeJS. Let's begin by creating and scaffolding the function folder:
2126
2127mkdir hello-js-project
2128cd hello-js-project
2129../faas-cli new hello-js --lang node
2130
The CLI will download a JS function template from the OpenFaaS repository, generate a function description file (hello-js.yml in this case) and a folder for the function source code (hello-js). For NodeJS, you will find a package.json (to declare any dependencies of your function, for example) and a handler.js (the function's main code) in this folder.
2132
2133Edit hello-js.yml to set the name of the image you want to upload to the Docker registry:
2134
hello-js.yml
2136version: 1.0
2137provider:
2138 name: openfaas
2139 gateway: http://6d6rt657vc.lb.c1.gra.k8s.ovh.net:8080
2140functions:
2141 hello-js:
2142 lang: node
2143 handler: ./hello-js
2144 image: ovhplatform/openfaas-hello-js:latest
2145
2146The function described in the handler.js file is really simple. It exports a function with two parameters: a context where you will receive the request data, and a callback that you will call at the end of your function and where you will pass the response data.
2147
2148handler.js
2149"use strict"
2150
2151module.exports = (context, callback) => {
2152 callback(undefined, {status: "done"});
2153}
2154
2155Let’s edit it to send back our hello world message:
2156
2157handler.js
2158"use strict"
2159
2160module.exports = (context, callback) => {
2161 callback(undefined, {message: 'Hello world'});
2162}
2163
2164Now you can build the Docker image and push it to the public Docker registry:
2165
2166# Build the image
2167../faas-cli build -f hello-js.yml
2168# Login at Docker Registry, needed to push the image
2169docker login
2170# Push the image to the registry
2171../faas-cli push -f hello-js.yml
2172
2173With the image in the registry, let’s deploy and invoke the function with the OpenFaaS CLI:
2174
2175# Deploy the function
2176../faas-cli deploy -f hello-js.yml
2177# Invoke the function
2178../faas-cli invoke hello-js
2179
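As with the wordcount example earlier, you can also call the new function directly over HTTP, without going through the CLI. A quick sketch, assuming your gateway address is still exported in the $OPENFAAS_URL variable:

# invoke the hello-js function through the API gateway
curl -X POST "http://$OPENFAAS_URL/function/hello-js"
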
2180Congratulations! You have just written and deployed your first OpenFaaS function.
2181Using the OpenFaaS UI Portal
2182
2183You can test the UI Portal by pointing your browser to your OpenFaaS gateway URL (the one you have set on the $OPENFAAS_URL variable), and entering the admin user and the password you have set on the $PASSWORD variable when prompted to.
2184OpenFaaS UI Portal
2185
2186In the UI Portal, you will find the list of the deployed functions. For each function, you can find its description, invoke it and see the result.
2187OpenFaaS UI Portal
2188OpenFaaS UI Portal
2189Where do we go from here?
2190
2191So you now have a working OpenFaaS platform on your OVH Managed Kubernetes cluster.
2192
2193To learn more about OpenFaaS, and how you can get the most out of it, please refer to the official OpenFaaS documentation. You can also follow the OpenFaaS workshops for more practical tips and advice.
2194<|endoftext|>
2195<|startoftext|>
2196
2197Intel vulnerabilities
2198By OVH / 2019-05-15 / OVH News, Security
2199
Like all the players in the IT sector, OVH was informed on May 14, 2019 of security vulnerabilities following the discovery of hardware flaws in Intel processors.
2201
These new vulnerabilities are similar to the previous Spectre and Meltdown vulnerabilities, and affect Intel microprocessors, which are among the components used by OVH.
2203Intel at OVH
2204
Researchers have shown proof-of-concept attacks under the names RIDL, Fallout and ZombieLoad, which exploit the following attack vectors:
2206
2207 CVE-2018-12126 [microarchitectural store buffer data sampling (MSBDS)]
2208 CVE-2018-12130 [microarchitectural fill buffer data sampling (MFBDS)]
2209 CVE-2018-12127 [microarchitectural load port data sampling (MLPDS)]
2210 CVE-2019-11091 [microarchitectural data sampling uncacheable memory (MDSUM)]
2211
Without intervention from OVH or its customers, these vulnerabilities could allow an experienced attacker to conduct a complex attack. If completed, this could potentially give access to some data hosted on our multi-tenant infrastructures. At this point in time, OVH has not received any information demonstrating that the relevant vulnerabilities have been exploited on its infrastructure.

Building a trustworthy cloud involves great responsibilities, and the data security of its customers has always been paramount for OVH. As soon as this information reached us, OVH immediately mobilised its crisis unit to identify the potential impact of these flaws and to set up the most suitable framework to protect our customers' data. Since some of the fixes for these vulnerabilities require a firmware update of Intel's CPUs, we are in close contact with their team to ensure optimal microcode deployment.

In order to complete the mitigation of these vulnerabilities, we invite our customers to update the operating system of their servers. You can find information on the most common OSs below:
2213
 Windows: https://support.microsoft.com/en-us/help/4072698/windows-server-speculative-execution-side-channel-vulnerabilities-prot
 Linux Red Hat: https://access.redhat.com/security/vulnerabilities/mds
 Ubuntu: https://blog.ubuntu.com/2019/05/14/ubuntu-updates-to-mitigate-new-microarchitectural-data-sampling-mds-vulnerabilities
2217
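Once the operating system and microcode updates are applied, recent Linux kernels report the MDS mitigation status through sysfs. Assuming your kernel ships this entry, you can check it with:

cat /sys/devices/system/cpu/vulnerabilities/mds
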
2218 We will keep you informed as soon as possible on the action plan and the schedule of the associated update operations. To this end, do not hesitate to consult regularly:
2219
2220 OVH Blog
2221 OVH Tasks
2222
2223<|endoftext|>
2224<|startoftext|>
2225
2226Alerting based on IPMI data collection
2227By Morvan Le Goff / 2019-05-10 / Alerting, Data Collection, IPMI, Observability
2228The problem to solve…
2229
How do we continuously monitor the health of all OVH servers, without any impact on their performance, and without any intrusion on the operating systems running on them? This was the issue we needed to address. The end goal of this data collection is to allow us to detect and forecast potential hardware failures, in order to improve the quality of service delivered to our customers.
2231
2232We began by splitting the problem into four general steps:
2233
2234 Data collection
2235 Data storage
2236 Data analytics
2237 Visualisation/actions
2238
2239Data collection
2240
2241How did we collect massive amounts of server health data, in a non-intrusive way, within short time intervals?
2242Which data to collect?
2243
On modern servers, a BMC (Baseboard Management Controller) allows us to control firmware updates, reboots, etc. This controller is independent of the system running on the server. In addition, the BMC gives us access to sensors for all the motherboard components through an I2C bus. The protocol used to communicate with the BMC is IPMI, which is accessible via LAN (RMCP); a minimal collection sketch follows the two lists below.
2245What is IPMI?
2246
2247 Intelligent Platform Management Interface.
 Management and monitoring capabilities, independent of the host’s OS.
 Led by Intel, first published in 1998.
 Supported by more than 200 computer system vendors, such as Cisco, Dell, HP, Intel, SuperMicro…
2251
2252Why use IPMI?
2253
 Access to hardware sensors (CPU temperature, memory temperature, chassis status, power, etc.).
2255 No dependency on the OS (i.e. an agentless solution)
2256 IPMI functions accessible after OS/system failure
2257 Restricted access to IPMI functionalities via user privileges
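
To give a concrete idea of what collecting these values looks like, here is a minimal, hypothetical sketch (not our production collector) that reads the sensors of a single BMC over the LAN using the standard ipmitool CLI. The host and credentials are placeholders:

# Minimal sketch of IPMI-over-LAN sensor collection, using the ipmitool CLI.
# Not the production collector: the host and credentials below are placeholders.
import subprocess

def read_sensors(host: str, user: str, password: str) -> dict:
    """Return a {sensor_name: raw_reading} dict for one BMC."""
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, "sensor"],
        capture_output=True, text=True, check=True,
    ).stdout
    readings = {}
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2:
            readings[fields[0]] = fields[1]
    return readings

if __name__ == "__main__":
    for name, value in read_sensors("10.0.0.42", "monitor", "secret").items():
        print(f"{name}: {value}")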
2258
2259IPMI-poller node
2260Multi-source data collection
2261
2262We needed a scalable and responsive multi-source data collection tool to grab the IPMI data of about 400k servers at fixed intervals.
2263Akka
2264
We decided to build our IPMI data collector on the Akka framework. Akka is an open-source toolkit and runtime that simplifies the construction of concurrent and distributed applications on the JVM.
2266
The Akka framework defines an abstraction built on top of threads, called an ‘actor’. An actor is an entity that handles messages. This abstraction eases the creation of multi-threaded applications, with no need to fight against deadlocks. By selecting the dispatcher policy for a group of actors, you can fine-tune your application to be fully reactive and adaptable to the load. This way, we were able to design an efficient data collector that could adapt to the load, as we intended to collect each sensor value every minute.
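
To illustrate the actor idea itself, here is a conceptual sketch in plain Python (not Akka code): an actor is essentially a private mailbox plus a loop that processes one message at a time, so its internal state never needs explicit locking.

# Conceptual sketch of the actor model (not Akka): one mailbox, one processing
# loop, and state touched by a single thread, so no explicit locking is needed.
import queue
import threading

class SensorAggregator:
    def __init__(self):
        self.inbox = queue.Queue()
        self.readings = {}  # private state, only touched by the actor's own thread
        threading.Thread(target=self._run, daemon=True).start()

    def tell(self, message):
        """Asynchronous send: just drop the message in the mailbox."""
        self.inbox.put(message)

    def _run(self):
        while True:
            server, sensor, value = self.inbox.get()  # handle one message at a time
            self.readings[(server, sensor)] = value

aggregator = SensorAggregator()
aggregator.tell(("server-001", "cpu_temp", 54.0))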
2268
2269In addition, the cluster architecture provided by the framework allowed us to handle all the servers in a datacentre with a single cluster. The cluster architecture also helped us to design a resilient system, so if a node of the cluster crashes or becomes too slow, it will automatically restart. The servers monitored by the failing node are then handled by the remaining, valid nodes of the cluster.
2270
With the cluster architecture, we implemented a quorum feature that takes down the whole cluster if the minimum number of started nodes is not reached. This feature lets us easily solve the split-brain problem: if the connection between nodes is broken, the cluster splits into two entities, and the one that does not reach the quorum is automatically shut down.
2272
2273A REST API is defined to communicate with the data collector in two ways:
2274
2275 To send the configurations
2276 To get information on the monitored servers
2277
A cluster node runs on a single JVM, and we are able to launch one or more nodes on a dedicated server. Each dedicated server used in the cluster is placed in an OVH vRack. An IPMI gateway pool is used to access the BMC of each server, with the communication between the gateways and the IPMI data collector secured by IPsec connections.
2279IPMI-poller clustering
2280Data storage
2281OVH Metrics
2282
Of course, we use the OVH Metrics service for data storage! Before storing the data, the IPMI data collector unifies the metrics by qualifying each sensor. The final metric name is defined by the entity the sensor belongs to and the base unit of the value. This eases the post-processing, visualisation and comparison of the data.
2284
2285Each datacentre IPMI collector pushes its data to a Metrics live cache server with a limited persistence time. All important information is persisted in the OVH Metrics server.
2286Data analytics
2287Warp 10
2288
We store our metrics in Warp 10. Warp 10 comes with a time series scripting language, WarpScript, whose powerful analytics make it easy to manipulate and post-process our collected data on the server side.
2290
2291We have defined three levels of analysis to monitor the health of the servers:
2292
2293 A simple threshold-per-server metric.
 By using the OVH Metrics loops service, we aggregate data per rack and per room, and calculate a mean. Setting a threshold on this mean lets us detect common failures in the cooling or power supply systems at the rack or room level (see the sketch after this list).
 The OVH MLS service performs anomaly detection on the racks and rooms, by forecasting the possible evolution of the metrics based on past values. If a metric’s value falls outside this forecast, an anomaly is raised.
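
As a conceptual illustration of the second level, here is a plain Python sketch (not the WarpScript we actually run server-side, and with purely illustrative values): the per-rack aggregation boils down to grouping the latest sensor values by rack, computing their mean, and comparing it to a threshold.

# Conceptual sketch (plain Python, not the production WarpScript): aggregate
# temperature readings per rack, compute the mean, and flag racks over a threshold.
from collections import defaultdict
from statistics import mean

THRESHOLD_C = 40.0  # illustrative threshold

# (rack, server) -> latest inlet temperature; values are purely illustrative
readings = {
    ("rack-A1", "server-001"): 35.0,
    ("rack-A1", "server-002"): 47.0,
    ("rack-B2", "server-003"): 33.5,
}

per_rack = defaultdict(list)
for (rack, _server), temp in readings.items():
    per_rack[rack].append(temp)

for rack, temps in per_rack.items():
    if mean(temps) > THRESHOLD_C:
        print(f"ALERT: rack {rack} mean inlet temperature is {mean(temps):.1f} °C")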
2296
2297Visualisation/actions
2298TAT
2299
All the alerts generated by the data analysis are pushed to TAT, an OVH tool we use to handle the alerting flow.
2301Grafana
2302
Grafana is used to monitor the metrics. We have dashboards to visualise the metrics and the aggregations for each rack and room, the detected anomalies, and the evolution of the open alerts.
2304<|endoftext|>
2305<|startoftext|>
2306
2307OVH Private Cloud and HashiCorp Terraform – Part 1
2308By Erwan Quelin / 2019-05-03 / Devops, HashiCorp, IaaC, Private Cloud, Terraform, VMware, vSphere
2309
When discussing the concepts of DevOps and Infrastructure as Code, the tools developed by HashiCorp quickly come up. With Terraform, HashiCorp offers a simple way to automate infrastructure provisioning in both public clouds and on-premises. Terraform has a long history of deploying and managing OVH’s Public Cloud resources. For example, you can find a complete guide on GitHub. In this article, we will focus on using Terraform to interact with another OVH solution: Private Cloud.
2311
2312
2313Private Cloud enables customers to benefit from a VMware vSphere infrastructure, hosted and managed by OVH. Terraform lets you automate the creation of resources and their life cycle. In this first article, we will explore the basic notions of Terraform. After reading it, you should be able to write a Terraform configuration file to deploy and customise a virtual machine from a template. In a second article, we will build on this example, and modify it so that it is more generic and can be easily adapted to your needs.
2314Installation
2315
Terraform is available on the HashiCorp website for almost all OSs as a simple binary. Just download it and copy it into a directory in your operating system’s PATH. To check that everything is working properly, run the terraform command.
2317
2318$ terraform
2319Usage: terraform [-version] [-help] <command> [args]
2320
2321The available commands for execution are listed below.
2322The most common, useful commands are shown first, followed by
2323less common or more advanced commands. If you're just getting
2324started with Terraform, stick with the common commands. For the
2325other commands, please read the help and docs before usage.
2326
2327Common commands:
2328 apply Builds or changes infrastructure
2329 console Interactive console for Terraform interpolations
2330 destroy Destroy Terraform-managed infrastructure
2331
2332Folders and files
2333
Like other Infrastructure as Code tools, Terraform uses simple files to define the target configuration. To begin, we will create a directory and place a file named main.tf in it. By default, Terraform reads all the files in the working directory with the .tf extension, but to simplify things, we will start with a single file. We will see in a future article how to organise the data into several files.
2335
Similarly, to make it easier to understand Terraform operations, we will specify all the necessary information directly in the files. This includes usernames, passwords, and the names of the different resources (vCenter, cluster, etc.). Doing this is obviously not advisable when using Terraform in production. The second article will also be an opportunity to improve this part of the code. But for now, let’s keep it simple!
2337Providers
2338
Providers let you specify how Terraform will communicate with the outside world. In our example, the vSphere provider will be in charge of connecting to your Private Cloud’s vCenter. We declare a provider as follows:
2340
2341provider "vsphere" {
2342 user = "admin"
2343 password = "MyAwesomePassword"
2344 vsphere_server = "pcc-XXX-XXX-XXX-XXX.ovh.com"
2345}
2346
We see here that Terraform uses its own way of structuring data (it is also possible to write everything in JSON, to facilitate the automatic generation of files!). Data is grouped in blocks (here, a block named vsphere, which is of the provider type), and the data relating to the block is given as key/value pairs.
2348Data
2349
Now that Terraform is able to connect to the vCenter, we need to retrieve information about the vSphere infrastructure. Since we want to deploy a virtual machine, we need to know the datacentre, cluster, datastore, template, etc. in which we are going to create it. To do this, we will use data-type blocks:
2351
2352data "vsphere_datacenter" "dc" {
2353 name = "pcc-XXX-XXX-XXX-XXX_datacenter3113"
2354}
2355
2356data "vsphere_datastore" "datastore" {
2357 name = "pcc-001234"
2358 datacenter_id = "${data.vsphere_datacenter.dc.id}"
2359}
2360
2361data "vsphere_virtual_machine" "template" {
2362 name = "UBUNTU"
2363 datacenter_id = "${data.vsphere_datacenter.dc.id}"
2364}
2365
In the above example, we retrieve information about the datacentre named pcc-XXX-XXX-XXX-XXX_datacenter3113, the datastore named pcc-001234, and a template whose name is UBUNTU. We see here that we use the datacentre’s id to get information about an object associated with it.
2367Resources
2368
Resources are used to create and/or manage elements of the infrastructure. In our example, we will use a resource of the vsphere_virtual_machine type, which, as its name suggests, will help us to create a VM.
2370
2371resource "vsphere_virtual_machine" "vm" {
2372 name = "vm01"
2373 resource_pool_id = "${data.vsphere_compute_cluster.cluster.resource_pool_id}"
2374 datastore_id = "${data.vsphere_datastore.datastore.id}"
2375 guest_id = "${data.vsphere_virtual_machine.template.guest_id}"
2376 scsi_type = "${data.vsphere_virtual_machine.template.scsi_type}"
2377
2378 network_interface {
2379 network_id = "${data.vsphere_network.network.id}"
2380 }
2381
2382 disk {
2383 label = "disk0"
2384 size = "${data.vsphere_virtual_machine.template.disks.0.size}"
2385 }
2386
2387 clone {
2388 template_uuid = "${data.vsphere_virtual_machine.template.id}"
2389
2390 customize {
2391 linux_options {
2392 host_name = "vm01"
2393 domain = "example.com"
2394 }
2395
2396 network_interface {
2397 ipv4_address = "192.168.1.2"
2398 ipv4_netmask = 24
2399 }
2400
2401 ipv4_gateway = "192.168.1.254"
2402 dns_suffix_list = ["example.com"]
2403 dns_server_list = ["192.168.1.1"]
2404 }
2405 }
2406}
2407
2408
The structure of this resource is a little more complex, because it is composed of several sub-blocks. We first define the name of the virtual machine. We then provide information about its configuration (resource pool, datastore, etc.). The network_interface and disk blocks are used to specify the configuration of its virtual devices. The clone sub-block lets you specify which template you wish to use to create the VM, as well as the configuration of the operating system installed on it. The customize sub-block is specific to the type of OS you want to clone. At all levels, we use information previously obtained in the data blocks.
2410Full example
2411
2412provider "vsphere" {
2413 user = "admin"
2414 password = "MyAwesomePassword"
2415 vsphere_server = "pcc-XXX-XXX-XXX-XXX.ovh.com"
2416}
2417
2418data "vsphere_datacenter" "dc" {
2419 name = "pcc-XXX-XXX-XXX-XXX_datacenter3113"
2420}
2421
2422data "vsphere_datastore" "datastore" {
2423 name = "pcc-001234"
2424 datacenter_id = "${data.vsphere_datacenter.dc.id}"
2425}
2426
2427data "vsphere_compute_cluster" "cluster" {
2428 name = "Cluster1"
2429 datacenter_id = "${data.vsphere_datacenter.dc.id}"
2430}
2431
2432data "vsphere_network" "network" {
2433 name = "vxw-dvs-57-virtualwire-2-sid-5001-Dc3113_5001"
2434 datacenter_id = "${data.vsphere_datacenter.dc.id}"
2435}
2436
2437data "vsphere_virtual_machine" "template" {
2438 name = "UBUNTU"
2439 datacenter_id = "${data.vsphere_datacenter.dc.id}"
2440}
2441
2442resource "vsphere_virtual_machine" "vm" {
2443 name = "vm01"
2444 resource_pool_id = "${data.vsphere_compute_cluster.cluster.resource_pool_id}"
2445 datastore_id = "${data.vsphere_datastore.datastore.id}"
2446 guest_id = "${data.vsphere_virtual_machine.template.guest_id}"
2447 scsi_type = "${data.vsphere_virtual_machine.template.scsi_type}"
2448
2449 network_interface {
2450 network_id = "${data.vsphere_network.network.id}"
2451 }
2452
2453 disk {
2454 label = "disk0"
2455 size = "${data.vsphere_virtual_machine.template.disks.0.size}"
2456 }
2457
2458 clone {
2459 template_uuid = "${data.vsphere_virtual_machine.template.id}"
2460
2461 customize {
2462 linux_options {
2463 host_name = "vm01"
2464 domain = "example.com"
2465 }
2466
2467 network_interface {
2468 ipv4_address = "192.168.1.2"
2469 ipv4_netmask = 24
2470 }
2471
2472 ipv4_gateway = "192.168.1.254"
2473 dns_suffix_list = ["example.com"]
2474 dns_server_list = ["192.168.1.1"]
2475 }
2476 }
2477}
2478
24793… 2… 1… Ignition
2480
2481Let’s look at how to use our new config file with Terraform…
2482OVH Private Cloud and HashiCorp Terraform
2483Initialisation
2484
Now that our configuration file is ready, we can use it to create our virtual machine. Let’s start by initialising the working environment with the terraform init command. This will take care of downloading the vSphere provider and creating the different files that Terraform needs to work.
2486
2487$ terraform init
2488
2489Initializing provider plugins...
2490- Checking for available provider plugins on https://releases.hashicorp.com...
2491- Downloading plugin for provider "vsphere" (1.10.0)...
2492
2493The following providers do not have any version constraints in configuration,
2494so the latest version was installed.
2495
2496...
2497
2498* provider.vsphere: version = "~> 1.10"
2499
2500Terraform has been successfully initialized!
2501...
2502
2503Plan
2504
2505The next step is to execute the terraform plan command to validate that our configuration file contains no errors and to visualise all the actions that Terraform will perform.
2506
2507$ terraform plan
2508Refreshing Terraform state in-memory prior to plan...
2509The refreshed state will be used to calculate this plan, but will not be
2510persisted to local or remote state storage.
2511
2512data.vsphere_datacenter.dc: Refreshing state...
2513data.vsphere_compute_cluster.cluster: Refreshing state...
2514data.vsphere_network.network: Refreshing state...
2515data.vsphere_datastore.datastore: Refreshing state...
2516data.vsphere_virtual_machine.template: Refreshing state...
2517
2518------------------------------------------------------------------------
2519
2520An execution plan has been generated and is shown below.
2521Resource actions are indicated with the following symbols:
2522 + create
2523
2524Terraform will perform the following actions:
2525
2526 + vsphere_virtual_machine.vm
2527 id: <computed>
2528 boot_retry_delay: "10000"
2529 change_version: <computed>
2530 clone.#: "1"
2531 clone.0.customize.#: "1"
2532 clone.0.customize.0.dns_server_list.#: "1"
2533 clone.0.customize.0.dns_server_list.0: "192.168.1.1"
2534 clone.0.customize.0.dns_suffix_list.#: "1"
2535 clone.0.customize.0.dns_suffix_list.0: "example.com"
 clone.0.customize.0.ipv4_gateway: "192.168.1.254"
2537 clone.0.customize.0.linux_options.#: "1"
2538 clone.0.customize.0.linux_options.0.domain: "example.com"
2539 clone.0.customize.0.linux_options.0.host_name: "vm01"
2540 clone.0.customize.0.linux_options.0.hw_clock_utc: "true"
2541 clone.0.customize.0.network_interface.#: "1"
2542 clone.0.customize.0.network_interface.0.ipv4_address: "192.168.1.2"
 clone.0.customize.0.network_interface.0.ipv4_netmask: "24"
2544 clone.0.customize.0.timeout: "10"
2545 clone.0.template_uuid: "42061bc5-fdec-03f3-67fd-b709ec06c7f2"
2546 clone.0.timeout: "30"
2547 cpu_limit: "-1"
2548 cpu_share_count: <computed>
2549 cpu_share_level: "normal"
2550 datastore_id: "datastore-93"
2551 default_ip_address: <computed>
2552 disk.#: "1"
2553 disk.0.attach: "false"
2554 disk.0.datastore_id: "<computed>"
2555 disk.0.device_address: <computed>
2556 ...
2557
2558Plan: 1 to add, 0 to change, 0 to destroy.
2559
2560
It is important to take the time to check all the information returned by the plan command before proceeding. It would be a real mess to delete production virtual machines due to an error in the configuration file… In the example above, we see that Terraform will create a new resource (here, a VM), and not modify or delete anything, which is exactly the goal!
2562Apply
2563
In the last step, the terraform apply command will actually configure the infrastructure according to the information present in the configuration file. The plan command is executed first, and Terraform will then ask you to confirm the actions by typing yes.
2565
2566$ terraform apply
2567...
2568
2569Plan: 1 to add, 0 to change, 0 to destroy.
2570
2571Do you want to perform these actions?
2572 Terraform will perform the actions described above.
2573 Only 'yes' will be accepted to approve.
2574
2575 Enter a value: yes
2576
2577vsphere_virtual_machine.vm: Creating...
2578 boot_retry_delay: "" => "10000"
2579 change_version: "" => "<computed>"
2580 clone.#: "" => "1"
2581 clone.0.customize.#: "" => "1"
2582 clone.0.customize.0.dns_server_list.#: "" => "1"
2583 clone.0.customize.0.dns_server_list.0: "" => "192.168.1.1"
2584 clone.0.customize.0.dns_suffix_list.#: "" => "1"
2585 clone.0.customize.0.dns_suffix_list.0: "" => "example.com"
2586 clone.0.customize.0.ipv4_gateway: "" => "192.168.1.254"
2587 clone.0.customize.0.linux_options.#: "" => "1"
2588 clone.0.customize.0.linux_options.0.domain: "" => "example.com"
 clone.0.customize.0.linux_options.0.host_name: "" => "vm01"
2590 clone.0.customize.0.linux_options.0.hw_clock_utc: "" => "true"
2591 clone.0.customize.0.network_interface.#: "" => "1"
2592 clone.0.customize.0.network_interface.0.ipv4_address: "" => "192.168.1.2"
 clone.0.customize.0.network_interface.0.ipv4_netmask: "" => "24"
2594 clone.0.customize.0.timeout: "" => "10"
2595 clone.0.template_uuid: "" => "42061bc5-fdec-03f3-67fd-b709ec06c7f2"
2596 clone.0.timeout: "" => "30"
2597 cpu_limit: "" => "-1"
2598 cpu_share_count: "" => "<computed>"
2599 cpu_share_level: "" => "normal"
2600 datastore_id: "" => "datastore-93"
2601 default_ip_address: "" => "<computed>"
2602 disk.#: "" => "1"
2603...
2604vsphere_virtual_machine.vm: Still creating... (10s elapsed)
2605vsphere_virtual_machine.vm: Still creating... (20s elapsed)
2606vsphere_virtual_machine.vm: Still creating... (30s elapsed)
2607...
2608vsphere_virtual_machine.vm: Still creating... (1m50s elapsed)
2609vsphere_virtual_machine.vm: Creation complete after 1m55s (ID: 42068313-d169-03ff-9c55-a23e66a44b48)
2610
2611Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
2612
2613When you connect to the vCenter of your Private Cloud, you should see a new virtual machine in the inventory!
2614Next steps
2615
Now that we have seen a standard Terraform workflow, you may want to test some modifications to your configuration file. For example, you can add another virtual disk to your VM by modifying the vsphere_virtual_machine resource block like this:
2617
2618disk {
2619 label = "disk0"
2620 size = "${data.vsphere_virtual_machine.template.disks.0.size}"
2621}
2622
2623disk {
2624 label = "disk1"
2625 size = "${data.vsphere_virtual_machine.template.disks.0.size}"
2626 unit_number = 1
2627}
2628
Then run terraform plan to see what Terraform is going to do in order to reconcile the infrastructure state with your configuration file.
2630
2631$ terraform plan
2632Refreshing Terraform state in-memory prior to plan...
2633The refreshed state will be used to calculate this plan, but will not be
2634persisted to local or remote state storage.
2635
2636data.vsphere_datacenter.dc: Refreshing state...
2637data.vsphere_datastore.datastore: Refreshing state...
2638data.vsphere_network.network: Refreshing state...
2639data.vsphere_compute_cluster.cluster: Refreshing state...
2640data.vsphere_virtual_machine.template: Refreshing state...
2641vsphere_virtual_machine.vm: Refreshing state... (ID: 4206be6f-f462-c424-d386-7bd0a0d2cfae)
2642
2643------------------------------------------------------------------------
2644
2645An execution plan has been generated and is shown below.
2646Resource actions are indicated with the following symbols:
2647 ~ update in-place
2648
2649Terraform will perform the following actions:
2650
2651 ~ vsphere_virtual_machine.vm
2652 disk.#: "1" => "2"
2653 disk.1.attach: "" => "false"
2654 disk.1.datastore_id: "" => "<computed>"
2655 ...
2656
2657
2658Plan: 0 to add, 1 to change, 0 to destroy.
2659
If you agree with Terraform’s proposed actions, you can run terraform apply again to add the new virtual disk to your virtual machine.
2661Clean it up
2662
When you have finished your tests and no longer need the infrastructure, you can simply run the terraform destroy command to delete all the previously-created resources. Be careful with this command, as there is no way to get your data back afterwards!
2664
2665$ terraform destroy
2666
2667data.vsphere_datacenter.dc: Refreshing state...
2668data.vsphere_compute_cluster.cluster: Refreshing state...
2669data.vsphere_datastore.datastore: Refreshing state...
2670data.vsphere_network.network: Refreshing state...
2671data.vsphere_virtual_machine.template: Refreshing state...
2672vsphere_virtual_machine.vm: Refreshing state... (ID: 42068313-d169-03ff-9c55-a23e66a44b48)
2673
2674An execution plan has been generated and is shown below.
2675Resource actions are indicated with the following symbols:
2676 - destroy
2677
2678Terraform will perform the following actions:
2679
2680 - vsphere_virtual_machine.vm
2681
2682
2683Plan: 0 to add, 0 to change, 1 to destroy.
2684
2685Do you really want to destroy all resources?
2686 Terraform will destroy all your managed infrastructure, as shown above.
2687 There is no undo. Only 'yes' will be accepted to confirm.
2688
2689 Enter a value: yes
2690
2691vsphere_virtual_machine.vm: Destroying... (ID: 42068313-d169-03ff-9c55-a23e66a44b48)
2692vsphere_virtual_machine.vm: Destruction complete after 3s
2693
2694Destroy complete! Resources: 1 destroyed.
2695
2696In this article, we have seen how to deploy a virtual machine with a Terraform configuration file. This allowed us to learn the basic commands plan, apply and destroy, as well as the notions of provider, data and resource. In the next article, we will develop this example, by modifying it to make it more adaptable and generic.
2697<|endoftext|>
2698<|startoftext|>
2699
2700Prescience: Introducing OVH’s Machine Learning Platform
2701By Christophe Rannou / 2019-04-26 / Innovation, Machine learning, OVH Labs, Prescience
2702
Machine Learning projects are becoming an increasingly important component of today’s search for more efficient and complex industrial processes. OVH Prescience is a Machine Learning platform that aims to ease the design, deployment and serving of models in an industrial context. The system manages Machine Learning pipelines, from data ingestion to model monitoring. This involves automating data preprocessing, model selection, evaluation, and deployment within a scalable platform.
2704Machine Learning with OVH Prescience
2705
2706Prescience supports various kinds of problems, such as regression, classification, time series forecasting, and soon, anomaly detection. Problem resolution is handled through the use of both traditional ML models and neural networks.
2707
2708Prescience is currently used at production scale to solve various challenges faced by OVH, and its alpha version is available to explore for free at OVH Labs. In this blog post, we will introduce Prescience, and walk you through a typical workflow along with its components. An in-depth presentation of the components will be available in future blog posts.
2709The inception of Prescience
2710
2711At some point, all Machine Learning projects face the same challenge: how to bridge the gap between a prototype Machine Learning system, and its use in a production context. That was the cornerstone of the development of a Machine Learning platform within OVH.
2712Production notebooks
2713
2714More often than not, data scientists design Machine Learning systems that include data processing and model selection within notebooks. If they’re successful, these notebooks are then adapted for production needs by data engineers or developers. This process is usually delicate. It is time-consuming, and must be repeated each time the model or the data processing requires an update. These issues lead to production models that, though ideal when delivered, might drift from the actual problem over time. In practice, it is common for models to never be used in a production capacity, in spite of their quality, just because the data pipeline is too complicated (or boring) to take out of the notebook. As a result, all the data scientists’ work goes to waste.
2715
2716In light of this, the first problem Prescience needed to solve was how to provide a simple way to deploy and serve models, while allowing monitoring and efficient model management, including (but not limited to) model retraining, model evaluation or model querying through a serving REST API.
2717Enhanced prototyping
2718
Once the gap between prototyping and production was bridged, the second objective was to shorten the prototyping phase of Machine Learning projects. The base observation is that data scientists’ skills are most crucial when applied to data preparation or feature engineering. Essentially, the data scientist’s job is to properly define the problem. This includes characterising the data, the actual target, and the correct metric for evaluation. Nonetheless, model selection is also a task handled by the data scientist – one which delivers a lot less value from this specialist. Indeed, one of the classic ways of finding a good model and its parameters is still to brute-force all the possible configurations within a given space. As a result, model selection can be quite painstaking and time-consuming.
2720
Consequently, Prescience needed to provide data scientists with an efficient way to test and evaluate algorithms, which would allow them to focus on adding value to the data and the problem definition. This was achieved by adding an optimisation component that, given a configuration space, evaluates and tests the configurations within it, regardless of whether they’ve been tweaked by the data scientist. Since the architecture is scalable, we can quickly test a significant number of possibilities in this way. The optimisation component also leverages techniques that aim to outperform the brute-force approach, through the use of Bayesian optimisation. In addition, the configurations tested for a given problem are preserved for later use, and to ease the start of the optimisation process.
2722Widening the possibilities
2723
2724In a company such as OVH, a lot of concerns can be addressed with Machine Learning techniques. Unfortunately, it is not possible to assign a data scientist to each of these issues, especially if it has not been established whether the investment would be worthwhile. Even though our business specialists have not mastered all Machine Learning techniques, they have an extensive knowledge of the data. Building on this knowledge, they can provide us with a minimal definition of the problem at hand. Automating the previous steps (data preparation and model selection) enables specialists to swiftly evaluate the possible benefits of a Machine Learning approach. It is then possible to adopt a quick-win/quick-fail process for potential projects. If this is successful, we can bring a data scientist into the loop, if necessary.
2725
2726Prescience also incorporates automated pipeline management, to adapt the raw data to be consumed by Machine Learning algorithms (i.e. preprocessing), then select a well-suited algorithm and its parameters (i.e. model selection), while retaining automatic deployment and monitoring.
2727Prescience architecture and Machine Learning workflows
2728
2729Essentially, the Prescience platform is built upon open-source technologies, such as Kubernetes for operations, Spark for data processing, and Scikit-learn, XGBoost, Spark MLlib and Tensorflow for Machine Learning libraries. Most of Prescience’s development involved linking these technologies together. In addition, all intermediate outputs of the system – such as pre-processed data, transformation steps, or models – are serialised using open-source technologies and standards. This prevents users from being tethered to Prescience, in case it ever becomes necessary to use another system.
2730
2731User interaction with the Prescience platform is made possible through the following elements:
2732
2733 user interface
 Python client
2735 REST API
2736
2737Let’s take a look at a typical workflow, and give a brief description of the different components…
2738Data ingestion
2739
The first step of a Machine Learning workflow is to ingest user data. We currently support three types of source, and this list will be extended, depending on usage:
2741
2742 CSV, the industry standard
 Parquet, which is pretty cool (plus self-describing and compressed)
2744 Time-Series, thanks to OVH Observability, powered by Warp10
2745
2746The raw data provided by each of these sources is rarely usable as-is by Machine Learning algorithms. Algorithms generally expect numbers to work with. The first step of the workflow is therefore performed by the Parser component. The Parser’s only job is to detect types and column names, in the case of plain text formats, such as CSV, although Parquet and Warp10 sources include a schema, making this step moot. Once the data is typed, the Parser extracts statistics, in order to precisely characterise it. The resulting data, along with its statistics, is stored in our object storage backend – Public Cloud Storage, powered by OpenStack Swift.
2747Data transformation
2748
Once the types are inferred and the statistics extracted, the data still usually needs to be processed before it’s Machine Learning-ready. This step is handled by the preprocessor. Relying on the computed statistics and the problem type, it infers the best strategy to apply to the source. For instance, if a categorical feature only has a few distinct values, one-hot encoding is performed. However, if it has a large number of different categories, a more suitable processing type is selected, such as level/impact coding. Once inferred, the strategy is applied to the source, transforming it into a dataset, which will be the basis of the subsequent model selection step.
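
As an illustration of the kind of transformation involved, here is a minimal pandas sketch of one-hot encoding (not Prescience’s actual preprocessor, and the column names are made up): a low-cardinality categorical column is turned into one binary column per category.

# Minimal one-hot encoding sketch with pandas (not Prescience's preprocessor).
import pandas as pd

raw = pd.DataFrame({
    "datacentre": ["GRA", "RBX", "GRA", "SBG"],  # low-cardinality categorical feature
    "cpu_load": [0.42, 0.87, 0.13, 0.55],
})

# One binary column per category; numeric columns are left untouched.
dataset = pd.get_dummies(raw, columns=["datacentre"])
print(dataset)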
2750
The preprocessor not only outputs the dataset, but also a serialised version of the transformations. The format chosen for this serialisation is PMML (Predictive Model Markup Language), a description standard for sharing and exchanging data mining and Machine Learning algorithms. Using this format, we are then able to apply the exact same transformations at serving time, when confronted with new data.
2752Model selection
2753
2754Once a dataset is ready, the next step is to try and fit the best model. Depending on the problem, a set of algorithms, along with their configuration space, is provided to the user. Depending on their skill level, the user can tweak the configuration space and preselect a subset of algorithms that better fit the problem.
2755Bayesian optimisation
2757
The component that handles the optimisation and the model selection is the optimisation engine. When starting an optimisation, a sub-component called the controller creates an internal optimisation task. The controller handles the scheduling of the various optimisation steps performed during the task. The optimisation is achieved using Bayesian methods. Basically, a Bayesian approach consists of learning a model that is able to predict which configuration is the best. We can break down the steps as follows:
2759
2760 The model is in a cold state. The optimiser returns the default set of initial configurations to the controller
2761 The controller distributes the initial configurations over a cluster of learners
2762 Upon completion of the initial configurations, the controller stores the results
2763 The optimiser starts its second iteration, and trains a model on the available data
 Based on the resulting model, the optimiser outputs the best challengers to try. Both their potential efficiency and the amount of information they will provide to improve the selection model are considered
 The controller distributes the new set of configurations over the cluster and waits for new information, i.e. newly-evaluated configurations. Configurations are evaluated using K-fold cross-validation, to avoid overfitting (see the sketch after this list).
2766 When new information is available, a new optimisation iteration is started, and the process begins again at step 4
2767 After a predefined number of iterations, the optimisation stops
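
To make the loop above more concrete, here is a minimal sketch of the same idea using off-the-shelf libraries (scikit-optimize and scikit-learn). It only illustrates Bayesian optimisation combined with K-fold cross-validation; it is not Prescience’s optimisation engine, and the dataset and search space are arbitrary.

# Illustration only: Bayesian optimisation of two hyperparameters, with each
# candidate configuration scored by 5-fold cross-validation. Not Prescience code.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer

X, y = load_breast_cancer(return_X_y=True)

def objective(params):
    n_estimators, max_depth = params
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=0)
    # K-fold cross-validation avoids overfitting the evaluation to a single split.
    return -cross_val_score(model, X, y, cv=5).mean()

space = [Integer(10, 200, name="n_estimators"), Integer(2, 10, name="max_depth")]
result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("best score:", -result.fun, "best configuration:", result.x)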
2768
2769Model validation
2770
Once the optimisation is completed, the user can either launch a new optimisation, leveraging the existing data (hence not starting again from the cold state), or select a configuration according to its evaluation scores. Once a suitable configuration is reached, it is used to train the final model, which is then serialised in either the PMML format or the TensorFlow SavedModel format. The same learners that handled the evaluations perform the actual training.
2772
Lastly, the final model is evaluated against a test set extracted during preprocessing. This set is never used during model selection or training, to ensure that the computed scoring metrics are unbiased. Based on the resulting scoring metrics, the decision can be made on whether or not to use the model in production.
2774Model serving
2775
At this stage, the model is trained, exported, and ready to be served. The last component of the Prescience platform is Prescience Serving: a web service that consumes PMML and saved models, and exposes a REST API on top of them. As the transformations are exported alongside the model, the user can query the newly-deployed model using raw data. Predictions are now ready to be used within any application.
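
For instance, querying a deployed model could look like the following sketch. The URL, model name and payload fields here are hypothetical placeholders, not the actual Prescience Serving routes:

# Hypothetical example of querying a served model over its REST API.
# The URL and payload fields are placeholders, not the real Prescience routes.
import requests

payload = {"cpu_temp": 61.0, "fan_speed": 5200, "datacentre": "GRA"}  # raw, untransformed features
response = requests.post("https://prescience-serving.example.com/eval/my-model", json=payload)
response.raise_for_status()
print(response.json())  # e.g. {"prediction": ...}
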
2777Model monitoring
2778
In addition, one of the characteristic features of Machine Learning is its ability to adapt to new data. Contrary to traditional, hardcoded business rules, a Machine Learning model can adapt to the underlying patterns. To support this, the Prescience platform enables users to easily update sources, refresh datasets, and retrain models. These lifecycle steps help maintain the model’s relevance to the problem that needs solving. Users can then match their retraining frequency to the rate at which newly-qualified data is generated. They can even interrupt the training process in the event of an anomaly in the data generation pipeline. Each time a model is retrained, a new set of scores is computed and stored in OVH Observability for monitoring.
2780
2781As we outlined at the beginning of this blog post, having an accurate model does not give any guarantees about its ability to maintain this accuracy over time. For numerous reasons, model performance can weaken. For example, the raw data can decrease in quality, some anomalies can appear in the data engineering pipelines, or the problem itself can drift, rendering the current model irrelevant, even after retraining. It is therefore essential to continuously monitor model performance throughout the entire lifecycle, to avoid making decisions based on an obsolete model.
2782The move towards an AI-driven company
2783
2784Prescience is currently used at OVH to solve several industrial problems, such as fraud prevention and predictive maintenance in datacentres.
2785
2786With this platform, we plan on empowering more and more teams and services at OVH with the ability to optimise their processes through Machine Learning. We are particularly excited about our work with Time Series, which has a decisive role in the operation and monitoring of hundreds of thousands of servers and virtual machines.
2787
2788The development of Prescience is conducted by the Machine Learning Services team. MLS is composed of four Machine Learning engineers: Mael, Adrien, Raphael, and myself. The team is supervised by Guillaume, who helped me design the platform. In addition, the team includes two data scientists, Olivier and Clement, who handled internal use cases and provided us with feedback, and finally, Laurent: a CIFRE student working on multi-objective optimisation in Prescience, in collaboration with the ORKAD Research Team.
2789<|endoftext|>
2790<|startoftext|>
2791
2792Web hosting — why we decided to migrate three million websites
2793By Vincent Cassé / 2019-04-19 / Infrastructure, Migration, Web Hosting
2794
2795Have you migrated a website before? If you have, or will need to migrate websites regularly, you’ll be familiar with the difficulties associated with this kind of operation.
2796
To put it in the most basic terms, this operation usually involves six steps:
2798
2799 purchasing and configuring the destination infrastructure
2800 testing the new infrastructure by importing data and website code
2801 shutting down the website, or putting it in read-only mode to stop the data from being saved on the old infrastructure during the migration process
2802 deploying code on the new infrastructure, and importing data into the new databases
2803 modifying the source code to adapt it to the new infrastructure (database identifiers, connection to external APIs, etc.)
2804 once the website is working on the new infrastructure, redirecting traffic to make it available again (modifying DNS zones, updating the load balancer’s back-end, etc.)
2805
2806Depending on the website’s source code and how complex the infrastructure is, these steps may vary in complexity. Really, this depends on your website’s write access to data. Should data be stored in a database? Does your website have a cache based on local files? All of this is defined in the source code for your webpages.
2807Web hosting: migrating 3 million websites
2808
2809Just a year ago, in March 2018, OVH launched a major project: the migration of all web hosting and email clients hosted in its legacy Paris datacentre to a new datacentre. To organise the project, the migration process has been split into two parts, managed by two separate teams: web hosting for the first team, and email for the second. Today, we’ll be focusing on web hosting migration, but the email migration team will also discuss their process on our technical blog.
2810
2811For web hosting, this project involves migrating three million different websites, hosted on 7,000 servers in Paris. Some of these websites have been running since 1999! It is one of OVH’s longest-running activities. So why migrate them when they’ve been working fine in Paris for nearly 20 years? To understand all of the challenges we face, we’ll need to delve into the history of this service.
2812A brief history of our web hosting platform
2813
2814When Octave founded OVH in 1999, internet access was still limited in availability. The company’s first activity was website hosting. What seems simple now was not as simple at the time. You had to have good network connections, keep web servers running, and configure them properly. It was hard to find people with the technical knowledge or resources to do this.
2815P19 construction and expansion
2816
2817In the early 2000’s, OVH had the opportunity to acquire a building in the 19th arrondissement of Paris. The P19 building had good access to electricity and internet networks, so it could provide web and email hosting services to a high number of customers. For a while, it was OVH’s only datacentre.
2818
2819In P19, OVH didn’t just offer web hosting. The datacentre also hosted dedicated servers. Both activities quickly gained popularity, and in the late 2000’s, OVH began building many new datacentres in Roubaix, then Strasbourg, Beauharnois (Canada), Gravelines, and further afield.
2820
2821Every time we built a new datacentre, we gained more experience, which helped us improve logistics and maintenance. These new datacentres were much larger than our site in Paris, and gave us the space we needed to accelerate the development of many innovative technologies and solutions, like water-cooling, our own production line for servers and racks, and a cooling system for server rooms that didn’t involve air-conditioning.
2822How the web has developed between 1999 and now
2823
2824The internet has changed dramatically since 1999. From our point of view as a hosting provider, we have observed three developments over time…
2825
2826 1999 -> 2005: The birth of the web. Static websites were being set up in HTML. This was when blogs started to emerge. But this technology was only available to people who knew how to use HTML and FTP clients, even though FrontPage helped a lot of people get started.
2827 To work, these websites included data directly in the code. Web hosting was quite simple: the user needed a storage space and a web server, the sole purpose of which was to send the webpage, which it would search for in the storage space.
2828 2006 -> 2013: Web 2.0 — the social network and database revolution. Websites became dynamic, and could display custom pages, depending on the user. This was when the discussion forums, blog platforms, and social networks that are still so popular today first began to emerge.
2829 Dynamic websites were a revolution for web hosting providers; code and data were now stored in two separate locations. This meant that the page would need to be generated before it was sent to the end-user. The role of the web server changed, and would generate these pages on request, mainly with PHP language. Database servers needed to be added for these websites, as well as computing power for the web servers.
2830 2014 -> today: JavaScript has increased in power, helping developers build complex web applications in browsers, and significantly improving web users’ experience. This change has been made possible by the deployment of the internet on our smartphones. A large number of services that require web access could be launched as a result.
 Technically, this means that usage patterns are changing, and users visit websites more often, thereby increasing the volume of data created and the complexity of processing it. The use of disk space and the resources required to generate web pages are continuously increasing.
2832
2833We have very quickly adapted our solutions to respond to these changes. We have offered new databases, increased our storage space, provided CDN services, and much more.
2834
2835But the rapid growth in the number of users and resources for our web hosting plans filled up our datacentre. Due to both the natural growth of the service and the growing needs of the websites we host, in 2015 we noticed that our Paris datacentre would be full by early 2017.
2836Web hosting deployment in Gravelines
2837
Once we noted this, there was only one solution: to avoid a shortage of web hosting plans, we needed to host our websites in another datacentre. In Paris, we had industrialised our services to deliver hosting 24/7, manage 7,000 servers, and keep them operational, based on OVH’s earliest technologies.
2839
2840We could have chosen to maintain this industrialisation and apply it to Gravelines, but we decided to do something else. We decided to build a new technical architecture that would support growing needs in terms of performance, and above all, allow us to re-use other OVH products. It’s the famous “eat your own dog food” approach, applied and expanded to the scale of our web hosting plans.
2841
2842We challenged our very own teams to manage the dedicated servers, vRack (private network), IP addresses, and load balancers (IPLB), so that they would be able to maintain our customers’ infrastructures and traffic. By becoming one of our own biggest customers, we were able to identify and overcome a lot of limitations — improving the response speed of our APIs, optimising our databases, and much more.
2843
To minimise latency and meet geographic distribution requirements, we offer our customers a wide range of datacentres around the world. All of these datacentres were potential targets for the growth of our platform. For logistical reasons, we chose to launch a single new datacentre in Europe. And this doesn’t have an impact on our websites: the differences in latency between our datacentres are so small that they are barely noticeable for the hosted websites (the increase is just a few milliseconds, while it takes a few hundred milliseconds to generate a webpage).
2845
To choose our new datacentre, we analysed our natural growth to work out our infrastructure requirements. In fact, our infrastructure grows every week with new hardware deliveries, and we were at risk of filling up our datacentres so quickly that it would prevent our customers from renting dedicated servers and other OVH services. According to these criteria, only two datacentres met our needs in terms of infrastructure in 2016: Gravelines in Northern France, and Beauharnois in Canada. Since our platform is currently only deployed in Europe, we started working on Gravelines.
2847
2848At the same time, we reviewed and optimised the hardware used to build our clusters, so that we could deliver higher performance. The innovations introduced in Gravelines have helped us further improve our platform’s availability.
2849
The biggest challenge was not changing the service experience: we kept all the features, and all of the same graphical interfaces and APIs. Our goal was simply to renew the infrastructure, not the commercial products themselves.
2851
2852This datacentre for web hosting was opened in July 2016. And since November that same year, all of our new hosting plans have been delivered there.
2853
2854Every year, customers cancel their web hosting services because they no longer use them, or they’re migrating to other products, such as VPS solutions. As a result of this, the number of websites hosted in Paris has decreased gradually over the past three years. This helped us handle the increase in power required for the remaining websites, without increasing the capacity of our infrastructure in Paris.
2855
Given the natural decline in the number of websites hosted at the datacentre, we decided it would be better to wait for most of our customers to cancel their services before we migrated them. Why do this when there are three million websites left?
2857Why did we choose to migrate our datacentre?
2858To give Paris a new lease of life
2859
2860There are several reasons why we’re starting this monumental undertaking. But the main reason is managing obsolescence.
2861
2862Our infrastructure is based on physical resources housed in this datacentre: dedicated and virtualised servers (which are based on physical machines), network elements, and a cooling circuit (water-cooling and air conditioning). And in order for the infrastructure to remain available 24/7, we need to renew this hardware periodically.
2863
For dedicated servers, it’s quite simple. If a server becomes faulty, it can simply be replaced with a new one. However, the servers built for Paris don’t have the same technical and logistical improvements that our other datacentres benefit from, and they are becoming increasingly difficult to assemble, as our datacentre’s obsolescence requires us to renew hardware more and more often.
2865
2866We have considered replacing these servers with next-generation models, but we would need to modify the architecture of entire rooms. And in order to achieve this without any impact on our platform’s availability, we would need space to build new rooms before migrating the servers one by one. In a building that is almost at full capacity, this would require emptying rooms.
2867
2868Dedicated servers also need power, a network, and a cooling system to work. All of these elements are also managed by physical hardware: air conditioning and water-cooling to cool down the machines, routers and switches for the network, electrical transformers, UPS devices, and batteries for the electricity.
2869
2870These redundant physical infrastructures must also be replaced on a regular basis to avoid any downtime. If one access path fails, the second will take over. Moreover, this is a common operation performed by technicians, so that they can carry out minor maintenance tasks on hardware components.
2871
2872The process of fully replacing these infrastructures with new ones is long and complex. Relying on a single access path for such a long time period was just not an option. Replacing them would have required setting up a third path, and then switching over to it when everything was ready. However, this would also mean that there would need to be space in the datacentre for all these operations.
2873
2874After twenty years, our Paris datacentre has reached the end of a lifecycle. Large-scale work is required at all levels, and this requires space. This is the main reason behind the migration.
2875To increase website performance
2876
2877With the new infrastructure in Gravelines, we are able to provide increased performance for our customers’ websites. Moreover, these new technologies have helped us deploy some additional features that we can’t deploy in Paris without renewing our infrastructures: HTTP/2, MySQL 5.6, and more.
2878
2879Our customers can migrate their projects themselves, but web hosting plan migration is a tricky, delicate procedure. Many customers gave up on it.
2880
2881Once the migration is complete, we will also be able to simplify our operational maintenance, using OVH standards exclusively. This will help us avoid carrying out specific operations in the Paris datacentre, reducing maintenance time and the risks of certain recurring manual operations.
2882How are we migrating so many websites?
2883
2884As a web hosting provider, we mainly specialise in two areas — hosting data, and executing code.
2885
Hosting data while maintaining its integrity over time is a complex operation, but it’s a relatively standardised activity. Data is stored on a file system (this is a standard), or in a database that uses a specific query language (MySQL 5.5 or MySQL 5.6). So we simply need to reproduce an architecture that meets the same standards on the destination infrastructure, and migrate the data to it.
2887
2888Code execution is more complex. It’s very difficult to infer a source code’s behaviour in its environment without at least interpreting it. It can return an error on a specific version of a PHP extension, check for a local folder, and much more. For example, a lot of our customers store the absolute path to their files in the code. However, this means that we cannot change the storage location of the files without affecting their service.
2889
When you host services for a few customers, you can easily help them by detecting code that will no longer work after migration. But on a larger scale, this is difficult. Imagine doing this with the code of more than three million websites!
2891
2892Asking our customers to change their source code was not a viable solution. Even assuming that all of our customers would read an email about this, some of them would not make the changes due to a lack of time, technical knowledge, or simply forgetting to do so. We would be causing problems for our customers with their websites, and for us, this was just not an option.
2893
2894For nearly a year, we developed several migration scenarios. We extrapolated them in all directions, with two objectives:
2895
2896 the websites must remain functional after the migration is complete, without customers needing to make any changes to their code
2897 the impact on service availability during the migration process must be minimal
2898
We implemented and tested some of these scenarios between March and June 2018. This initial work helped us choose the best migration plan. To complete this migration without affecting the services, we needed to adapt part of our infrastructure and information system: creating an inter-datacentre network tunnel, changing the load balancers and our database platform, adding SQL proxy servers, and more.
2900
2901We thoroughly tested the chosen scenario. To ensure that this procedure would be carried out with minimal impact, we repeated it many times under real conditions on internal clusters, using data sets based on production platforms.
2902
2903Want to know more details about our migration plan? We can’t include everything in this blog post, so we’ve decided to talk about our migration in a series of posts, since the topic is so vast.
2904
2905Stay tuned for our upcoming posts, which will give you a behind-the-scenes view of the largest-scale website migration carried out by Europe’s biggest hosting provider!
2906<|endoftext|>
2907<|startoftext|>
2908
2909Deploying game servers with Agones on OVH Managed Kubernetes
2910By Horacio Gonzalez / 2019-04-12 / Agones, Kubernetes, OVH Managed Kubernetes, OVH Platform
2911
One of the key advantages of using Kubernetes is the formidable ecosystem around it. From Rancher to Istio, from Rook to Fission, from gVisor to KubeDB, the Kubernetes ecosystem is rich, vibrant and ever-growing. We are getting to the point where, for most deployment needs, we can say there is a K8s-based open-source project for that.
2913
One of the latest additions to this ecosystem is the Agones project, an open-source, multiplayer, dedicated game-server hosting platform built on Kubernetes, developed by Google in collaboration with Ubisoft. The project was announced in March, and has already made quite a bit of noise…
2915
In the OVH Platform Team, we are fans of both online gaming and Kubernetes, so we told ourselves that we needed to test Agones. And what better way to test it than deploying it on our OVH Managed Kubernetes service, installing a Xonotic game server cluster, and playing some old-school deathmatches with colleagues?
2917Agones on OVH Managed Kubernetes
2918
2919And of course, we needed to write about it to share the experience…
2920Why Agones?
2921
2922Agones (derived from the Greek word agōn, contests held during public festivals or more generally “contest” or “competition at games”) aims to replace the usual proprietary solutions to deploy, scale and manage game servers.
2923
Agones enriches Kubernetes with a Custom Controller and a Custom Resource Definition. With these, you can use standard Kubernetes tooling and APIs to create, scale and manage game server clusters.
2925Wait, what game servers are you talking about?
2926
2927Well, Agones’s main focus is online multiplayer games such as FPSs and MOBAs, fast-paced games requiring dedicated, low-latency game servers that synchronize the state of the game between players and serve as a source of truth for gaming situations.
2928
2929These kinds of games ask for relatively ephemeral dedicated gaming servers, with every match running on a server instance. The servers need to be stateful (they must keep the game status), with the state usually held in memory for the duration of the match.
2930
Latency is a key concern, as the competitive, real-time aspects of these games demand quick responses from the server. That means the connection from the player’s device to the game server should be as direct as possible, ideally bypassing any intermediate server, such as a load balancer.
2932And how do you connect the players to the right server?
2933
Every game publisher used to have their own proprietary solution, but most of them follow a similar flow: a matchmaking service groups players into a match, deals with a cluster manager to provision a dedicated game server instance, and sends its IP address and port to the players, so they can connect directly to the server and play the game.
Online gaming matchmaking and game server assignment
2936
Agones and its Custom Controller and Custom Resource Definition replace this complex cluster management infrastructure with standardised, Kubernetes-based tooling and APIs. The matchmaker services interact with these APIs to spawn new game server pods and pass their IP addresses and ports to the relevant players.
Online gaming matchmaking and game server assignment with Agones
2939The cherry on the cake
2940
Using Kubernetes for these tasks also brings some nice additional bonuses, like being able to deploy the full gaming infrastructure in a developer environment (or even in minikube), or easily clone it to deploy in a new data centre or cloud region. It also offers a whole platform to host all the additional services needed to build a game: account management, leaderboards, inventory…
2942
And of course, there's the simplicity of operating Kubernetes-based platforms, especially when they are dynamic, heterogeneous and distributed, as most online gaming platforms are.
2944Deploying Agones on OVH Managed Kubernetes
2945
2946There are several ways to install Agones in a Kubernetes cluster. For our test we chose the easiest one: installing with Helm.
2947Enabling creation of RBAC resources
2948
The first step in installing Agones is to set up a service account with enough permissions to create some special RBAC resource types.
2950
2951kubectl create clusterrolebinding cluster-admin-binding \
2952 --clusterrole=cluster-admin --serviceaccount=kube-system:default
2953
2954Now we have the Cluster Role Binding needed for the installation.
2955Installing the Agones chart
2956
Now let's continue by adding the Agones repository to Helm's repository list.
2958
2959helm repo add agones https://agones.dev/chart/stable
2960
2961And then installing the stable Agones chart:
2962
2963helm install --name my-agones --namespace agones-system agones/agones
2964
2965The installation we have just done isn’t suited for production, as the official install instructions recommend running Agones and the game servers in separate, dedicated pools of nodes. But for the needs of our test, the basic setup is enough.
2966Confirming Agones started successfully
2967
2968To verify that Agones is running on our Kubernetes cluster, we can look at the pods in the agones-system namespace:
2969
2970kubectl get --namespace agones-system pods
2971
2972If everything is ok, you should see an agones-controller pod with a Running status:
2973
2974$ kubectl get --namespace agones-system pods
2975NAME READY STATUS RESTARTS AGE
2976agones-controller-5f766fc567-xf4vv 1/1 Running 0 5d15h
2977agones-ping-889c5954d-6kfj4 1/1 Running 0 5d15h
2978agones-ping-889c5954d-mtp4g 1/1 Running 0 5d15h
2979
2980You can also see more details using:
2981
2982kubectl describe --namespace agones-system pods
2983
2984Looking at the agones-controller description, you should see something like:
2985
2986$ kubectl describe --namespace agones-system pods
2987Name: agones-controller-5f766fc567-xf4vv
2988Namespace: agones-system
2989[...]
2990Conditions:
2991 Type Status
2992 Initialized True
2993 Ready True
2994 ContainersReady True
2995 PodScheduled True
2996
2997Where all the Conditions should have status True.
2998Deploying a game server
2999
3000The Agones Hello world is rather boring, a simple UDP echo server, so we decided to skip it and go directly to something more interesting: a Xonotic game server.
3001
3002Xonotic is an open-source multi-player FPS, and a rather good one, with lots of interesting game modes, maps, weapons and customization options.
3003
3004Deploying a Xonotic game server over Agones is rather easy:
3005
3006kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/release-0.9.0/examples/xonotic/gameserver.yaml
3007
3008The game server deployment can take some moments, so we need to wait until its status is Ready before using it. We can fetch the status with:
3009
3010kubectl get gameserver
3011
3012We wait until the fetch gives a Ready status on our game server:
3013
3014$ kubectl get gameserver
3015NAME STATE ADDRESS PORT NODE AGE
3016xonotic Ready 51.83.xxx.yyy 7094 node-zzz 5d
3017
3018When the game server is ready, we also get the address and the port we should use to connect to our deathmatch game (in my example, 51.83.xxx.yyy:7094).
3019It’s frag time
3020
3021So now that we have a server, let’s test it!
3022
We downloaded the Xonotic client for our computers (it runs on Windows, Linux and macOS, so there is no excuse), and launched it:
3024xonotic
3025
3026Then we go to the Multiplayer menu and enter the address and port of our game server:
3027
3028And we are ready to play!
3029And on the server side?
3030
On the server side, we can spy on how things are going for our game server, using kubectl logs. Let's begin by finding the pod running the game:
3032
3033kubectl get pods
3034
3035We see that our game server is running in a pod called xonotic:
3036
3037$ kubectl get pods
3038NAME READY STATUS RESTARTS AGE
3039xonotic 2/2 Running 0 5d15h
3040
We can then use kubectl logs on it. The pod contains two containers (the main xonotic one and an Agones sidecar), so we must specify that we want the logs of the xonotic container:
3042
3043$ kubectl logs xonotic
3044Error from server (BadRequest): a container name must be specified for pod xonotic, choose one of: [xonotic agones-gameserver-sidecar]
3045$ kubectl logs xonotic xonotic
3046>>> Connecting to Agones with the SDK
3047>>> Starting health checking
3048>>> Starting wrapper for Xonotic!
3049>>> Path to Xonotic server script: /home/xonotic/Xonotic/server_linux.sh
3050Game is Xonotic using base gamedir data
3051gamename for server filtering: Xonotic
3052Xonotic Linux 22:03:50 Mar 31 2017 - release
3053Current nice level is below the soft limit - cannot use niceness
3054Skeletal animation uses SSE code path
3055execing quake.rc
3056[...]
3057Authenticated connection to 109.190.xxx.yyy:42475 has been established: client is v6xt9/GlzxBH+xViJCiSf4E/SCn3Kx47aY3EJ+HOmZo=@Xon//Ks, I am /EpGZ8F@~Xon//Ks
3058LostInBrittany is connecting...
3059url_fclose: failure in crypto_uri_postbuf
3060Receiving player stats failed: -1
3061LostInBrittany connected
3062LostInBrittany connected
3063LostInBrittany is now spectating
3064[BOT]Eureka connected
3065[BOT]Hellfire connected
3066[BOT]Lion connected
3067[BOT]Scorcher connected
3068unconnected changed name to [BOT]Eureka
3069unconnected changed name to [BOT]Hellfire
3070unconnected changed name to [BOT]Lion
3071unconnected changed name to [BOT]Scorcher
3072[BOT]Scorcher picked up Strength
3073[BOT]Scorcher drew first blood!
3074[BOT]Hellfire was gunned down by [BOT]Scorcher's Shotgun
3075[BOT]Scorcher slapped [BOT]Lion around a bit with a large Shotgun
3076[BOT]Scorcher was gunned down by [BOT]Eureka's Shotgun, ending their 2 frag spree
3077[BOT]Scorcher slapped [BOT]Lion around a bit with a large Shotgun
3078[BOT]Scorcher was shot to death by [BOT]Eureka's Blaster
3079[BOT]Hellfire slapped [BOT]Eureka around a bit with a large Shotgun, ending their 2 frag spree
3080[BOT]Eureka slapped [BOT]Scorcher around a bit with a large Shotgun
3081[BOT]Eureka was gunned down by [BOT]Hellfire's Shotgun
3082[BOT]Hellfire was shot to death by [BOT]Lion's Blaster, ending their 2 frag spree
3083[BOT]Scorcher was cooked by [BOT]Lion
3084[BOT]Eureka turned into hot slag
3085[...]
3086
3087Add some friends…
3088
The next step is the most enjoyable one: asking colleagues to connect to the server and playing a proper deathmatch, just like in the Quake 2 days.
3090And now?
3091
We have a working game server, but we have barely scratched the surface of Agones' possibilities: deploying a fleet (a set of warm GameServers that are available to be allocated from), testing the FleetAutoscaler (to automatically scale a Fleet up and down in response to demand), or building a simple allocator service. In future blog posts, we will dive deeper into these topics and explore those possibilities.
3093
And in a wider context, we are going to continue our exploratory journey with Agones. The project is still very young (an early alpha), but it already shows some impressive potential.
3095<|endoftext|>
3096<|startoftext|>
3097
3098The Role of Humans in Digital Businesses
3099By Octave Klaba / 2019-04-10 / OVH Thinking
3100
3101In 2017, many factors contributed to the significant growth of OVH: hiring in Europe; the acquisition of a company in the USA; the construction of 14 new data centers; and the start of activities in APAC. In fewer than 18 months, we doubled the number of employees from twelve hundred to twenty-five hundred. I don’t recommend scaling this fast!
3102
OVH is a digital company in that we use software to deliver the service ordered by the customer. Like other companies, we serve our customers in many ways, but the actions of our employees do not have the same why. Toward the end of 2017, I realized that many who joined us did not understand what the unique qualities of a digital business are. In other words, they didn't get the why. To train our teams quickly, we had to create simple frameworks that incorporate principles, tools, and often common sense to educate everyone about the fundamentals of a digital company.
3104BFLNT
3105
3106One of the tools I developed has the acronym “BFLNT”. The name is not very sexy but it gets the job done. I proposed using this tool to examine the entire OVH organization to answer some simple questions that must be asked when recruiting in a digital company. Some of those questions are:
3107
3108 When recruiting new teams, how do we ensure that we are still building a digital company?
3109 How do we know that we are focusing our efforts in the right place?
3110 How do we describe the mission of each employee?
3111 What is involved in the transformation of a business into a digital enterprise?
3112 What is the impact of this transformation on the business model?
3113
3114This is not the only framework for defining the structure of a digital enterprise. If you know of other similar tools, please do not hesitate to share them on Twitter @olesovhcom
3115B: BAU = Business as Usual
3116
3117In BAU, we consolidate all the projects, tasks, and daily actions needed to execute the processes necessary to deliver the service sold to the customer. These business processes remain constant over time.
3118
3119The Digital Revolution consists of digital companies conducting BAU through the use of machines powered by software. BAU processes are identified in the context of an ideal company, with no glitches or bugs. Software, robots, drones, and artificial intelligence are used to carry out the processes, based on data. In this type of futuristic and ideal digital enterprise, no one is employed to deliver the specific services ordered by the customer. BAU processes have been so carefully thought out and automated that there is no need to have employees do this work.
3120
3121Let’s take a real-life BAU example: initiating a bank transfer. Before the Digital Revolution, the customer spoke to a teller and signed documents to initiate the transfer. Then the bank executed a series of processes, using off-shoring for many of them.
3122BAU in banking...
3123
3124
3125
Today, with one click on his or her smartphone, the customer can initiate a bank transfer using the app or software created by the digital bank. This is classic BAU, with no human intervention. Similar transactions are possible with hotels, insurance companies, and movie theaters, as well as when purchasing music.
3127
By reducing the costs of BAU, improving productivity, and reducing costs for customers, software is propelling digital companies and driving the Digital Revolution. Customers also improve productivity by initiating business themselves through mobile apps and websites. Behind the scenes, companies use robots and drones, and will soon use autonomous container ships and cars, to execute client orders.
3129
3130 A comparison of standard vs. digital company business models shows that in a standard business model, the cost of BAU is proportional to income. In a digital enterprise, BAU is a fixed cost. Digital companies continue to invest in and to use software, robots, drones, and anything that can be automated in order to further reduce costs. Thus, a digital company recruits employees not to carry out BAU processes, but to further automate them.
3131
3132When I started OVH, I immediately wanted to use software everywhere. Why? First, I was born with a computer in my hands and wanted to make a box where I could code crazy stuff. In addition, I started OVH with nothing. In software, I saw a solution to do much with little and to lower costs for clients. From the beginning, we coded all the business processes. Today a customer can still go to the site, order a service, pay, and the software will configure the infrastructure and deliver the service. There is no human intervention. Historically, our prices have been cheaper because of this software attitude in our DNA.
3133
However, the technology to automate everything doesn't exist yet, so outside of our core business, the rest of BAU is not (yet) fully digitalized, though software, robots, and automation do support each of the following areas of our work. Support services answer our customers' questions about the use of products and services. The sales reps listen to the customer to understand client needs in order to select the right product. Factory teams assemble servers and ship them to data centers. We haven't digitalized the building and expansion of our data centers. The finance, human resources, and legal teams are needed to support the activity of OVH. So even in digital companies like OVH, BAU with no human intervention is not (yet) a reality. We know that market pressure to offer even cheaper, more scalable products will always keep us questioning ourselves. OVH is committed to further industrialize, automate, develop, and innovate in order to reduce the cost of BAU, and thus reduce prices for our customers.
3135
The mindset of the BAU teams is quite schizophrenic. On one hand, teams must be involved in the execution and performance of daily processes. On the other hand, these teams must deliver quality work that identifies improvements to the tools that will simplify their work, reduce their workload, help them work smarter, or even eliminate their jobs. Crucial to this process is a work environment in which perseverance in the present is balanced with an eye to the future. We are constantly questioning and improving.
3137
3138BAU is based on the ideal company, but we know that a perfect world doesn’t exist. For this reason, I’ve created the other 4 features of BFLNT:
3139
3140 F: FIX
3141 L: LEAN
3142 N: NEW
3143 T: TRANSFORMATION
3144
3145F: FIX
3146
3147 Because it’s impossible to deliver BAU glitch-free, FIX projects, processes, and teams are in place to solve unforeseen problems.
3148
At OVH, our RUN teams work on two missions, 24/7:
3150
 Monitor the customer-ordered, BAU-delivered services, along with the software FIXes that automatically correct bugs in the BAU software. RUN teams also intervene when the FIX software doesn't fix the bug.
3152 Respond to customers reporting incidents missed by monitoring. Even the best monitoring is not able to imagine all potential problems.
3153
 We continually improve our monitoring, software FIXes, and BAU software. The post-mortem process is in place to eliminate recurring problems. Failing to perform this process correctly may lead to additional hiring for FIX, and so increase the cost, or reduce the quality of the service.
3155
3156Because bugs will always exist, it’s a mistake to look for savings on FIX and try to reduce its cost to zero, which is impossible. In the ever-changing digital world, clients may accept the occasional glitch, but non-functioning software is unacceptable.
3157
3158
3159
American companies probably understand the importance of FIX better than European companies do; their substantial investment in FIX comes from having FIX in their DNA. Because of the huge cost of FIX, American digital companies seek volume and expansion into the world market as one way of lessening the impact of FIX spending on the business model.
3161
3162With the mottos never let go and never give up, the FIX teams tenaciously persevere. They spend every day focusing on identifying and fixing problems. Not everyone can handle this kind of work. The character traits we seek for those on our FIX teams are positivity, joy, perseverance, courage, and a strong service orientation.
L: LEAN
3164
LEAN in the digital context involves improving the efficiency of the BAU software's operation, in the same way that work on assembly lines is streamlined in manufacturing. LEAN has three priorities. First, LEAN can take the form of refactoring code, optimizing databases, and changing programming languages, so that the software executes faster with fewer computing resources. Second, LEAN improves the code to fix the bugs discovered by FIX. Finally, LEAN teams develop the ideas that come from BAU.
3166
LEAN team members are meticulous perfectionists who work on those rare problems that are nonetheless certain to happen eventually. These teams also consider general improvements to services, as well as specific challenges, such as how best to deliver services to customers in 10 seconds instead of 20.
3168
The rationale for making such small, incremental improvements in speed is not defensive, done only because our competitors will do it. LEAN teams passionately and constantly push against their own limits, in what Simon Sinek describes as The Infinite Game. It is a sport that has no limit.
3170N: NEW
3171
3172Risk-taking is in the DNA of each company. NEW for digital companies typically means developing new code to offer customers new features, new products, new services, all in the service of meeting new customer needs.
3173
3174But NEW is not only about writing code. NEW must take into account customer needs and the business model, which includes price, cost, volume, and the size of the market. Once we’ve considered those factors, then we’ll focus on the software. NEW teams require marketing, technical, and financial know-how in developing new products.
3175
3176NEW teams collaborate with customers in rapid cycles of iterative processes. They test ideas with customers, let go of the ineffective ones and run with the ones that work. Developing customer relationships based on trust and collaboration is crucial to this process.
3177
Compromise and disruption are also part of the NEW team portfolio. The client always wants 1+1 to equal 3, and we need to deliver, which requires thinking outside the box. An example of OVH's success in this area involves the data center cooling we developed in 2003. From the outset, we were convinced that we'd find a way to cool the data centers with water, similar to the way liquid cooling was developed to cool car engines. We simply told ourselves it was possible and then made it happen. This example illustrates that digital companies don't often create truly new inventions. What we're really doing is adapting existing technology to new contexts. This is the true meaning of the word hack.
3179
The arrival of AI is disrupting the software industry in many ways. NEW and LEAN teams now code AI software which "automates" the developers themselves. Inception. This is a great topic for a future blog post.
3181T: TRANSFORMATION
3182
3183Business TRANSFORMATION is a well-known process. Sometimes TRANSFORMATION is necessary because the company isn’t working effectively and so has to be restructured. Sometimes TRANSFORMATION is needed because the company is highly successful and growing, so it needs to transform processes and the organization in order to work effectively on a larger scale.
3184
In digital companies, TRANSFORMATION to flat, transversal models, in which organizations use collective intelligence, is more common. These companies don't have the luxury of paying for organizational silos, ego battles, or slow execution due to a lack of information. Digital companies have survival in their DNA. Picture a living entity designed by Mother Nature, in which the organs are autonomous yet interdependent. Thought is given to including men and women from diverse backgrounds, countries, and ways of thinking, and to allowing them to work together effectively. The result of millions of years of Darwinian evolution makes these companies more organic, more humane, and more robust, which helps them to survive.
3186
3187TRANSFORMATION teams share the never give up ethos, but work in a quiet and empathetic manner. They specialize in the process of transforming the company. Transforming the culture is the first step and is achieved by listening, explaining, and providing meaning, thereby helping employees to go through the process of TRANSFORMATION. They encourage and reassure employees while maintaining focus.
3188SUMMING UP
3189
3190Digital businesses already incorporate LEAN, which provides the opportunity to transform BAU. Because these go hand-in-hand, the culture will also need to transform. Once BAU is automated, the center of gravity of the company must switch to data, which will, in turn, change the organization, its functioning, and the roles of the individuals in the company.
3191
3192Building BAU, FIX, LEAN, NEW and TRANSFORMATION teams requires understanding psychology, the deep motivations of all involved, and finding the best talent who can work together. At OVH, about 50% of the teams work on BAU, 10% on FIX, 25% on LEAN, 10% on NEW, and 5% on TRANSFORMATION. Our goal is to move to about 40% on BAU, 10% on FIX, 30% on LEAN, 15% on NEW, and 5% on TRANSFORMATION.
3193
The LEAN and NEW teams are expensive. Companies have to recruit employees who develop software, and need to transform the business, its organization, and its culture. These are investments in current production tools for a better future. If one thinks long term, these investments are indispensable. And there's a lot of good news. First, setting up LEAN and NEW teams allows a company to requalify these direct costs as investments. At the budget level, all this OPEX can be invested as CAPEX and amortized over 5 years. This shows that the business has remained profitable despite a profound internal transformation. However, this does not change the cash flow situation; employees still must be paid. Increasing capital may be necessary in order to have the funds for investing in LEAN and NEW, and thus reduce the costs of BAU. There are many players in the market to assist with this. Profitability will be increased, which is good in the long term.
3195Capex vs Opex
3196
3197
3198
Using the ecosystem model of interdependence will reduce the costs of BAU even more quickly. The competitive edge that American companies have over European companies is often thanks to speed and their ability to take risks. In risk-averse Europe, we are cautious, and prefer to go slower and stay in control. Compromise is rarely embraced in Europe; we prefer to think in a binary way, in terms of good vs. bad. In Europe, we have not yet seen that solutions come through trust and working within the ecosystem.
3200
3201I remain convinced that neither a company nor an ecosystem can succeed without trust, which cannot exist without passion. A company and an ecosystem are by definition organizations built on foundations of human concerns. Bonds of trust are created in part through an understanding of one’s own limits and the willingness to be vulnerable. However, it’s not easy to create organizations in which vulnerability is a sign of maturity and not an admission of weakness. This is not the typical mindset employees bring to work. In fact, the opposite is often true. It’s imperative to know how to create an environment that fosters trust and encourages women and men to want to work together. Trust is the basis of a company, a reliable, profitable long-term ecosystem.
3202
3203We are definitely living in passionate times!
3204<|endoftext|>
3205<|startoftext|>
3206
3207Loops: Powering Continuous Queries with Observability FaaS
3208By Rémi Collignon-Ducret / 2019-04-04 / Loops, Observability, OVH Observability, V8
3209
3210We’re all familiar with that small snippet of code that adds reasonable value to your business unit. It can materialise as a script, a program, a line of code… and it will produce a report, new metrics, KPIs, or create new composite data. This code is intended to run periodically, to meet requirements for up-to-date information.
3211
In the Observability team, we encounter these snippets as queries within the Time Series Database (TSDB): continuous queries that are responsible for automating different use cases, like deletes, rollups, or any business logic that needs to manipulate Time Series data.
3213
We already introduced TSL in a previous blog post, which demonstrated how our customers use the available OVH Metrics protocols, like Graphite, OpenTSDB, PromQL and WarpScript™. But when it comes to manipulating, or even creating, new data, you don't have a lot of options, although you can use WarpScript™ or TSL as a scripting language rather than just a query language.
3215
3216In most cases, this business logic requires building an application, which is more time-consuming than expressing the logic as a query targeting a TSDB. Building the base application code is the first step, followed by the CI/CD (or any delivery process), and setting up its monitoring. However, managing hundreds of little apps like these will add an organic cost, due to the need to maintain them along with the underlying infrastructure.
3217
We wanted to ensure these valuable tasks did not stack up on the heads of a few developers, who would then need to carry the responsibilities of data ownership and computing resources. So we wondered how we could automate things, without relying on the team to set up the compute jobs each time someone needed something.
3219
3220We wanted a solution that would focus on the business logic, without needing to run an entire app. This way, someone wanting to generate a JSON file with a daily data report (for example) would only need to express the corresponding query.
3221Running business logic over Loops
3222You shall not FaaS!
3223
3224Scheduling jobs is an old, familiar routine. Be it bash cron jobs, runners, or specialised schedulers, when it comes to wrapping a snippet of code and making it run periodically, there is a name for it: FaaS.
3225
FaaS was born with a simple goal in mind: reducing development time. We could have found an open-source implementation to evaluate (e.g. OpenFaaS), but most of these relied upon a managed container stack. Having one container per query would be very costly, plus warming up a container to execute the function and then freezing it would have been very counterproductive.
3227
This would have required more scheduling and automation than we wanted for our end goal, would have led to suboptimal performance, and would have introduced a new requirement for cluster capacity management. There is also a build time required to deploy a new function in a container, so deployments are not free either.
3229#def <Loops>
3230
3231That was when we decided to build “Loops”: an application platform where you can push the code you want to run. That’s all. The goal is to push a function (literally!) rather than a module, like all current FaaS solutions do:
3232
3233function dailyReport(event) {
3234 return Promise.resolve('Today, everything is fine !')
3235}
3236
3237You can then execute it manually, with either an HTTP call or a Cron-like scheduler.
Both aspects are necessary, since you might (for example) have a monthly report, but one day require an additional one, 15 days after the last report. Loops makes it easy to manually generate your new report, in addition to the monthly one.
3239
3240There were some necessary constraints when we began building Loops:
3241
3242 This platform must be able to easily scale, to support OVH’s production load
3243 It must be highly available
3244 It must be language-agnostic, because some of us prefer Python, and others JavaScript
3245 It must be reliable
 The scheduling part mustn't be coupled with the execution part (μService culture)
 It must be secure and isolated, so that anybody can safely push arbitrary code to the platform
3248
3249Loops implementation
3250
We chose to build our first version on V8. We chose JavaScript as the first language, because it's easy to learn, and asynchronous data flows are easily managed using Promises. Also, it fits very well with a FaaS, since JavaScript functions are highly expressive. We built it around the new NodeJS VM module, which allows you to execute code in a dedicated V8 context.
3252
A V8 context is like an object (JSON), isolated from your main execution. Inside a context, you can find native functions and objects. However, if you craft a new V8 context, you will see that some variables or functions are not natively available (setTimeout(), setInterval() or Buffer, for example). If you want to use these, you will have to inject them into your new context. The last important thing to remember is that once you have your new context, you can easily execute a JavaScript script, provided as a string, on it.
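
To illustrate the idea, here is a minimal sketch of running a script string inside a fresh V8 context with the NodeJS vm module (simplified for readability; the real workers inject more helpers and add error handling):

const vm = require('vm')

// A brand new context: only what we explicitly inject is visible to the script.
const sandbox = { console, setTimeout } // deliberately injected; not available by default
vm.createContext(sandbox)

// The Loop body, received as a plain string.
const code = `
  function dailyReport(event) {
    return Promise.resolve('Today, everything is fine !')
  }
  dailyReport({})
`

// Compile and run the script inside the isolated context.
const result = vm.runInContext(code, sandbox, { timeout: 1000 })
result.then((message) => console.log(message))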
3254
3255Contexts fulfil the most important part of our original list of requirements: isolation. Each V8 context is isolated, so it cannot talk to another context. This means a global variable defined in one context is not available in a different one. You will have to build a bridge between them if you want this to be the case.
3256
We didn't want to execute scripts with eval(), since a call to this function executes JS code in the same shared context as the code calling it. The executed script can then access the same objects, constants, variables, and so on. This security issue was a deal breaker for the new platform.
3258
Now that we know how to execute our scripts, let's implement some management for them. To be stateless, each Loops worker instance (i.e. a JavaScript engine able to run code in a VM context) must have the latest version of each Loop (a Loop being a script to execute). This means that when a user pushes a new Loop, we have to sync it to each Loops worker. This model fits well with the pub/sub paradigm, and since we already use Kafka as a pub/sub infrastructure, it was just a matter of creating a dedicated topic and consuming it from the workers. In this case, publication involves an API where a user submits their Loops, which produces a Kafka event containing the function body. As each worker has its own Kafka consumer group, they all receive the same messages.
3260
Workers subscribe to Loop updates as Kafka consumers and maintain a Loop store, which is an embedded key/value store where the key is the Loop hash and the value is the function's current revision. In the API part, Loop hashes are used as URL parameters to identify which Loop to execute. Once called, a Loop is retrieved from the store, injected into a V8 context, executed, and dropped. This hot code reload mechanism ensures that each Loop can be executed on every worker. We can also leverage our load balancers' capabilities to distribute the load across the workers. This simple distribution model avoids complex scheduling and eases the maintainability of the overall infrastructure.
3262
3263In order to be reboot-proof, we make use of Kafka’s very handy log compaction feature. Log compaction allows Kafka to keep the last version of each keyed message. When a user creates a new Loop, it will be given a unique ID, which is used as a Kafka message key. When a user updates a Loop, this new message will be forwarded to all consumers, but since the key already exists, only the last revision will be kept by Kafka. When a worker restarts, it will consume all messages to rebuild its internal KV, so the previous state will be restored. Kafka is used here as a persistent store.
3264Creating, editing and deleting loops
3265Loops runtimes
3266
Even if the underlying engine is able to run native JavaScript, as stated above, we wanted it to run more idiomatic Time Series queries, like TSL or WarpScript™. To achieve this, we created a Loops Runtime abstraction that wraps not only JavaScript, but also TSL and WarpScript™ queries, into JavaScript code. Users have to declare a Loop with its runtime, after which the wrappers do the rest. For example, executing a WarpScript™ Loop involves taking the plain WarpScript™ and sending it through a node-request HTTP call.
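
To give an idea of what such a wrapper can look like (a simplified sketch, not the actual OVH implementation: the endpoint URL is a placeholder, and the real runtime also handles authentication and error reporting), a WarpScript™ query could be wrapped like this:

const request = require('request') // the "node-request" HTTP client mentioned above

// Placeholder endpoint: point it to your WarpScript™-compatible exec endpoint.
const METRICS_ENDPOINT = 'https://warp10.example/api/v0/exec'

// Wraps a plain WarpScript™ query into a Loop-compatible JavaScript function.
function wrapWarpScript(warpscript) {
  return function (event) {
    return new Promise((resolve, reject) => {
      request.post({ url: METRICS_ENDPOINT, body: warpscript }, (err, res, body) => {
        if (err || res.statusCode !== 200) {
          return reject(err || new Error('WarpScript execution failed: ' + res.statusCode))
        }
        resolve(body)
      })
    })
  }
}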
Running a Loop
3269Loops feedback
3270
3271Executing code safely is a start, but when it comes to executing arbitrary code, it’s also useful to get some feedback on the execution state. Was it successful or not? Is there an error in the function? If a Loop is in a failure state, the user should be notified straight away.
3272
3273This leads us to one special condition: a user’s scripts must be able to tell if everything is OK or not. There are two ways to do that in the underlying JavaScript engine: callbacks and Promises.
We chose to go with Promises, which offer better asynchronous management. Every Loop returns a Promise at the end of the script. A rejected Promise will produce an HTTP 500 error status, while a resolved one will produce an HTTP 200 status.
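
As an illustration (a minimal sketch, not actual OVH code, and the fetchYesterdayMetrics() helper is purely hypothetical), a Loop that reports its state through the returned Promise could look like this:

function checkYesterdayReport(event) {
  // Resolving maps to an HTTP 200 response, rejecting to an HTTP 500.
  return fetchYesterdayMetrics() // hypothetical helper returning a Promise
    .then((metrics) => {
      if (metrics.length === 0) {
        return Promise.reject(new Error('No metrics found for yesterday'))
      }
      return `Report generated from ${metrics.length} series`
    })
}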
3275Loops scheduling
3276
3277When publishing Loops, you can declare several triggers, in a similar way to Cron. Each trigger will perform an HTTP call to your Loop, with optional parameters.
3278
Based on these semantics, to generate multiple reports, we can register a single function that will be scheduled with different contexts, defined by various parameters (region, rate, etc.). See the example below:
3280
3281functions:
3282 warp_apps_by_cells:
3283 handler: apps-by-cells.mc2
3284 runtime: ws
3285 timeout: 30
3286 environment:
3287 events:
3288 - agra:
3289 rate: R/2018-01-01T00:00:00Z/PT5M/ET1M
3290 params:
3291 cell: ovh-a-gra
3292 - abhs:
3293 rate: R/2018-01-01T00:00:00Z/PT5M/ET1M
3294 params:
3295 cell: ovh-a-bhs
3296
3297The scheduling is based on Metronome, which is an open-source event scheduler, with a specific focus on scheduling rather than execution. It’s a perfect fit for Loops, since Loops handle the execution, while relying on Metronome to drive execution calls.
3298Loops pipelines
3299
A Loops project can have several Loops. A common use case among our customers was to use Loops as a data platform, in a data flow fashion. Data flow is a way to describe a pipeline of execution steps. In a Loops context, there is a global `Loop` object, which allows a script to execute another Loop by name. You can then chain Loop executions, which will act as step functions.
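
As a sketch of the idea (the exact API of the global Loop object isn't documented in this post, so the Loop.run() method name and the collectRawData() helper below are purely hypothetical):

function ingestAndAggregate(event) {
  // Step 1: this Loop does its own work...
  return collectRawData(event) // hypothetical helper returning a Promise
    // Step 2: ...then triggers the next Loop in the pipeline by name.
    .then((raw) => Loop.run('aggregate-daily', { payload: raw })) // hypothetical method name
    .then(() => 'Pipeline step completed')
}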
3301Pain points: scaling a NodeJS application
3302
Loops workers are NodeJS applications. Most NodeJS developers know that NodeJS uses a mono-threaded event loop: if you don't take care of the threading model of your NodeJS application, you will likely suffer from a lack of performance, since only one host thread will be used.
3304
NodeJS also has a cluster module available, which allows an app to make use of several processes, and therefore several CPU threads. That's why, in a Loops worker, we start N-1 workers to handle API calls, where N is the total number of CPU threads available, leaving one dedicated to the master.
3306
The master process is in charge of consuming Kafka topics and maintaining the Loop store, while each worker starts an API server. For every requested Loop execution, the worker asks the master for the script content, and executes it in a dedicated thread.
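
Here is a minimal sketch of that pattern with the NodeJS cluster module (heavily simplified: the Kafka consumption, the Loop store and the master/worker messaging are reduced to placeholders):

const cluster = require('cluster')
const os = require('os')
const http = require('http')

if (cluster.isMaster) {
  // Master: would consume Kafka and keep the Loop store up to date.
  const nbWorkers = os.cpus().length - 1
  for (let i = 0; i < nbWorkers; i++) {
    cluster.fork()
  }
} else {
  // Worker: exposes the API and executes Loops on demand.
  http.createServer((req, res) => {
    // In the real worker, the Loop body is fetched from the master,
    // then executed in an isolated V8 context (see above).
    res.end('Loop executed\n')
  }).listen(3000)
}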
3308
With this setup, one NodeJS application with one Kafka consumer is started per server, which makes it very easy to scale out the infrastructure, just by adding additional servers or cloud instances.
3310Conclusion
3311
3312In this post, we previewed Loops, a scalable, metrics-oriented FaaS with native JavaScript support, and extended WarpScript™ and TSL support.
3313
3314We still have a few things to enhance, like ES5-style dependency imports and metrics previews for our customers’ Loops projects. We also plan to add more runtimes, especially WASM, which would allow many other languages that can target it, like Go, Rust or Python, to suit most developer preferences.
3315
3316The Loops platform was part of a requirement to build higher-level features around OVH Observability products. It’s a first step towards offering more automated services, like metrics rollups, aggregation pipelines, or logs-to-metrics extractors.
3317
This tool was built as part of the Observability product suite with a higher abstraction level in mind, but you might also want direct access to the API, in order to implement your own automated logic for your metrics. Would you be interested in such a feature? Visit our Gitter channel to discuss it with us!
3319<|endoftext|>
3320<|startoftext|>
3321
3322Dedicated Servers: twice the bandwidth for the same price
3323By Yaniv Fdida / 2019-03-27 / Automation, Datacenters & network, Dedicated Servers, Evolution
3324
3325We announced it at the OVH Summit 2018… We were going to double the public bandwidth on OVH dedicated servers, without changing the price.
3326
3327A promise is a promise, so several weeks ago we fulfilled it: your servers now have twice the bandwidth, for the same price!
3328
2019-03-27 - Dedicated servers: twice the bandwidth for the same price
3330
3331We knew from the start that this upgrade would be feasible, as our 20Tbps network core can definitely cope with the extra load! We work daily to make sure you enjoy using our network, which is one of the largest in the world among hosting providers.
3332
3333Indeed, our network is constantly evolving, and our teams work tirelessly to optimise the capacity planning and anticipate the load generated by all our customers, spread across our 28 datacentres.
3334
3335It’s also more than capable of managing the waves of DDoS attacks that arrive almost daily, sending millions of requests to hosted servers in an attempt to render them unavailable. These are absorbed by our in-house Anti-DDoS Protection, without any customer impact! As a reminder, we suffered one of the biggest attacks on record a few years ago, which generated traffic of more than 1Tbps, but was nonetheless absorbed by our infrastructure, without any impact on our customers.
3336
3337To guarantee this additional public bandwidth, our Network and Bare Metal teams have worked closely together to be more and more LEAN when it comes to our infrastructures. As a result, thousands of active devices (routers, switches, servers etc.) have been updated in a completely transparent way!
3338
3339The overall deployment process has taken some time, as we have done a rolling upgrade, taking a QoS and isolation approach to prevent possible traffic spikes. Product range by product range, datacentre by datacentre… The deployment itself was quick and painless, as it was fully automated. The potential bottleneck was making sure that everything worked as intended, which involved carefully monitoring our full server farm, as bandwidth doubling can have a huge impact, especially at OVH, where (let me mention it once again!) egress traffic is indeed unlimited!
3340
3341Here’s a quick overview of the new bandwidth for each server range:
3342
3343Even if the bandwidth doubling doesn’t yet cover the full extent of our ranges, or the So you Start and Kimsufi servers, we haven’t forgotten our customers who’re using those servers. We have also updated our bandwidth options to offer all our customers an even better service, at an even better price.
3344
3345We aren’t going to stop there though! We will soon announce some nice new features on the network side of things. And of course, lots of other innovations will arrive in the coming months. But those are other stories, which will be told in other blog posts…
3346<|endoftext|>
3347<|startoftext|>
3348
3349Dedicated Servers: The new ranges are on their way!
3350By Yaniv Fdida / 2019-03-25 / Dedicated Servers, Evolution
3351
3352Since the year 2000, OVH teams have been designing and assembling all of the OVH group’s servers in-house. In fact, there are currently 400K servers in operation, and 2018 marked the millionth server assembled! Our dedicated servers are the foundation of all OVHcloud products. That’s why we wanted to enhance the different ranges, in order to improve our customers’ overall experience.
3353Our new ranges of dedicated servers are about to launch!
3354
3355OVH’s new dedicated server ranges aim to offer a wide selection of servers that are both powerful and easy to use.
3357
For the past couple of months, all engineering groups involved with the Bare Metal Product Unit have been working on the new dedicated servers' foundations and future offers. From our portfolio's value proposition to our functional requirements (i.e. the technical building blocks); from our methodology, compliance, and production processes to our datacentre standards and industry foundations, we have addressed the entire value chain!
3359
3360It has been an important and very interesting journey, where we’ve capitalised on our customers’ feedback, integrated our scalability constraints, and developed our operational excellence (and our ambitions!). And because innovation is our engine, we wanted to learn from our previous challenges, while still taking an innovative approach, as always!
3361
3362The new ranges each possess strong identities, shared by all their models, which makes it simple for customers to find, identify, and choose the server that’s right for their needs.
3363“But what’s a bare-metal server?”
3364
A bare-metal server is simply a single-tenant physical server. This is what distinguishes it from other forms of computing, such as virtualisation: it gives the customer access to hardware-specific features and to constant, guaranteed resources, without a virtualisation layer in between (hardware flexibility, data-intensive projects, processing power, etc.). Each server may run any amount of work for the customer, or may have multiple simultaneous users, but its resources are dedicated entirely to whoever is renting it, without any lock-in!
3366
3367The bare-metal server production chain at a glance!
3368Birth of an OVH Bare Metal server
3369A global approach
3370
3371What does it mean to deploy a new range? Well, for a company like OVH – where bare metal represents a huge footprint of 400K servers in 28 datacentres – it requires a lot of organisation…
3372
3373 Customer feedback and continuous improvement
3374 Performance and innovation
3375 Resilience and scalability
3376 Product line and use case approaches
3377 Quality and lean industrial practices
3378 The best competitive ratio for our customers!
3379
3380Those six pillars were our key drivers when starting this phenomenal programme for the launch of our new dedicated server ranges!
3381Let’s kick-start our new OVHcloud journey!
3382
3383The first new range from our dedicated server family is already here! Its name is Advance.
3384
3385As you might have already seen, the Hosting range of servers has evolved considerably over the years, becoming larger and more versatile, in order to offer greater flexibility. The new Advance range builds on this, and has been designed to meet the needs and requirements of SMEs, providing them with a strong foundation from which to meet their business challenges.
3386
3387Many, many new technical features (motherboards, CPUs, memory, disks, networks, options etc.) and benefits are embedded in the new range, such as:
3388
3389 A scalable infrastructure
3390 A higher, unmetered network bandwidth
3391 A free private network, included with all servers: vRack
3392 10Gbps network interfaces
3393 High-performance NVMe drives (Enterprise class)
3394 New options for drive customisation
3395 Free storage space, remote from the server, for backups
3396 256 additional free IPs
3397 The best anti-DDoS protection
3398
3399But it’s not just about technical improvements…
3400
3401On the pricing side, we kept our promise to offer both no commitment and no setup fees, to make it easier to benchmark our new models and quickly establish which ones should accompany your growth. We also listened to our customers regarding the commitment terms. That’s why we have introduced new discounts for longer-term commitment, with or without an upfront payment!
3403
3404All this is by design. Our objective is to ease accessibility, and bring you the best of our industry when it comes to the price/performance ratio.
3405
Furthermore, we worked hard to enhance our level of quality and increase our overall maturity by moving to a key standard: ISO 27001. Because compliance and quality are just as important as performance! But we'll talk about that in another post…
3407
3408Now stay tuned, because it’s just the beginning! OVHcloud is accelerating for YOU.
3409<|endoftext|>
3410<|startoftext|>
3411
3412Dedicated Servers are ISO 27001 certified
3413By Julien Levrard / 2019-03-22 / Certification, Dedicated Servers, ISO 27001
3414
3415On March 14, OVH obtained ISO/IEC 27001:2013 certification for the Information Security Management System of Dedicated Servers.
3416OVH Dedicated Servers are ISO 27001 certified
3417
This certification, obtained after an independent audit by the company LNE, provides strong reassurance to customers and users of the services hosted on these servers.
3419What is the ISO 27001 standard and certification?
3420
3421ISO/IEC 27001 is an international standard that describes the “requirements for establishing, implementing, maintaining and continuously improving an information security management system” (ISMS). It describes the organisational method which ensures the confidentiality, integrity, availability and traceability of an information system.
3422Daily security
3423
Since the beginning of OVH, security has been one of the main objectives of the teams that design, operate and develop the services. The ISMS aims to ensure that the means implemented to guarantee security operate in a systematic, comparable and demonstrable way.
3425
3426The ISMS is an approach to establishing, maintaining, monitoring, and ensuring continuous improvement of the tools and processes to:
3427
3428 Identify and consolidate OVH’s obligations and commitments in terms of information security.
3429 Set appropriate, understandable and consistent information security objectives.
3430 Implement a risk-based approach to define and prioritise security enhancements.
3431 Establish, industrialise and use security measures.
3432 Communicate and coordinate with all internal and external stakeholders involved.
3433
On a day-to-day basis, the ISMS consists of managing all the activities that present a risk to the service, such as access rights, system and equipment configurations, software updates, infrastructure upgrades, data deletion, partitioning between environments, monitoring and incidents. Ensuring absolute security is not a realistic goal, but the ISMS helps identify vulnerabilities, errors, and malfunctions faster and more reliably. It ensures that corrective actions are implemented rapidly and followed up over time.
3435A team effort
3436
3437A team of security experts work with the teams in charge of the design and operation of the service, customer support, sales teams and OVH management to prioritise these improvements. Coordinating these different perspectives within a product-lifecycle, risk-based approach ensures rapid, pragmatic and industrial adaptation of systems and processes to a quickly evolving threat environment.
The certification audit
3439
3440The certification audits are carried out by accredited companies, in this case by LNE, accredited by COFRAC. The audit itself follows a strict format and is based on formal requirements. The audit is a challenge for the teams but also for the auditor. On the basis of office and datacentre visits, team interviews, in-depth documentation reviews and systems observation over a period of a few weeks, the auditor must formulate his/her opinion on the relevance of the implemented activities, their effectiveness and of course their compliance with all the requirements of the ISO 27001 standard. The auditor also identifies opportunities for improvement to be considered at the end of the audit.
3441What is the scope?
3442
3443The scope of the Information Security Management System covers the provision, connectivity, operational support and decommissioning of Dedicated Servers allocated to customers, the resources provided to customers for the configuration, use and monitoring of allocated infrastructure and service management by OVH teams. The ISMS is therefore firmly designed to focus on the service supplied to the customer.
3444Security as code
3445
The ISMS covers all the physical servers managed by OVH, i.e. several hundred thousand servers across the group's datacentres. Understanding and managing security in an efficient and sustainable way over such a wide scope means aligning each decision with the standardisation and automation principles of OVH's industrial model. In the OVH value chain, all repetitive tasks undertaken by staff are to be phased out over the long term. As a result, the ISMS and security are improved via the automation of daily activities and the development of tools to manage the service in a secure way. Human intervention should be limited to cases requiring in-depth analysis or complex coordination. This model allows an exponential scale-up of the management system, while limiting the resources needed for its operation.
3447A modular Information Security Management System (ISMS)
3448
To some extent, all OVH products rely on the information systems that support the service, and those systems are themselves hosted on Dedicated Servers, just like all other OVH products. Defining the graph of dependencies and internal responsibilities is somewhat of a mise en abyme. It was, however, a prerequisite for defining a clear and understandable security organisation that enabled the ISMS to function effectively. A modular approach has been put in place to segment and structure the responsibilities of each team involved. These relationships are driven by a set of internal service agreements defined and monitored in the ISMS.
3450
3451Datacentres, for example, have a separate ISMS to ensure the physical security of hosting sites and the security of datacentre operations. This ISMS is independently certified and provides a solid foundation for service compliance.
3452
The Dedicated Server ISMS certification builds on the datacentre certification and covers the servers hosted in these certified datacentres. To date, the datacentres concerned are those in Roubaix (RBX 2, 3, 5, 6, 7), and all datacentres in Strasbourg, Beauharnois, Singapore and Sydney. The Paris datacentre (P19), which hosts part of the information system supporting the service, is also included, although it does not host any Dedicated Servers allocated to clients. Although all of the company's servers are covered by the ISMS, certification only concerns these datacentres.
3454What next?
3455
3456ISO 27001 is a general standard that addresses the concerns of most of our customers and sets a framework and organisation for ensuring service security. However, it does not consider compliance with requirements linked to a specific business sector. The ISO 27001 standard provides for the possibility of adding additional requirements and the Dedicated Servers ISMS is also designed for this purpose. The ISMS will therefore gradually integrate new specific measures to cover, for example, the needs for hosting personal health data, the requirements of the banking sector or the regulatory specificities of the public sector in the different countries where OVH provides services.
3457
3458In parallel, the teams are working on the extension of the certification perimeter to include all of the group’s datacentres and in particular those in Erith (UK), Limburg (DE), Ozarow (PL) and Gravelines (FR). The goal is to provide all OVH customers with a uniform level of security assurance regardless of the datacentre region chosen.
3459
3460Finally, the teams will continue to work with the other product teams to complete the catalogue of certified products and gradually extend it to all OVH’s Infrastructure-as-a-Service offerings.
3461<|endoftext|>
3462<|startoftext|>
3463
3464Understanding the anatomy of GPUs using Pokémon
3465By Jean-Louis Queguiner / 2019-03-13 / Artificial Intelligence, Deep learning, GPU
3466In the previous episode…
3467
3468In our previous blog post about Deep Learning, we explained that this technology is all about massive parallel matrix computation, and that these computations are simplistic operations: + and x.
3469Fact 1: GPUs are good for (drum roll)…
3470
3471Once you get that Deep Learning is just massive parallel matrix multiplications and additions, the magic happens. General Purpose Graphic Processing Units (GPGPU) (i.e. GPUs, or variants of GPUs, designed for something other than graphic processing) are perfect for…
3472
matrix multiplications and additions!
3474
Perfect, isn't it? But why? Let me tell you a little story…
3476Fact 2: There was a time when GPUs were just GPUs
3477
3478Yes, you read that correctly…
3479
The first GPUs, in the 90s, were designed in a very linear way. The engineers took the process used for graphical rendering and implemented it in hardware.
3481
3482To keep it simple, this is what a graphical rendering process looks like:
3483Graphical rendering process
3484
3485Uses for GPUs included transformation, building lighting effects, building triangle setups and clipping, and integrating rendering engines at a scale that was not achievable at the time (tens of millions of polygons per second).
3486
3487The first GPUs integrated the various steps of image processing and rendering in a linear way. Each part of the process had predefined hardware components associated with vertex shaders, tessellation modules, geometry shaders, etc.
3488
3489In short, graphics cards were initially designed to perform graphical processing. What a surprise!
3490Fact 3: CPUs are sports cars, GPUs are massive trucks
3491
As explained earlier, for image processing and rendering, you don't want your image to be generated pixel by pixel – you want it in a single shot. That means that every pixel of the image – representing every object pointed at by the camera, at a given time, in a given position – needs to be calculated at once.
3493
It's a complete contrast with CPU logic, where operations are meant to be performed sequentially. As a result, GPGPUs needed a massively parallel, general-purpose architecture to be able to process all the points (vertices), build all the meshes (tessellation), build the lighting, perform the object transformations from the absolute reference frame, apply textures, and perform shading (I'm still probably missing some parts!). However, the purpose of this blog post is not to look in-depth at image processing and rendering, as we will do that in another blog post in the future.
3495
3496As explained in our previous post, CPUs are like sports cars, able to calculate a chunk of data really fast with minimal latency, while GPUs are trucks, moving lots of data at once, but suffering from latency as a result.
3497
3498Here is a nice video from Mythbusters, where the two concepts of CPU and GPU are explained:
3499
3500Fact 4: 2006 – NVIDIA killed the image processing Taylorism
3502
Previously, image processing was performed using specialised manpower (hardware) at every stage of the production line in the image factory.
3504
This all changed in 2006, when NVIDIA decided to introduce General Purpose Graphical Processing Units, using Arithmetic Logic Units (ALUs), aka CUDA cores, which were able to run multi-purpose computations (a bit like a Jean-Claude Van Damme of GPU computation units!).
GoDaddy Commercial (2013) featuring Jean-Claude Van Damme. Source: https://imgur.com/r/gifs/PvuZxBZ
3507
Even today, modern GPU architectures (such as Fermi, Kepler or Volta) include non-general-purpose cores, named Special Function Units (SFUs), to run high-performance mathematical operations such as sine, cosine, reciprocal and square root, as well as Texture Mapping Units (TMUs) for the high-dimension matrix operations involved in image texture mapping.
3509Fact 5: GPGPUs can be explained simply with Pokémon!
3510
3511GPU architectures can seem difficult to understand at first, but trust me… they are not!
3512
3513Here is my gift to you: a Pokédex to help you understand GPUs in simple terms.
3514Here’s how you use it…
3515
3516You basically have four families of cards:
3517The Micro-Architecture Family
3518
3519This family will already be known to many of you. We are, of course, talking about Fermi, Maxwell, Kepler, Volta etc.
3520
3521
3522The Architecture Family
3523
3524This is the center, where the magic happens: orchestration, cache, workload scheduling… It’s the brain of the GPU.
3525
3526
3527The Multi-Core Units (aka CUDA Cores) Family
3528
3529This represents the physical core, where the maths computations actually happen.
3530
3531
3532
3533
3534
3535
3536The Programming Model Family
3537
The different layers of the programming model are used to abstract the GPU's parallel computation for the programmer. They also make the code portable to any GPU architecture.
3539
3540
3541
3542How to play
3543
3544 Start by choosing a card from the Micro-Architecture family
3545 Look at the components, and choose the appropriate card from the Architecture family
3546 Look at the components within the Micro-Architecture family and pick them from the Multi-Core Units family, then place them under the Architecture card
3547 Now, if you want to know how to program a GPU, place the Programming Model – Multi-Core Units special card on top of the Multi-Core Units cards
3548 Finally, on top of the Programming Model – Multi-Core Units special card, place all the Programming Model cards near the SM
 You should then have something that looks like this:
3550
3551Examples of card configurations:
3552Fermi
3553Kepler
3554Maxwell
3555Pascal
3556Volta
3557Turing
3558
3559
3560After playing around with different Micro-Architectures, Architectures and Multi-Core Units for a bit, you should see that GPUs are just as simple as Pokémon!
3561
3562Enjoy the attached PDF, which will allow you to print your own GPU Pokédex. You can download it here: GPU Cards Game
3563<|endoftext|>
3564<|startoftext|>
3565
3566How to monitor your Kubernetes Cluster with OVH Observability
3567By Adrien Carreira / 2019-03-08 / Beamium, Kubernetes, Metrics, Noderig, Observability, OVH Managed Kubernetes, OVH Observability
3568
3569Our colleagues in the K8S team launched the OVH Managed Kubernetes solution last week, in which they manage the Kubernetes master components and spawn your nodes on top of our Public Cloud solution. I will not describe the details of how it works here, but there are already many blog posts about it (here and here, to get you started).
3570
3571In the Prescience team, we have used Kubernetes for more than a year now. Our cluster includes 40 nodes, running on top of PCI. We continuously run about 800 pods, and generate a lot of metrics as a result.
3572
3573Today, we’ll look at how we handle these metrics to monitor our Kubernetes Cluster, and (equally importantly!) how to do this with your own cluster.
3574OVH Metrics
3575
3576Like any infrastructure, you need to monitor your Kubernetes Cluster. You need to know exactly how your nodes, cluster and applications behave once they have been deployed in order to provide reliable services to your customers. To do this with our own Cluster, we use OVH Observability.
3577
3578OVH Observability is backend-agnostic, so we can push metrics with one format and query with another one. It can handle:
3579
3580 Graphite
3581 InfluxDB
3582 Metrics2.0
 OpenTSDB
3584 Prometheus
3585 Warp10
3586
3587It also incorporates a managed Grafana, in order to display metrics and create monitoring dashboards.
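
As an illustration, here is a minimal sketch, in Go, that pushes a single data point to the Warp 10 endpoint used later in this post. The input format and the X-Warp10-Token header follow the standard Warp 10 ingress protocol; the metric name and labels are placeholders, and you should use your own write token.

package main

import (
    "fmt"
    "net/http"
    "os"
    "strings"
    "time"
)

// Minimal sketch: push one data point to the Warp 10 endpoint of OVH Observability.
func main() {
    token := os.Getenv("METRICS_TOKEN") // your write token from the OVH Control Panel

    // Warp 10 / Sensision input format: TS// class{labels} value (timestamp in µs by default)
    line := fmt.Sprintf("%d// os.example.metric{host=my-host} 42", time.Now().UnixNano()/1000)

    req, _ := http.NewRequest("POST",
        "https://warp10.gra1.metrics.ovh.net/api/v0/update",
        strings.NewReader(line))
    req.Header.Set("X-Warp10-Token", token)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status) // 200 OK means the point was accepted
}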
3588Monitoring Nodes
3589
3590The first thing to monitor is the health of nodes. Everything else starts from there.
3591
3592In order to monitor your nodes, we will use Noderig and Beamium, as described here. We will also use Kubernetes DaemonSets to start the process on all our nodes.
3593
3594So let’s start creating a namespace…
3595
3596kubectl create namespace metrics
3597
Next, create a secret with the Metrics write token, which you can find in the OVH Control Panel.
3599
3600kubectl create secret generic w10-credentials --from-literal=METRICS_TOKEN=your-token -n metrics
3601
Copy the following metrics.yml into a file, and then apply the configuration with kubectl.
3603
3604metrics.yml
# This will configure Beamium to scrape Noderig
# and push the metrics to Warp 10.
# We also add the HOSTNAME to the labels of the pushed metrics.
3608---
3609apiVersion: v1
3610kind: ConfigMap
3611metadata:
3612 name: beamium-config
3613 namespace: metrics
3614data:
3615 config.yaml: |
3616 scrapers:
 noderig:
3618 url: http://0.0.0.0:9100/metrics
3619 period: 30000
3620 format: sensision
3621 labels:
 app: noderig
3623
3624 sinks:
3625 warp:
3626 url: https://warp10.gra1.metrics.ovh.net/api/v0/update
3627 token: $METRICS_TOKEN
3628
3629 labels:
3630 host: $HOSTNAME
3631---
# This is a custom collector that reports the uptime of the node
3633apiVersion: v1
3634kind: ConfigMap
3635metadata:
3636 name: noderig-collector
3637 namespace: metrics
3638data:
3639 uptime.sh: |
3640 #!/bin/sh
3641 echo 'os.uptime' `date +%s%N | cut -b1-10` `awk '{print $1}' /proc/uptime`
3642---
3643kind: DaemonSet
3644apiVersion: apps/v1
3645metadata:
3646 name: metrics-daemon
3647 namespace: metrics
3648spec:
3649 selector:
3650 matchLabels:
3651 name: metrics-daemon
3652 template:
3653 metadata:
3654 labels:
3655 name: metrics-daemon
3656 spec:
3657 terminationGracePeriodSeconds: 10
3658 hostNetwork: true
3659 volumes:
3660 - name: config
3661 configMap:
3662 name: beamium-config
3663 - name: noderig-collector
3664 configMap:
3665 name: noderig-collector
3666 defaultMode: 0777
3667 containers:
3668 - image: ovhcom/beamium:latest
3669 imagePullPolicy: Always
3670 name: beamium
3671 env:
3672 - name: HOSTNAME
3673 valueFrom:
3674 fieldRef:
3675 fieldPath: spec.nodeName
3676 - name: TEMPLATE_CONFIG
3677 value: /config/config.yaml
3678 envFrom:
3679 - secretRef:
3680 name: w10-credentials
3681 optional: false
3682 resources:
3683 limits:
3684 cpu: "0.05"
3685 memory: 128Mi
3686 requests:
3687 cpu: "0.01"
3688 memory: 128Mi
3689 volumeMounts:
3690 - mountPath: /config
3691 name: config
3692 - image: ovhcom/noderig:latest
3693 imagePullPolicy: Always
3694 name: noderig
3695 args: ["-c", "/collectors", "--net", "3"]
3696 volumeMounts:
3697 - mountPath: /collectors/60/uptime.sh
3698 name: noderig-collector
3699 subPath: uptime.sh
3700 resources:
3701 limits:
3702 cpu: "0.05"
3703 memory: 128Mi
3704 requests:
3705 cpu: "0.01"
3706 memory: 128Mi
3707
3708Don’t hesitate to change the collector levels if you need more information.
3709
3710Then apply the configuration with kubectl…
3711
3712$ kubectl apply -f metrics.yml
# Then, just wait a minute for the pods to start
3714$ kubectl get all -n metrics
3715NAME READY STATUS RESTARTS AGE
3716pod/metrics-daemon-2j6jh 2/2 Running 0 5m15s
3717pod/metrics-daemon-t6frh 2/2 Running 0 5m14s
3718
3719NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE AGE
3720daemonset.apps/metrics-daemon 40 40 40 40 40 122d
3721
You can import our dashboard into your Grafana from here, and view some metrics about your nodes straight away.
3723Kube Metrics
3724
As the OVH Kube is a managed service, you don't need to monitor the apiserver, etcd or control plane; the OVH Kubernetes team takes care of this. So we will focus on cAdvisor metrics and Kube state metrics.
3726
3727The most mature solution for dynamically scraping metrics inside the Kube (for now) is Prometheus.
3728Kube metrics
3729
3730In the next Beamium release, we should be able to reproduce the features of the Prometheus scraper.
3731
3732To install the Prometheus server, you need to install Helm on the cluster…
3733
3734kubectl -n kube-system create serviceaccount tiller
3735kubectl create clusterrolebinding tiller \
3736 --clusterrole cluster-admin \
3737 --serviceaccount=kube-system:tiller
3738helm init --service-account tiller
3739
3740You then need to create the following two files: prometheus.yml and values.yml.
3741
3742prometheus.yml
3743# Based on https://github.com/prometheus/prometheus/blob/release-2.2/documentation/examples/prometheus-kubernetes.yml
3744serverFiles:
3745 prometheus.yml:
3746 remote_write:
3747 - url: "https://prometheus.gra1.metrics.ovh.net/remote_write"
3748 remote_timeout: 120s
3749 bearer_token: $TOKEN
3750 write_relabel_configs:
3751 # Filter metrics to keep
3752 - action: keep
3753 source_labels: [__name__]
3754 regex: "eagle.*|\
3755 kube_node_info.*|\
3756 kube_node_spec_taint.*|\
3757 container_start_time_seconds|\
3758 container_last_seen|\
3759 container_cpu_usage_seconds_total|\
3760 container_fs_io_time_seconds_total|\
3761 container_fs_write_seconds_total|\
3762 container_fs_usage_bytes|\
3763 container_fs_limit_bytes|\
3764 container_memory_working_set_bytes|\
3765 container_memory_rss|\
3766 container_memory_usage_bytes|\
3767 container_network_receive_bytes_total|\
3768 container_network_transmit_bytes_total|\
3769 machine_memory_bytes|\
3770 machine_cpu_cores"
3771
3772 scrape_configs:
3773 # Scrape config for Kubelet cAdvisor.
3774 - job_name: 'kubernetes-cadvisor'
3775 scheme: https
3776 tls_config:
3777 ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
3778 bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
3779 kubernetes_sd_configs:
3780 - role: node
3781
3782 relabel_configs:
3783 - target_label: __address__
3784 replacement: kubernetes.default.svc:443
3785 - source_labels: [__meta_kubernetes_node_name]
3786 regex: (.+)
3787 target_label: __metrics_path__
3788 replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
3789
3790 metric_relabel_configs:
3791 # Only keep systemd important services like docker|containerd|kubelet and kubepods,
3792 # We also want machine_cpu_cores that don't have id, so we need to add the name of the metric in order to be matched
3793 # The string will concat id with name and the separator is a ;
3794 # `/;container_cpu_usage_seconds_total` OK
3795 # `/system.slice;container_cpu_usage_seconds_total` OK
3796 # `/system.slice/minion.service;container_cpu_usage_seconds_total` NOK, Useless
3797 # `/kubepods/besteffort/e2514ad43202;container_cpu_usage_seconds_total` Best Effort POD OK
3798 # `/kubepods/burstable/e2514ad43202;container_cpu_usage_seconds_total` Burstable POD OK
3799 # `/kubepods/e2514ad43202;container_cpu_usage_seconds_total` Guaranteed POD OK
3800 # `/docker/pod104329ff;container_cpu_usage_seconds_total` OK, Container that run on docker but not managed by kube
3801 # `;machine_cpu_cores` OK, there is no id on these metrics, but we want to keep them also
3802 - source_labels: [id,__name__]
3803 regex: "^((/(system.slice(/(docker|containerd|kubelet).service)?|(kubepods|docker).*)?);.*|;(machine_cpu_cores|machine_memory_bytes))$"
3804 action: keep
3805 # Remove Useless parents keys like `/kubepods/burstable` or `/docker`
3806 - source_labels: [id]
3807 regex: "(/kubepods/burstable|/kubepods/besteffort|/kubepods|/docker)"
3808 action: drop
3809 # cAdvisor give metrics per container and sometimes it sum up per pod
3810 # As we already have the child, we will sum up ourselves, so we drop metrics for the POD and keep containers metrics
3811 # Metrics for the POD don't have container_name, so we drop if we have just the pod_name
3812 - source_labels: [container_name,pod_name]
3813 regex: ";(.+)"
3814 action: drop
3815
3816 # Scrape config for service endpoints.
3817 - job_name: 'kubernetes-service-endpoints'
3818 kubernetes_sd_configs:
3819 - role: endpoints
3820
3821 relabel_configs:
3822 - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
3823 action: keep
3824 regex: true
3825 - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
3826 action: replace
3827 target_label: __scheme__
3828 regex: (https?)
3829 - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
3830 action: replace
3831 target_label: __metrics_path__
3832 regex: (.+)
3833 - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
3834 action: replace
3835 target_label: __address__
3836 regex: ([^:]+)(?::\d+)?;(\d+)
3837 replacement: $1:$2
3838 - action: labelmap
3839 regex: __meta_kubernetes_service_label_(.+)
3840 - source_labels: [__meta_kubernetes_namespace]
3841 action: replace
3842 target_label: namespace
3843 - source_labels: [__meta_kubernetes_service_name]
3844 action: replace
3845 target_label: kubernetes_name
3846
3847 # Example scrape config for pods
3848 #
3849 # The relabeling allows the actual pod scrape endpoint to be configured via the
3850 # following annotations:
3851 #
3852 # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
3853 # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
3854 # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
3855 # pod's declared ports (default is a port-free target if none are declared).
3856 - job_name: 'kubernetes-pods'
3857 kubernetes_sd_configs:
3858 - role: pod
3859
3860 relabel_configs:
3861 - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
3862 action: keep
3863 regex: true
3864 - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
3865 action: replace
3866 target_label: __metrics_path__
3867 regex: (.+)
3868 - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
3869 action: replace
3870 regex: ([^:]+)(?::\d+)?;(\d+)
3871 replacement: $1:$2
3872 target_label: __address__
3873 - action: labelmap
3874 regex: __meta_kubernetes_pod_label_(.+)
3875 - source_labels: [__meta_kubernetes_namespace]
3876 action: replace
3877 target_label: namespace
3878 - source_labels: [__meta_kubernetes_pod_name]
3879 action: replace
3880 target_label: pod_name
3881 - source_labels: [__meta_kubernetes_pod_node_name]
3882 action: replace
3883 target_label: host
3884 - action: labeldrop
3885 regex: (pod_template_generation|job|release|controller_revision_hash|workload_user_cattle_io_workloadselector|pod_template_hash)
3886
3887values.yml
3888alertmanager:
3889 enabled: false
3890pushgateway:
3891 enabled: false
3892nodeExporter:
3893 enabled: false
3894server:
3895 ingress:
3896 enabled: true
3897 annotations:
3898 kubernetes.io/ingress.class: traefik
3899 ingress.kubernetes.io/auth-type: basic
3900 ingress.kubernetes.io/auth-secret: basic-auth
3901 hosts:
3902 - prometheus.domain.com
3903 image:
3904 tag: v2.7.1
3905 persistentVolume:
3906 enabled: false
3907
3908Don’t forget to replace your token!
3909
3910The Prometheus scraper is quite powerful. You can relabel your time series, keep a few that match your regex, etc. This config removes a lot of useless metrics, so don’t hesitate to tweak it if you want to see more cAdvisor metrics (for example).
3911
3912 Install it with Helm…
3913
3914helm install stable/prometheus \
3915 --namespace=metrics \
3916 --name=metrics \
3917 --values=values/values.yaml \
3918 --values=values/prometheus.yaml
3919
Then add a basic-auth secret…
3921
3922$ htpasswd -c auth foo
3923New password: <bar>
3924New password:
3925Re-type new password:
3926Adding password for user foo
3927$ kubectl create secret generic basic-auth --from-file=auth -n metrics
3928secret "basic-auth" created
3929
You can access the Prometheus server interface through prometheus.domain.com.
3931
You will see all the metrics for your Cluster, although only the ones you have filtered will be pushed to OVH Metrics.
3933
The Prometheus interface is a good way to explore your metrics, as it's quite straightforward to display and monitor your infrastructure. You can find our dashboard here.
3935Resources Metrics
3936
3937As @Martin Schneppenheim said in this post, in order to correctly manage a Kubernetes Cluster, you also need to monitor pod resources.
3938
3939We will install Kube Eagle, which will fetch and expose some metrics about CPU and RAM requests and limits, so they can be fetched by the Prometheus server you just installed.
3940Kube Eagle
3941
3942Create a file named eagle.yml.
3943
3944eagle.yml
3945apiVersion: rbac.authorization.k8s.io/v1beta1
3946kind: ClusterRole
3947metadata:
3948 labels:
3949 app: kube-eagle
3950 name: kube-eagle
3951 namespace: kube-eagle
3952rules:
3953- apiGroups:
3954 - ""
3955 resources:
3956 - nodes
3957 - pods
3958 verbs:
3959 - get
3960 - list
3961- apiGroups:
3962 - metrics.k8s.io
3963 resources:
3964 - pods
3965 - nodes
3966 verbs:
3967 - get
3968 - list
3969---
3970apiVersion: rbac.authorization.k8s.io/v1beta1
3971kind: ClusterRoleBinding
3972metadata:
3973 labels:
3974 app: kube-eagle
3975 name: kube-eagle
3976 namespace: kube-eagle
3977subjects:
3978- kind: ServiceAccount
3979 name: kube-eagle
3980 namespace: kube-eagle
3981roleRef:
3982 apiGroup: rbac.authorization.k8s.io
3983 kind: ClusterRole
3984 name: kube-eagle
3985---
3986apiVersion: v1
3987kind: ServiceAccount
3988metadata:
3989 namespace: kube-eagle
3990 labels:
3991 app: kube-eagle
3992 name: kube-eagle
3993---
3994apiVersion: apps/v1
3995kind: Deployment
3996metadata:
3997 namespace: kube-eagle
3998 name: kube-eagle
3999 labels:
4000 app: kube-eagle
4001spec:
4002 replicas: 1
4003 selector:
4004 matchLabels:
4005 app: kube-eagle
4006 template:
4007 metadata:
4008 annotations:
4009 prometheus.io/scrape: "true"
4010 prometheus.io/port: "8080"
4011 prometheus.io/path: "/metrics"
4012 labels:
4013 app: kube-eagle
4014 spec:
4015 serviceAccountName: kube-eagle
4016 containers:
4017 - name: kube-eagle
4018 image: "quay.io/google-cloud-tools/kube-eagle:1.0.0"
4019 imagePullPolicy: IfNotPresent
4020 env:
4021 - name: PORT
4022 value: "8080"
4023 ports:
4024 - name: http
4025 containerPort: 8080
4026 protocol: TCP
4027 livenessProbe:
4028 httpGet:
4029 path: /health
4030 port: http
4031 readinessProbe:
4032 httpGet:
4033 path: /health
4034 port: http
4035
4036$ kubectl create namespace kube-eagle
4037$ kubectl apply -f eagle.yml
4038
Next, import this Grafana dashboard (it's the same dashboard as Kube Eagle's, but ported to Warp10).
4040
4041You now have an easy way of monitoring your pod resources in the Cluster!
4042Custom Metrics
4043
How does Prometheus know that it needs to scrape kube-eagle? If you look at the metadata in eagle.yml, you'll see this:
4045
4046annotations:
4047 prometheus.io/scrape: "true"
4048 prometheus.io/port: "8080" # The port where to find the metrics
4049 prometheus.io/path: "/metrics" # The path where to find the metrics
4050
These annotations will trigger the Prometheus auto-discovery process (described in prometheus.yml line 114).
4052
4053This means you can easily add these annotations to pods or services that contain a Prometheus exporter, and then forward these metrics to OVH Observability. You can find a non-exhaustive list of Prometheus exporters here.
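
As an illustration, here is a minimal sketch of a Go application exposing its own metrics with the standard Prometheus client library. Once deployed with the three annotations above, it would be discovered automatically and its series forwarded to OVH Observability (the metric name and port are placeholders).

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// A hypothetical counter, just to have something to expose.
var requestsTotal = prometheus.NewCounter(prometheus.CounterOpts{
    Name: "myapp_requests_total",
    Help: "Total number of requests handled by myapp.",
})

func main() {
    prometheus.MustRegister(requestsTotal)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        requestsTotal.Inc() // increment on each request
        w.Write([]byte("hello"))
    })

    // /metrics matches the prometheus.io/path annotation,
    // and :8080 matches the prometheus.io/port annotation.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}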
4054Volumetrics Analysis
4055
As you saw in the prometheus.yml, we've tried to filter out a lot of useless metrics. For example, with cAdvisor on a fresh cluster, with only three real production pods, and with the whole kube-system and Prometheus namespaces, you have about 2,600 metrics per node. With a smart cleaning approach, you can reduce this to 126 series.
4057
4058Here’s a table to show the approximate number of metrics you will generate, based on the number of nodes (N) and the number of production pods (P) you have:
                                     Noderig     cAdvisor           Kube State   Eagle              Total
nodes                                N * 13(1)   N * 2(2)           N * 1(3)     N * 8(4)           N * 24
system.slice                         0           N * 5(5) * 16(6)   0            0                  N * 80
kube-system + kube-proxy + metrics   0           N * 5(9) * 26(6)   0            N * 5(9) * 6(10)   N * 160
Production Pods                      0           P * 26(6)          0            P * 6(10)          P * 32
4064
4065For example, if you run three nodes with 60 Pods, you will generate 264 * 3 + 32 * 60 ~= 2,700 metrics
4066
4067NB: A pod has a unique name, so if you redeploy a deployment, you will create 32 new metrics each time.
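
If you want to play with this estimate for your own cluster, here is a trivial helper that just encodes the per-node and per-pod totals from the table above (264 and 32 respectively); it's only a rough sketch of the arithmetic.

package main

import "fmt"

// estimateSeries returns the approximate number of series generated,
// using the per-node (264) and per-pod (32) totals from the table above.
func estimateSeries(nodes, productionPods int) int {
    return 264*nodes + 32*productionPods
}

func main() {
    fmt.Println(estimateSeries(3, 60)) // ~2,712 series for 3 nodes and 60 production pods
}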
4068
4069(1) Noderig metrics: os.mem / os.cpu / os.disk.fs / os.load1 / os.net.dropped (in/out) / os.net.errs (in/out) / os.net.packets (in/out) / os.net.bytes (in/out)/ os.uptime
4070
4071(2) cAdvisor nodes metrics: machine_memory_bytes / machine_cpu_cores
4072
4073(3) Kube state nodes metrics: kube_node_info
4074
4075(4) Kube Eagle nodes metrics: eagle_node_resource_allocatable_cpu_cores / eagle_node_resource_allocatable_memory_bytes / eagle_node_resource_limits_cpu_cores / eagle_node_resource_limits_memory_bytes / eagle_node_resource_requests_cpu_cores / eagle_node_resource_requests_memory_bytes / eagle_node_resource_usage_cpu_cores / eagle_node_resource_usage_memory_bytes
4076
4077(5) With our filters, we will monitor around five system.slices
4078
(6) Metrics are reported per container. A pod is a set of containers (a minimum of two: your container, plus the pause container for the network). So we can count 2 * 10 + 6 = 26 metrics per pod: 10 cAdvisor metrics per container (7) and six for the network (8). For system.slice, we will have 10 + 6 = 16, because it's treated as a single container.
4080
4081(7) cAdvisor will provide these metrics for each container: container_start_time_seconds / container_last_seen / container_cpu_usage_seconds_total / container_fs_io_time_seconds_total / container_fs_write_seconds_total / container_fs_usage_bytes / container_fs_limit_bytes / container_memory_working_set_bytes / container_memory_rss / container_memory_usage_bytes
4082
4083(8) cAdvisor will provide these metrics for each interface: container_network_receive_bytes_total * per interface / container_network_transmit_bytes_total * per interface
4084
4085(9) kube-dns / beamium-noderig-metrics / kube-proxy / canal / metrics-server
4086
4087(10) Kube Eagle pods metrics: eagle_pod_container_resource_limits_cpu_cores / eagle_pod_container_resource_limits_memory_bytes / eagle_pod_container_resource_requests_cpu_cores / eagle_pod_container_resource_requests_memory_bytes / eagle_pod_container_resource_usage_cpu_cores / eagle_pod_container_resource_usage_memory_bytes
4088Conclusion
4089
4090As you can see, monitoring your Kubernetes Cluster with OVH Observability is easy. You don’t need to worry about how and where to store your metrics, leaving you free to focus on leveraging your Kubernetes Cluster to handle your business workloads effectively, like we have in the Machine Learning Services Team.
4091
4092The next step will be to add an alerting system, to notify you when your nodes are down (for example). For this, you can use the free OVH Alert Monitoring tool.
4093Stay in touch
4094
4095For any questions, feel free to join the Observability Gitter or Kubernetes Gitter!
4096Follow us on Twitter: @OVH
4097<|endoftext|>
4098<|startoftext|>
4099
4100Monitoring guidelines for OVH Observability
4101By Kevin Georges / 2019-03-07 / Beamium, Metrics, Noderig, Observability, OVH Observability
4102
4103At the OVH Observability (formerly Metrics) team, we collect, process and analyse most of OVH’s monitoring data. It represents about 500M unique metrics, pushing data points at a steady rate of 5M per second.
4104
4105This data can be classified in two ways: host or application monitoring. Host monitoring is mostly based on hardware counters (CPU, memory, network, disk…) while application monitoring is based on the service and its scalability (requests, processing, business logic…).
4106
We provide this service for internal teams, who enjoy the same experience as our customers. Basically, our Observability service is SaaS with a compatibility layer (supporting InfluxDB, OpenTSDB, Warp10, Prometheus, and Graphite) that allows it to integrate with most of the existing solutions out there. This way, a team that is used to a particular tool, or has already deployed a monitoring solution, won't need to invest much time or effort when migrating to a fully managed and scalable service: they just pick a token, use the right endpoint, and they're done. Besides, our compatibility layer offers a choice: you can push your data with OpenTSDB, then query it in either PromQL or WarpScript. Combining protocols in this way results in a unique open-source interoperability that delivers more value, with no restrictions created by a solution's query capabilities.
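
To make the compatibility layer a little more concrete, here is a minimal sketch that builds a data point in the standard OpenTSDB /api/put JSON format. The metric name, tags and value are placeholders, and the endpoint and authentication details are deliberately omitted, since they depend on your own Observability region and token.

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// DataPoint follows the standard OpenTSDB /api/put payload format.
type DataPoint struct {
    Metric    string            `json:"metric"`
    Timestamp int64             `json:"timestamp"`
    Value     float64           `json:"value"`
    Tags      map[string]string `json:"tags"`
}

func main() {
    point := DataPoint{
        Metric:    "os.cpu",          // hypothetical metric name
        Timestamp: time.Now().Unix(), // seconds since epoch
        Value:     42.0,
        Tags:      map[string]string{"host": "my-host"},
    }

    payload, _ := json.Marshal([]DataPoint{point})
    fmt.Println(string(payload))
    // POST this payload to the OpenTSDB endpoint of your Observability region,
    // authenticated with your write token, and you can then query the same
    // series with PromQL or WarpScript.
}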
4108Scollector, Snap, Telegraf, Graphite, Collectd…
4109
Drawing on this experience, we collectively tried most of the collection tools, but we always arrived at the same conclusion: we were witnessing metrics bleeding. Each tool focuses on scraping every reachable bit of data, which is great if you are a graph addict, but can be counterproductive from an operational point of view when you have to monitor thousands of hosts. While it's possible to filter the metrics, teams still need to understand the whole metrics set in order to know what needs to be filtered.
4111
At OVH, we use laser-cut collections of metrics. Each host has a specific template (web server, database, automation…) that exports a defined set of metrics, which can be used for health diagnostics and for monitoring application performance.
4113
This fine-grained management leads to greater understanding for operational teams, since they know what's available and can progressively add metrics to manage their own services.
4115Beamium & Noderig — The Perfect Fit
4116
4117Our requirements were rather simple:
4118— Scalable: Monitor one node in the same way as we’d monitor thousands
4119— Laser-cut: Only collect the metrics that are relevant
4120— Reliable: We want metrics to be available even in the worst conditions
4121— Simple: Multiple plug-and-play components, instead of intricate ones
4122— Efficient: We believe in impact-free metrics collection
4123The first solution was Beamium
4124
Beamium handles two aspects of the monitoring process: application data scraping and metrics forwarding.
4126
Application data is collected in the well-known and widely-used Prometheus format. We chose Prometheus because its community was growing rapidly at the time, and many instrumentation libraries were available for it. There are two key concepts in Beamium: Sources and Sinks.
4128
The Sources, from which Beamium scrapes data, are just Prometheus HTTP endpoints. This means it's as simple as supplying the HTTP endpoint, and possibly adding a few parameters. The data is then routed to Sinks, and can be filtered during the routing process between a Source and a Sink. Sinks are Warp 10 endpoints, where we push the data.
4130
Once scraped, metrics are first stored on disk, before being routed to a Sink. The Disk Fail-Over (DFO) mechanism allows recovery from network or remote failures. This way, we keep the Prometheus pull logic locally, but in a decentralised manner, and reverse it into a push to feed the platform, which has many advantages:
4132
4133 support for a transactional logic over the metrics platform
4134 recovery from network partitioning or platform unavailability
4135 dual writes with data consistency (as there’s otherwise no guarantee that two Prometheus instances would scrape the same data at the same timestamp)
4136
4137We have many different customers, some of whom use the Time Series store behind the Observability product to manage their product consumption or transactional changes over licensing. These use cases can’t be handled with Prometheus instances, which are better suited to metrics-based monitoring.
4138From Prometheus to OVH Observability with Beamium
4139The second was Noderig
4140
4141During conversations with some of our customers, we came to the conclusion that the existing tools needed a certain level of expertise if they were to be used at scale. For example, a team with a 20k node cluster with Scollector would end up with more than 10 million metrics, just for the nodes… In fact, depending on the hardware configuration, Scollector would generate between 350 and 1,000 metrics from a single node.
4142
That's the reason behind Noderig. We wanted it to be as simple to use as the Prometheus node-exporter, but with more fine-grained metrics production by default.
4144
4145Noderig collects OS metrics (CPU, memory, disk, and network) using a simple level semantic. This allows you to collect the right amount of metrics for any kind of host, which is particularly suitable for containerized environments.
4146
4147We made it compatible with Scollector’s custom collectors to ease the migration process, and allow for extensibility. External collectors are simple executables that act as providers for data that is collected by Noderig, as with any other metrics.
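
As an illustration, here is a minimal, hypothetical external collector written in Go. It simply prints one metric in the same format as the uptime.sh collector shown earlier in this document (metric class, epoch timestamp, value), which is all Noderig needs in order to pick it up.

package main

import (
    "fmt"
    "runtime"
    "time"
)

// A hypothetical external collector: it prints one metric per line in the
// same format as the uptime.sh collector above (class, epoch timestamp, value).
// Noderig periodically executes it and exposes the result like any other metric.
func main() {
    now := time.Now().Unix()
    fmt.Printf("os.app.goroutines %d %d\n", now, runtime.NumGoroutine())
}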
4148
The collected metrics are available through a simple REST endpoint, allowing you to see your metrics in real time and easily integrate them with Beamium.
4150Noderig and Beamium
4151Does it work?
4152
4153Beamium and Noderig are extensively used at OVH, and support the monitoring of very large infrastructures. At the time of writing, we collect and store hundreds of millions of metrics using these tools. So they certainly seem to work!
4154
4155In fact, we’re currently working on the 2.0 release, which will be a rework, incorporating autodiscovery and hot reload.
4156Stay in touch
4157
4158For any questions, feel free to join our Gitter!
4159Follow us on Twitter: @OVH
4160<|endoftext|>
4161<|startoftext|>
4162
4163Continuous Delivery and Deployment Workflows with CDS
4164By Yvonnick Esnault / 2019-03-01 / Automation, CDS, Devops, Industrialization
4165
4166The CDS Workflow is a key feature of the OVH CI/CD Platform. This structural choice adds an additional concept above CI/CD pipelines and jobs, and after more than three years of intensive use, is definitely an essential feature.
4168
Before delving into a full explanation of CDS workflows, let's review some of the key concepts behind pipelines and jobs. These concepts are drawn from the reference book, 8 Principles of Continuous Delivery.
4170The basic element: “The job”
4171
4172A job is composed of steps, which will be run sequentially. A job is executed in a dedicated workspace (i.e. filesystem). A new workspace is assigned for each new run of a job.
4173CDS Job
4174
4175A standard build job looks like this:
4176CDS build job
4177
4178You can use “built-in” actions, such as checkoutApplication, script, jUnit, artifact upload/download.
4179
4180 The checkoutApplication action clones your Git repository
 The Script action executes your build command, such as “make build”
4182 The artifactUpload action uploads previously-built binaries
4183 The jUnit action parses a given Junit-formatted XML file to extract its test results
4184
4185A pipeline: How to orchestrate your jobs with stages
4186
4187With CDS, a pipeline is not a job flow. A pipeline is a sequence of stages, each of which contains one or more jobs.
4188CDS Pipeline
4189
A Stage is a set of jobs that will be run in parallel. Stages are executed sequentially, each one starting only if the previous stage was successful.
4191
4192Let’s take a real-life use case: the pipeline that built CDS. This pipeline has four stages:
4193
4194 The “Build Minimal” stage is launched for all Git branches. The main goal of this stage is to compile the Linux version of CDS binaries.
 The “Build other os/arch” stage is only launched on the master branch. This stage compiles the binaries for all supported OS/architecture combinations: linux, openbsd, freebsd, darwin and windows, on 386, amd64 and arm.
4196 The “Package” stage is launched for all Git branches. This stage prepares the docker image and Debian package.
4197 Finally, the “Publish” stage is launched, whatever the Git branch.
4198
4199Most tasks are executed in parallel, whenever possible. This results in very fast feedback, so we will quickly know if the compilation is OK or not.
4200CDS Workflows: How to orchestrate your pipelines
4201
The workflow concept is a key feature, and is widely considered a native, manageable and feature-rich entity in CDS. A CDS Workflow allows you to chain pipelines with manual or automatic gates, using conditional branching. A workflow can be stored as code, designed in the CDS UI, or both, depending on what best suits you.
4203
4204Let’s take an example. One workflow for building and deploying three micro-services:
4205
4206 Build each micro-service
4207 Deploy them in preproduction
4208 Run integration tests on preproduction environment
4209 Deploy them in production, then re-run integration tests in production
4210
4211For the building part, there is only one pipeline to manage, which is used three times in the workflow with a different application/environment context each time. This is called the “pipeline context”.
4212
Any conditional branching in the workflow (e.g. “automatic deployment on the staging environment, only if the current Git branch is master”) can be executed through a “run conditional” set on the pipeline.
4214
Let's look at a real use case. This is the workflow that builds, tests and deploys CDS in production at OVH (yes, CDS builds and deploys itself!):
4216
4217 For each Git commit, the workflow is triggered
 The UI is packaged, all binaries are prepared, and the docker images are built. The “UT” job launches the unit tests. The “IT” job installs CDS in an ephemeral environment and launches the integration tests on it. Part 2 is automatically triggered on all Git commits.
4219 Part 3 deploys CDS on our preproduction environment, then launches the integration tests on it. It is started automatically when the current branch is the master branch.
4220 Last but not least, part 4 deploys CDS on our production environment.
4221
4222If there is a failure on a pipeline, it may look like this:
4223
4224The same kind of workflow is used for building and deploying the Prescience Project (https://labs.ovh.com/machine-learning-platform):
4225
But of course, you're not limited to the most complex tasks with CDS Workflows! These two examples demonstrate that workflows allow you to build and deploy a coherent set of micro-services. If you have simpler needs, your workflows are, of course, simpler.
4227
4228Pipeline reusability allows you to easily maintain the technical parts of the build, tests and deployments, even if you have hundreds of applications. If hundreds of applications share the same kind of workflows, you can leverage the maintainability of the workflow templates. We will talk more about this in a future post.
4229Much more than “Pipeline as Code”… “Workflow as Code”
4230
There is no compromise with CDS. Some users prefer to draw their workflows in the web UI, while others prefer to write YAML code. CDS lets you do both!

There are two ways to store workflows: either in the CDS database, or in your Git repository with your source code. We call this “Workflow as Code”.

This makes it possible to have a workflow on a given branch, and then develop it on another branch. CDS will instantiate the workflow on the fly, based on the YAML code present on the current branch.
4236
4237CDS is OVH open-source software, and can be found on https://github.com/ovh/cds, with documentation on https://ovh.github.io/cds.
4238CDS
4239
4240Previous Posts:
4241
4242 CDS Introduction: https://www.ovh.com/fr/blog/how-does-ovh-manage-the-ci-cd-at-scale/
4243 DataBuzzWord Podcast (French): https://www.ovh.com/fr/blog/understanding-ci-cd-for-big-data-and-machine-learning/
4244<|endoftext|>
4245<|startoftext|>
4246
4247Federate your Private Cloud with your Active Directory
4248By Erwan Quelin / 2019-03-01 / Active Directory, Federation, Microsoft, PCC, Private Cloud, vCenter, VMware
4249
Federation is a beta feature offered to all OVH Private Cloud customers with vCenter 6.5. If you want to be part of the beta, please contact our support team. It allows you to use an external Microsoft Active Directory as the authentication source to access the VMware vCenter server. The implementation of this feature has been made possible thanks to OVH's DevOps team, who have developed an innovative and unique API that adds additional features to those offered by VMware. Indeed, it is not currently possible to configure identity sources via vCenter's native API.
4252
4253In this post, we will take a look at how to activate Federation on your Private Cloud solution, and the advantages of doing so.
4254What for?
4255
By default, access rights to a vCenter in the Private Cloud are managed directly by that vCenter. Users are created locally (localos or SSO domain) and all role-based access control (RBAC) mechanisms are managed by the SSO service. Enabling Federation delegates user management to a Microsoft Active Directory (AD). As a result, the vCenter server will interact with the domain controller to ensure that the user trying to connect is who they claim to be. The vCenter retains role and privilege management for the objects it manages. Once Federation is configured, it is possible to associate the AD's users with vCenter roles, so that they can access and/or manage certain objects within the infrastructure (virtual machines, networks, folders, etc.).
4257
4258One of the main applications of this will be to facilitate vCenter access for administrators by reducing the number of accounts needed to maintain the various elements of the infrastructure. In addition, it will be possible to extend and unify the password management policy between the Active Directory and the vCenter Private Cloud.
4259
4260The fact that Federation is manageable via OVH’s API makes it possible to automate the configuration, and also to ensure it is kept in operational condition. Finally, it is very simple to add checks in any monitoring tool (Nagios, Zabbix, Sensu, etc.) to monitor the state of the Federation and the rights assigned to users.
4261
For example, a simple PowerShell script can periodically check whether the Federation configuration is in the desired state.
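
Here is a hedged sketch of such a check, written in Go with the go-ovh client rather than PowerShell. It only verifies that at least one Active Directory is federated with the Private Cloud; the assumption that the GET call returns a list of Active Directory IDs is based on the API steps described below, so double-check it against the API explorer.

package main

import (
    "fmt"
    "log"

    "github.com/ovh/go-ovh/ovh"
)

func main() {
    // Credentials are read from environment variables or ovh.conf,
    // see the go-ovh documentation.
    client, err := ovh.NewDefaultClient()
    if err != nil {
        log.Fatal(err)
    }

    serviceName := "pcc-X-X-X-X" // your Private Cloud name

    // GET /dedicatedCloud/{serviceName}/federation/activeDirectory
    // Assumption: the call returns the list of federated Active Directory IDs.
    var adIDs []int64
    url := fmt.Sprintf("/dedicatedCloud/%s/federation/activeDirectory", serviceName)
    if err := client.Get(url, &adIDs); err != nil {
        log.Fatal(err)
    }

    if len(adIDs) == 0 {
        log.Fatal("Federation is not configured on this Private Cloud")
    }
    fmt.Printf("Federation is active (Active Directory ID(s): %v)\n", adIDs)
}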
4263Architecture and prerequisites
4264
4265Since the vCenter will have to communicate with the domain controllers, the first step will be to allow flows between these elements. There are several ways to achieve this goal, such as combining OVHCloud Connect with a Private Gateway. A study of all the different possibilities would require a whole article, so we advise you to contact OVH or one of our partners to assist you in choosing the most suitable architecture. The following diagram gives you a simplified overview of what this might look like:
FEDERATION – Logical diagram
4267
4268
4269Once interconnected, you will need to make sure you have gathered the following information before starting the configuration process:
4270
4271 Your OVH credentials (nic and password)
4272 The name of your Private Cloud (in the form pcc-X-X-X-X)
4273 The required information about the Active Directory infrastructure, namely:
4274 Active Directory domain’s short and long name (for example, contoso and contoso.com)
4275 IP address of the domain controller
4276 AD account’s username and password, with sufficient rights to consult the directory
4277 Location of groups and users in the AD hierarchy as “base DN” (example: OU = Users, DC = contoso, DC = com). It should be noted that even though the group information is mandatory, it is not currently possible to use it to manage authentication
4278 List of Active Directory users that you want to bind to the vCenter. It will be necessary to provide user names in the form username@FQDN.domain (for example, federation@contoso.com)
4279
4280Please note that it’s not currently possible to have multiple users with the same shortname, regardless of whether they are managed locally or by an Active Directory.
4281Activation and configuration
4282
4283Once you have gathered all the necessary information, it will be possible to activate and configure Federation. The operation will take place in three steps:
4284
4285 Activation of the link between the Active Directory and the Private Cloud
4286 Binding of one or more AD users to the Private Cloud
4287 Assignment of rights to users
4288
4289The configuration is only accessible via OVH’s API for now, but it should be possible in the medium-term to do it via the OVH Control Panel. The API offers all the necessary options to activate, configure, or even delete your Private Cloud’s Federation:
4290Enabling the connection between the AD and the Private Cloud
4291
4292Go to the API’s explorer site and authenticate with your OVH credentials. If you do not already have it, get the name (also called serviceName in the API) of your Private Cloud, as it will be mandatory for all other steps of the configuration. You can access this information by performing a GET on the /dedicatedCloud URI:
4293Federation - GET dedicatedCloud
4294
4295Enable Federation by providing all the information about your Active Directory via a POST on the /dedicatedCloud/{serviceName}/federation/activeDirectory URI. All requested information is mandatory:
4296Federation - POST Add AD
4297
4298The activation of the Federation will take a while, and take place in the background. You can follow the progress of the operation via the OVH Control Panel:
4299
4300Once completed, you can retrieve the Federation ID by sending a GET request to the /dedicatedCloud/{serviceName}/federation/activeDirectory URI:
4301Federation - GET AD ID
4302
4303Binding one or more AD users
4304
Now that your AD is declared in the vCenter Private Cloud, we will be able to bind the Active Directory's users to it. Note that even if your users are bound, they won't have any vCenter roles associated with them yet, so they will not be able to log in.
4306
To bind a user, you will need to send a POST request to the /dedicatedCloud/{serviceName}/federation/activeDirectory/{activeDirectory}/grantActiveDirectoryUser URI, specifying the full username:
4308Federation - POST Add User
4309
4310Make sure that the user is present in the search OU you declared when associating your AD with the vCenter. Once again, you can validate that the import task is done via the API, or via the Control Panel:
4311
4312You should also receive an email stating that the user has been imported.
4313Assigning access rights
4314
The last step will be to give the users access rights to the different objects in the virtual infrastructure. This step does not differ from the usual way of managing Private Cloud users' rights. It can be done through the API or the OVH Control Panel.
4316
4317You should now be able to log in to your vCenter with your AD’s users and start managing your Private Cloud!
4318
4319In this post, we have seen how to activate the Federation option, and the benefits it brings to users of the OVH Private Cloud. In a future post, we will talk about another new feature: Granular Rights. So stay tuned on the OVH blog!
4320<|endoftext|>
4321<|startoftext|>
4322
4323The Unexpected Quest for Business Intelligence
4324By Steven Le Roux / 2019-02-28 / Business Intelligence, Data Capture
4325
Business Intelligence (BI) is the ability to collect substantial data from an information system to feed a Data Warehouse (DWH) or data lake. These usually hold a copy of the data that will be used by BI applications. Different strategies can be applied to feed a DWH. One such strategy is Change Data Capture (CDC): the ability to capture changing states from a database, and convert them into events that can be used for other purposes. Most databases are intended for OLTP purposes, and are well designed for this. Nonetheless, different use cases require the same data with different access patterns. These use cases (big data, ETL, and stream processing, to name a few) mostly fall under the OLAP banner. Mixing them would put the OLTP and production environments at risk, so we need to enable OLAP in a non-intrusive way.
4327
4328OVH, as a cloud provider, manages numerous databases, both for its customers and its own needs. Managing a database lifecycle always involves both keeping the infrastructure up to date, and remaining in synch with the development release cycle, to align the software with its database dependency. For example, an app might require MySQL 5.0, which could then be announced as EOL (End Of Life). In this case the app needs to be modified to support (let’s say) MySQL 5.5. We’re not reinventing the wheel here – this process has been managed by operations and dev teams for decades now.
4329
4330This becomes trickier if you don’t have control over the application. For example, imagine a third party provides you with an application to ensure encrypted transactions. You have absolutely no control over this application, nor the associated database. Nonetheless, you still need the data from the database.
4331
This blog post relates a similar example we encountered while building the OVH data lake, with the help of an in-house CDC development. This story takes place in early 2015, although I still think it's worth sharing.
4333Designing a non-intrusive Change Data Capture process
4334
4335It’s usually good practice to establish the state of the technology before jumping into dev, as it will save time and strengthen communities. Back in early 2015, when the CDC landscape was first emerging (Debezium, a similar open-source solution, only appeared at the end of the same year), the only existing solution – Databus – came from LinkedIn. The Databus architecture was rather complex, and the project wasn’t very active. Also, it didn’t solve our security requirements, and we come from a strong Ops culture, so running a JVM on the database server was clearly a no-go for our Operations teams.
4336
Although there was no CDC software matching our requirements, we eventually found a binlog replication library, one of the few written in Go, that we could build on. The binlog is MySQL's name for its write-ahead log (WAL).
4338
4339Our requirements were rather simple:
4340
4341 Avoid JVM-based solutions (JVM and containers weren’t working well at the time, and it’d have been hard to get support from Ops)
4342 The CDC agent needed to connect to the CDC gateway for highly-secured environments (and not a gateway to agents)
4343 The CDC gateway could control its agents’ fleet
4344 The CDC agent could filter and serialise events to push them with back pressure control
4345 The CDC agent could dump the DB to get a first snapshot, since binlogs aren’t always available from the beginning
4346
4347Here is the global design of the ‘Menura’ project:
Menura agent
4349
4350Menura is the genus of the lyrebird: a bird that can replicate any sound. Most BI-related components are ‘birds’, since they’re part of the Business Intelligence R&D project!
4351Automate the BI Control Plane
4352
4353As Menura was deployed on database servers, it could reflect available databases and tables in the BI Control Plane, so that a user could ask to sync with a given table. The controlling protocol had a few simple tasks:
4354
4355 Add and configure a database source
4356 Manage remote configuration
4357 Map agent/tables cartography
4358 Dump database tables
4359 Manage CDC (start/stop sync, commit binlog offsets…)
4360
4361gRPC was only just emerging at the time, but we saw in this project, with its strong foundations, an opportunity to reconcile Protobuf, Thrift, and language diversity. Furthermore, the ability to set up a bidirectional streaming RPC was interesting from the point of view of implementing client-to-server connections with server-to-client RPCs, so we made it the foundation of the controlling protocol.
4362
gRPC uses Protocol Buffers as its IDL to serialise structured data. Each StreamRequest is composed of a Header to manage multi-tenancy. This means that if two of our customers gave their sources the same name, we could still isolate control messages by tenant, not just by source. Then comes a RequestType, as defined in Protobuf v3:
4364
4365enum RequestType {
4366 Control_Config = 0;
4367 Control_Hearbeat = 1;
4368 Control_MySQLClient = 10;
4369 Control_MySQLBinlog = 11;
4370 Control_Syslog = 12;
4371 Control_PgSQLClient = 13;
4372 Control_PgSQLWal = 14;
4373 Control_PgSQLLogDec = 15;
4374 Control_Kafka = 16;
4375 Control_MSSQLClient = 17;
4376}
4377
4378This RequestType allowed us to reach source plugins with the specialised structures they expect. Note that we decoupled DB Clients from DB Replication (binlog, WAL…). The main reason is that they don’t share the same scope, so the libraries aren’t the same. It therefore made sense to keep them separate.
4379
Another reason is that replication acts as a slave for the database, meaning there is no significant footprint for the replication process, while a client dumping the DB could imply locking rows or tables, depending on the database and its underlying engine. This could have led us to having two different slaves, or to the replication being plugged into a master and the client plugged into a slave.
4381
4382These concerns drove us towards a modular design for the Menura agent:
Menura agent
4384Filtering data
4385
4386An important feature was the ability to filter events or columns. There were two reasons for this:
4387
4388 We encountered databases with so many tables or columns that we needed to cut some noise
4389 We didn’t necessarily need to get certain data out of the database
4390
4391Filtering closest to the DB was certainly the best choice, particularly for our customers, as they could then add or verify filters by themselves. To do this, we defined a Filter Policy to mimic IPTables policies with accept or drop defaults. The source filter then described how tables and columns would behave, depending on the policy:
4392
filter_policy: accept/drop
filter:
  table_a:
  table_b:
    ignored_columns:
      - sensibleColumn
4399
The drop policy drops everything by default, except the tables explicitly listed in the filter. The accept policy keeps everything by default, but drops the tables that are listed with an empty value in the filter; tables listed with an ignored_columns key are kept, and only the named columns are filtered out.
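
Here is a small, hedged sketch of how such a filter decision could be implemented, following the semantics described above. It's an illustration of the behaviour, not Menura's actual code.

package main

import "fmt"

// TableFilter mirrors the YAML above: a table maps to the list of columns
// to ignore (an empty list means the table is listed "as empty").
type TableFilter map[string][]string

// keep decides whether a (table, column) change should be forwarded,
// following the accept/drop semantics described above.
func keep(policy string, filter TableFilter, table, column string) bool {
    ignored, listed := filter[table]

    if policy == "drop" {
        // drop everything by default, except tables explicitly listed
        if !listed {
            return false
        }
    } else { // accept policy
        // tables listed with no ignored_columns are dropped entirely
        if listed && len(ignored) == 0 {
            return false
        }
    }

    // in both cases, drop the explicitly ignored columns
    for _, c := range ignored {
        if c == column {
            return false
        }
    }
    return true
}

func main() {
    filter := TableFilter{
        "table_a": {},
        "table_b": {"sensibleColumn"},
    }
    fmt.Println(keep("accept", filter, "table_a", "id"))             // false: listed as empty
    fmt.Println(keep("accept", filter, "table_b", "sensibleColumn")) // false: ignored column
    fmt.Println(keep("accept", filter, "table_c", "id"))             // true: not listed, accepted by default
}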
Validations in heterogeneous systems
4402
For certain needs, you may want to confirm that an analytics job is processing exactly the same data as the source database contains. For example, processing a revenue calculation over a given period requires the true data, from date to date. Validating the replication state between a database and the data lake was challenging. In fact, integrity checks aren't implemented with the same logic in databases and stores, so we needed a way to abstract them from the native implementation. We thought about using a Merkle tree data structure, so that we could maintain a tree of integrity over blocks of rows. If a key/value differed from the database, then the global or intermediate integrity hash would reflect it, and we would only have to scan the leaf block that had an inconsistent hash between the two systems.
Merkle tree
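
Here is a minimal, hedged sketch of the idea: hash fixed-size blocks of rows into leaves, combine them pairwise up to a root, and compare roots (then subtrees) between the database and the data lake. It only illustrates the principle, not the actual implementation.

package main

import (
    "crypto/sha256"
    "fmt"
)

// merkleRoot builds a Merkle tree over blocks of rows and returns the root hash.
// Comparing roots between two systems tells you instantly whether they diverge;
// comparing intermediate nodes narrows the search down to the inconsistent leaf block.
func merkleRoot(rowBlocks [][]string) [32]byte {
    // leaf hashes: one per block of rows
    level := make([][32]byte, 0, len(rowBlocks))
    for _, block := range rowBlocks {
        h := sha256.New()
        for _, row := range block {
            h.Write([]byte(row))
        }
        var leaf [32]byte
        copy(leaf[:], h.Sum(nil))
        level = append(level, leaf)
    }

    // combine pairwise until a single root remains
    for len(level) > 1 {
        next := make([][32]byte, 0, (len(level)+1)/2)
        for i := 0; i < len(level); i += 2 {
            if i+1 == len(level) { // odd node: promote it as-is
                next = append(next, level[i])
                continue
            }
            next = append(next, sha256.Sum256(append(level[i][:], level[i+1][:]...)))
        }
        level = next
    }
    return level[0]
}

func main() {
    db := [][]string{{"row1", "row2"}, {"row3", "row4"}}
    lake := [][]string{{"row1", "row2"}, {"row3", "DIFFERENT"}}
    fmt.Println(merkleRoot(db) == merkleRoot(lake)) // false: one leaf block differs
}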
4405Let’s put things together
4406
4407As we stated in our introduction, CDC is set up to convert database changes into processable events. The goal here is to fulfil any business needs that require data in an efficient and effective way. Here are two examples of what we did with the data that we now had available as events…
4408Real-time joins between databases
4409
While we were building the data lake from replicated tables, and since this project was mainly for BI purposes, we considered adding some real-time insights, based on the same logic we were using with batch Apache Pig jobs. Since 2015, the most advanced stream processing framework has been Apache Flink, which we used to process real-time joins between different databases and tables.
4411
4412Alexis did an amazing job describing the join process that we injected into Apache Flink, so that in addition to replicating databases, we were also creating a new, aggregated table. Indeed, we could write an entire blog post just on this topic…
Real-time joins using Apache Flink
4414
4415We chose Apache Flink for multiple reasons:
4416
4417 Its documentation was delightful
4418 Its core engine was brilliant, and very far beyond Apache Spark (the Tungsten project wasn’t even there)
4419 It was a European project, so we were close to the editor and its community
4420
4421Real-time indexing
4422
Now that we had a real-time table fed into Apache HBase, we needed to add a query capability on top of it. While HBase was fine from a storage point of view, it didn't provide any search capability, and its access pattern wasn't ideal for scanning over search criteria.
4424
This is where Guillaume worked some magic! By reusing Lily, an HBase indexer that provides the concept of a SEP (Side Effect Processor), he succeeded in reinjecting the aggregated table schema into Lily, to build the data type mapping needed to read HBase byte array values before indexing them into Solr. We now had a real-time dashboard of an aggregated, real-time joined table, processed from real-time change data capture. Boom!
Real-time indexing
4427
4428That was when we started getting real customers for our new solution.
4429Going live!
4430
4431If there is still a need to demonstrate that testing in a simulation environment is not the same as testing in a production environment, this next part will likely settle the debate…
4432
4433After setting up the data pipeline, we discovered a few bugs in the production environments. Here are two of them:
4434Interleaved events
4435
4436As defined by MySQL, an event structure is composed of both a header and a data field.
4437
4438In Row-based Replication (RBR), as opposed to Statement-based Replication (SBR), each row’s event is replicated with its corresponding data. DML statements are binlogged into two parts:
4439
4440 A TABLE_MAP_EVENT
4441 A ROWS_EVENT (can be WRITE, UPDATE or DELETE)
4442
4443The first event, TABLE_MAP_EVENT, describes the metadata of the second event’s content. This metadata contains:
4444
4445 Included fields
4446 Null values bitmaps
4447 The schema of the upcoming data
4448 The metadata for the provided schema
4449
The second event, WRITE_ROWS_EVENT (for inserts), contains the data. To decode it, you need the previous TABLE_MAP_EVENT to know how to consume the event, matching the corresponding MYSQL_TYPE_* and reading the number of bytes expected for each type.
4451
Occasionally, some events were not consumed properly, as a misalignment between metadata and data led to VARCHAR values being decoded as DATETIME values, and so on.
4453
After some debugging, it turned out that triggers had been added to some MySQL tables by the DBA team. When the replicas were rebuilt some days later, they inherited these triggers, and started to log the events they produced.
4455
The thing is, triggers are internal to MySQL. In the binlog, every event coming from the master is sent like this:
4457
4458TableMapEvent_a
4459WriteEvent_a
4460TableMapEvent_b
4461WriteEvent_b
4462TableMapEvent_c
4463WriteEvent_c
4464
a, b and c represent events for different schema.tables.
4466
4467Since triggers don’t come from the master, when the slave receives a TableMapEvent for a specific table, it triggers another TableMapEvent for a specialised table (<table>_event). The same applies to the WriteEvent.
4468
When MySQL triggers an event, it writes it to the binlog, so you end up with a multiplexing of two TableMapEvents, then two RowsEvents, as shown below:
4470
4471TableMapEvent_a
4472TableMapEvent_a_event
4473WriteEvent_a
4474WriteEvent_a_event
4475
4476Got it! When we tried to decode WriteEvent_a, the previous TableMapEvent was for TableMapEvent_a_event, not for TableMapEvent_a, so it’d try to decode the event with the wrong schema.
4477
We had to find a way to match each WriteEvent to the corresponding TableMapEvent. Fortunately, there is a TableID in the event structure that we could use for this. In the end, we just had to buffer all TableMapEvents, making them available to all RowsEvents, start reading the RowsEvent, pick its TableID, and then get the column metadata from the matching TableMapEvent. Fixed!
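
Here is a hedged sketch of that fix; the event types and fields are simplified stand-ins for the library's real structures. Every TableMapEvent is buffered by its TableID, and each RowsEvent looks up its own TableID to find the schema it must be decoded with.

package main

import "fmt"

// Simplified stand-ins for the binlog library's structures.
type TableMapEvent struct {
    TableID    uint64
    TableName  string
    ColumnType []byte
}

type RowsEvent struct {
    TableID uint64
    Data    []byte
}

// Decoder buffers every TableMapEvent by TableID, so that a RowsEvent can
// always be matched with the schema it was written against, even when
// trigger-generated events are interleaved in the stream.
type Decoder struct {
    tableMaps map[uint64]*TableMapEvent
}

func NewDecoder() *Decoder {
    return &Decoder{tableMaps: make(map[uint64]*TableMapEvent)}
}

func (d *Decoder) HandleTableMap(ev *TableMapEvent) {
    d.tableMaps[ev.TableID] = ev // buffer (or refresh) the schema for this table
}

func (d *Decoder) HandleRows(ev *RowsEvent) error {
    tm, ok := d.tableMaps[ev.TableID]
    if !ok {
        return fmt.Errorf("no TableMapEvent buffered for table id %d", ev.TableID)
    }
    // decode ev.Data using tm.ColumnType... (omitted)
    fmt.Printf("decoding rows for %s with %d columns\n", tm.TableName, len(tm.ColumnType))
    return nil
}

func main() {
    d := NewDecoder()
    d.HandleTableMap(&TableMapEvent{TableID: 1, TableName: "myTable", ColumnType: []byte{3, 15, 246}})
    d.HandleTableMap(&TableMapEvent{TableID: 2, TableName: "myTable_event", ColumnType: []byte{3}})
    _ = d.HandleRows(&RowsEvent{TableID: 1}) // matched by TableID, not by arrival order
}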
4479The elusive decimal…
4480
4481We also encountered an arbitrary bug in the library, which caused Menura to explode. Again, we dug into the binlog library to debug the decoding process, step by step. We identified table/column tuples to limit the logging output to a more reasonable rate. A RowEvent looked like this:
4482
4483DEBUG MR: TableMapEvent.read() : event.TableName = myTable
4484DEBUG MR: TableMapEvent.read() : columnCount= 16
4485DEBUG MR: TableMapEvent.read() : Read Next columnTypeDef= [3 3 15 246 0 0 12 0 15 12 12 15 3 3 15 15]
4486DEBUG MR: readStringLength() : r.Next(int(12))
4487DEBUG MR: TableMapEvent.read() : ReadStringLength columnMetaDef= [255 0 10 2 40 0 10 0 255 0 255 3]
4488DEBUG MR: TableMapEvent.read() : columnNullBitMap= [10 198]
4489DEBUG MR: TableMapEvent.read() : switch columnTypeDef[3]=f6
4490DEBUG MR: TableMapEvent.read() : switch : case metaOffset+=2
4491DEBUG MR: TableMapEvent.read() : column.MetaInfo
4492DEBUG MR: TableMapEvent.read() : column.MetaInfo = [10 2]
4493
Some parts of the decoding process in this log are quite interesting and worth a closer look. First, the table's columns present the following type definition:
4495
4496TableMapEvent.read() : Read Next columnTypeDef= [3 3 15 246 0 0 12 0 15 12 12 15 3 3 15 15]
4497
4498Some of these data types require metadata to be read. For example, here is the corresponding log with column metadata:
4499
4500TableMapEvent.read() : ReadStringLength columnMetaDef= [255 0 10 2 40 0 10 0 255 0 255 3]
4501
The column null bitmap is also important, since we have to know which null values should be ignored while decoding the buffer.
4503
This crash didn't happen on a daily basis, and the stacktrace didn't point to a fixed part of the code. It looked like a shift in the decoding, causing arbitrary crashes whenever casting the data types wasn't possible. To debug at a deeper level, we needed to log more. So we logged the buffer's current offset, the size read for each data type, the metadata for each data type, and the value. Here is an example for a MYSQL_TYPE_NEWDECIMAL:
4505
4506DEBUG MR: rowsEvent.read(): pack.Len() BEFORE : 59
4507DEBUG MR: rowsEvent.read(): Column.MetaInfo: &{246 [10 2] true}
4508DEBUG MR: rowsEvent.read(): switch column.Type= 246
4509DEBUG MR: case MYSQL_TYPE_NEWDECIMAL
4510DEBUG MR: readNewDecimal() precision= 10 scale= 2
4511DEBUG MR: readNewDecimal() size= 5
4512DEBUG MR: readNewDecimal() buff=8000000000
4513DEBUG MR: readNewDecimal() decimalpack=0000000000
4514DEBUG MR: rowsEvent.read(): pack.Len() AFTER : 54
4515DEBUG MR: rowsEvent.read(): value : 0
4516DEBUG MR: rowsEvent.read(): pack.Len() BEFORE : 54
4517DEBUG MR: rowsEvent.read(): Column.MetaInfo: &{0 [] false}
4518DEBUG MR: rowsEvent.read(): switch column.Type= 0
4519
Regarding the previous schema, we have 16 columns, and according to the MySQL documentation our data types require metadata as shown in the following table:

Type code | MYSQL_TYPE             | Metadata (bytes)
3         | MYSQL_TYPE_LONG        | 0
15        | MYSQL_TYPE_VARCHAR     | 2
0         | MYSQL_TYPE_DECIMAL     | 2
12        | MYSQL_TYPE_DATETIME    | 0
246       | MYSQL_TYPE_NEWDECIMAL  | 2
4531
4532This gives us 18 bytes of metadata for this example schema, as opposed to the 10 bytes in the packet.
4533
4534We also found that MySQL apparently didn’t send the metadata needed to read DECIMAL values in the packet. Was this a normal behaviour?
4535
4536The MySQL documentation is clear: to read a DECIMAL value, you need the metadata (precision, scale etc.). Period.
4537
4538However, we found that MYSQL_TYPE_DECIMAL was treated like MYSQL_TYPE_NEWDECIMAL.
4539
4540case MYSQL_TYPE_DECIMAL, MYSQL_TYPE_NEWDECIMAL:
4541value.value = pack.readNewDecimal(int(column.MetaInfo[0]), int(column.MetaInfo[1]))
4542
We stepped back and looked at how MYSQL_TYPE_DECIMAL was implemented in other binlog libraries. I'm not a DBA, but it felt strange that schemas using DECIMAL values were actually relying on two different MySQL data types.
4544
4545Okay… “Houston, we have a problem.”
4546
First, nobody implements MYSQL_TYPE_DECIMAL, and for a very good reason: we shouldn't be receiving it at all, since it has been deprecated since MySQL 5.0. This meant the database behind it was still running a table created on (at best) a pre-5.0 version of MySQL, and the database had since been upgraded without the ALTER needed to convert the columns to MYSQL_TYPE_NEWDECIMAL.
4548
4549Second, since we don’t have any control over the database, how do we decode a MYSQL_TYPE_DECIMAL…
4550First attempt: Ignore it
4551
4552We first circumvented this issue by actually not reading two bytes of metadata when we parsed a MYSQL_TYPE_DECIMAL column. This stopped corrupting the metadata offset, and other data types were now aligned with their metadata.
4553
4554We missed the decimal value, but we could continue to read other data types. Well, sort of… It was better, but to read values after a MYSQL_TYPE_DECIMAL in the data buffer, we needed to know how many bytes to read.
4555Second attempt: The naïve approach (i.e. guessing!)
4556
A decimal value is a fractional number. A DECIMAL(10,2) column, for example, has eight integer digits and two fractional digits. These digit counts determine the number of bytes that need to be read: in this example, four bytes for the integer part and one byte for the fractional part. It would have been so simple… if we had the metadata.
4558
In practice, MySQL didn't provide any metadata for DECIMAL values in the packet, which is why we ignored them in the first iteration, to preserve the other data. Have you ever tried to decode old binlogs with the official mysqlbinlog tool? If there is a MYSQL_TYPE_DECIMAL in your data, it stops decoding right there. Yes… MySQL doesn't know how to decode its own data format!
4560
4561One could argue that if MySQL doesn’t provide any metadata, it’s because it stores it internally, at a fixed size. Well… no!
SQL value | Byte array                        | Type
0.00      | 32 32 32 32 32 32 32 48 46 48 48  | decimal(10,2)
0.000     | 32 32 48 46 48 48 48              | decimal(5,3)
4569
Here's how it actually works: these old decimals are encoded like a VARCHAR in the binlog. So I tried to read the value assuming space padding, flagged the dot when it was encountered, and kept reading fractional digits for as long as they looked coherent for a decimal. When a byte didn't, I unread it from the buffer and moved on to the next data type. And it worked. For a while…
4571
4572DEBUG MR: case MYSQL_TYPE_DECIMAL
4573DEBUG MR: readOldDecimalV2: byte = 32
4574DEBUG MR: readOldDecimalV2: continue
4575DEBUG MR: readOldDecimalV2: byte = 32
4576DEBUG MR: readOldDecimalV2: continue
4577DEBUG MR: readOldDecimalV2: byte = 32
4578DEBUG MR: readOldDecimalV2: continue
4579DEBUG MR: readOldDecimalV2: byte = 32
4580DEBUG MR: readOldDecimalV2: continue
4581DEBUG MR: readOldDecimalV2: byte = 32
4582DEBUG MR: readOldDecimalV2: continue
4583DEBUG MR: readOldDecimalV2: byte = 32
4584DEBUG MR: readOldDecimalV2: continue
4585DEBUG MR: readOldDecimalV2: byte = 32
4586DEBUG MR: readOldDecimalV2: continue
4587DEBUG MR: readOldDecimalV2: byte = 48
4588DEBUG MR: readOldDecimalV2: start writing
4589DEBUG MR: readOldDecimalV2: byte = 46
4590DEBUG MR: readOldDecimalV2: dot found
4591DEBUG MR: readOldDecimalV2: writing
4592DEBUG MR: readOldDecimalV2: byte = 48
4593DEBUG MR: readOldDecimalV2: writing
4594DEBUG MR: readOldDecimalV2: byte = 48
4595DEBUG MR: readOldDecimalV2: writing
4596DEBUG MR: readOldDecimalV2: byte = 32
4597DEBUG MR: readOldDecimalV2: unread, break
4598DEBUG MR: readOldDecimalV2: converting to float : 0.00
4599DEBUG MR: rowsEvent.read(): pack.Len() AFTER : 43
4600DEBUG MR: rowsEvent.read(): value : 0
4601
This only holds as long as we don't hit a following VARCHAR whose length byte can itself be parsed as a digit. Because a DECIMAL value has a dynamic size, the metadata is the only reliable way to read it properly. There is no other way.
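For reference, the guessing reader described above looks roughly like this in Go. It is a sketch of the heuristic, not the real library code; the function name is ours.

package binlog

import (
    "bytes"
    "strconv"
    "strings"
)

// readOldDecimalGuess skips the space padding (0x20), accumulates digits and a
// single dot, and unreads the first byte that no longer looks like part of a
// decimal, exactly as in the debug log above.
func readOldDecimalGuess(buf *bytes.Buffer) (float64, error) {
    var sb strings.Builder
    started, dotSeen := false, false
    for {
        b, err := buf.ReadByte()
        if err != nil {
            break // end of the row data
        }
        switch {
        case b == ' ' && !started:
            continue // leading space padding
        case b >= '0' && b <= '9':
            started = true
            sb.WriteByte(b)
        case b == '.' && started && !dotSeen:
            dotSeen = true
            sb.WriteByte(b)
        default:
            // first byte that doesn't fit: give it back and stop guessing
            _ = buf.UnreadByte()
            return strconv.ParseFloat(sb.String(), 64)
        }
    }
    return strconv.ParseFloat(sb.String(), 64)
}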
4603Third attempt: There is no compromise when it comes to being a good slave!
4604
We asked ourselves what makes mysqlbinlog different from a true MySQL slave when it comes to reading binlogs. We found that the only difference was that a real slave knows the DECIMAL schema and its associated metadata when receiving the data. So it doesn't have to guess anything: it just reads the right number of bytes, according to the known schema.
4606
We ended up implementing a MySQL client in our binlog source, which starts by dumping the table schemas in order to pass the NumericScale value to the decoding library. The pitfall here is that columns aren't identified by their ColumnName in the binlog. MySQL maintains an OrdinalPosition for the columns of a table, but that's not the index provided in the binlog protocol (that would be too easy!). You have to maintain your own column index, built from the schema, so that it matches the one you receive in the binlog protocol. Once you have it, you just look up the decimal scale to know how many bytes still have to be read after the dot.
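As an illustration, the schema dump can be as simple as one query against information_schema. The sketch below uses the standard database/sql package with the go-sql-driver/mysql driver; the package, type and function names are ours, not the actual implementation.

package schema

import (
    "database/sql"

    _ "github.com/go-sql-driver/mysql" // MySQL driver
)

type Column struct {
    Name      string
    Scale     int // NUMERIC_SCALE: digits after the dot (0 if not numeric)
    Precision int
}

// LoadColumns returns the columns of schema.table in table order. The
// zero-based slice index is our own column index, to be matched against the
// column position received in the binlog protocol (OrdinalPosition itself is
// not that ID).
func LoadColumns(db *sql.DB, schemaName, tableName string) ([]Column, error) {
    rows, err := db.Query(
        `SELECT COLUMN_NAME, IFNULL(NUMERIC_SCALE, 0), IFNULL(NUMERIC_PRECISION, 0)
           FROM information_schema.COLUMNS
          WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?
          ORDER BY ORDINAL_POSITION`, schemaName, tableName)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var cols []Column
    for rows.Next() {
        var c Column
        if err := rows.Scan(&c.Name, &c.Scale, &c.Precision); err != nil {
            return nil, err
        }
        cols = append(cols, c)
    }
    return cols, rows.Err()
}

A RowsEvent decoder can then look up cols[i].Scale for the i-th binlog column to know how many fractional digits, and therefore bytes, remain to be read after the dot.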
4608
4609This way, the decoding library was now capable of decoding MYSQL_TYPE_DECIMAL from the binlog stream of events. Hooray!!
4610TL;DR
4611
4612In the end, building a BI stack from scratch took approximately six months. The team was composed of 2.5 people: Alexis Gendronneau, Guillaume Salou (who joined us after three months) and me. It demonstrated the principle of Change Data Capture applied to real-life use cases, enabling real-time insights into sales, stocks, and more, without any impact on production environments. The team grew as the project extended its scope, with new, more challenging customers, like financial services and management control teams. Weeks later, we succeeded in launching it on Apache Flink, based on the same data pipeline that has since become the trusted source for revenue calculation and other business KPIs.
4613
We learned a lot from this project. A key lesson is how important it is to keep your technical debt under control, and what impact it can have on other teams and projects. Also, working with Apache Flink on a range of projects proved to be a wonderful experience for our teams.
4615
4616The whole team delivered great work, and Dimitri Capitaine is about to open-source the data collector agent that powered the preview labs: OVH Data Collector. If you’re interested in discussing Change Data Capture at OVH in greater depth, feel free to join us on the team’s Gitter, or DM me on Twitter.
4617<|endoftext|>
4618<|startoftext|>
4619
4620Getting external traffic into Kubernetes – ClusterIp, NodePort, LoadBalancer, and Ingress
4621By Horacio Gonzalez / 2019-02-22 / Kubernetes, OVH Managed Kubernetes, OVH Platform
4622
4623For the last few months, I have been acting as Developer Advocate for the OVH Managed Kubernetes beta, following our beta testers, getting feedback, writing docs and tutorials, and generally helping to make sure the product matches our users’ needs as closely as possible.
4624
4625In the next few posts, I am going to tell you some stories about this beta phase. We’ll be taking a look at feedback from some of our beta testers, technical insights, and some fun anecdotes about the development of this new service.
4626
4627Today, we’ll start with one of the most frequent questions I got during the early days of the beta: How do I route external traffic into my Kubernetes service? The question came up a lot as our customers began to explore Kubernetes, and when I tried to answer it, I realised that part of the problem was the sheer number of possible answers, and the concepts needed to understand them.
4628
Related to that question was a feature request: most users wanted a load balancing tool. As the beta phase is all about confirming the stability of the product and validating the feature set prioritisation, we were able to quickly confirm LoadBalancer as a key feature of our first commercial release.
4630
To try to better answer the external traffic question, and to make the adoption of LoadBalancer easier, we wrote a tutorial and added some drawings, which got nice feedback. This helped people to understand the concepts underlying the routing of external traffic on Kubernetes.
4632
4633This blog post is an expanded version of this tutorial. We hope that you will find it useful!
4634Some concepts: ClusterIP, NodePort, Ingress and LoadBalancer
4635
4636When you begin to use Kubernetes for real-world applications, one of the first questions to ask is how to get external traffic into your cluster. The official documentation offers a comprehensive (but rather dry) explanation of this topic, but here we are going to explain it in a more practical, need-to-know way.
4637
4638There are several ways to route external traffic into your cluster:
4639
 Using Kubernetes proxy and ClusterIP: The default Kubernetes ServiceType is ClusterIP, which exposes the Service on a cluster-internal IP. To reach the ClusterIP from an external source, you can open a Kubernetes proxy between the external source and the cluster. This is usually only used for development.
4641
 Exposing services as NodePort: Declaring a Service as NodePort exposes it on each Node's IP at a static port (referred to as the NodePort). You can then access the Service from outside the cluster by requesting <NodeIp>:<NodePort>. This can also be used for production, albeit with some limitations.
4643
4644 Exposing services as LoadBalancer: Declaring a Service as LoadBalancer exposes it externally, using a cloud provider’s load balancer solution. The cloud provider will provision a load balancer for the Service, and map it to its automatically assigned NodePort. This is the most widely used method in production environments.
4645
4646Using Kubernetes proxy and ClusterIP
4647
The default Kubernetes ServiceType is ClusterIP, which exposes the Service on a cluster-internal IP. To reach the ClusterIP from an external computer, you can open a Kubernetes proxy between the external computer and the cluster.

You can use kubectl to create such a proxy. When the proxy is up, you're directly connected to the cluster, and you can use the internal IP (ClusterIP) for that Service.
kubectl proxy and ClusterIP
4653
4654This method isn’t suitable for a production environment, but it’s useful for development, debugging, and other quick-and-dirty operations.
4655Exposing services as NodePort
4656
Declaring a service as NodePort exposes the Service on each Node's IP at the NodePort (a fixed port for that Service, in the default range of 30000-32767). You can then access the Service from outside the cluster by requesting <NodeIp>:<NodePort>. Every service you deploy as NodePort will be exposed on its own port, on every Node.
NodePort
4660
It's rather cumbersome to use NodePort for Services that are in production. As you are using non-standard ports, you often need to set up an external load balancer that listens on the standard ports and redirects the traffic to <NodeIp>:<NodePort>.
4662Exposing services as LoadBalancer
4663
4664Declaring a service of type LoadBalancer exposes it externally using a cloud provider’s load balancer. The cloud provider will provision a load balancer for the Service, and map it to its automatically assigned NodePort. How the traffic from that external load balancer is routed to the Service pods depends on the cluster provider.
LoadBalancer
4667
4668The LoadBalancer is the best option for a production environment, with two caveats:
4669
 Every Service that you deploy as LoadBalancer will get its own IP.
4671 The LoadBalancer is usually billed based on the number of exposed services, which can be expensive.
4672
4673 We are currently offering the OVH Managed Kubernetes LoadBalancer service as a free preview, until the end of summer 2019.
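To make the difference concrete, here is a minimal sketch of such a Service expressed with the Kubernetes Go API types (the name, labels and ports are placeholders; the equivalent YAML manifest would express exactly the same thing). Switching the Type field to corev1.ServiceTypeNodePort gives the NodePort behaviour described in the previous section.

package example

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

func exampleService() *corev1.Service {
    return &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: "my-api"},
        Spec: corev1.ServiceSpec{
            Type:     corev1.ServiceTypeLoadBalancer, // ask the cloud provider for a load balancer
            Selector: map[string]string{"app": "my-api"},
            Ports: []corev1.ServicePort{{
                Port:       80,                   // port exposed by the load balancer
                TargetPort: intstr.FromInt(8080), // port the pods listen on
            }},
        },
    }
}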
4674
4675What about Ingress?
4676
4677According to the official documentation, an Ingress is an API object that manages external access to the services in a cluster (typically HTTP). So what’s the difference between this and LoadBalancer or NodePort?
4678
Ingress isn't a type of Service, but rather an object that acts as a reverse proxy and single entry point to your cluster, routing requests to different services. The most basic Ingress is the NGINX Ingress Controller, where NGINX takes on the role of reverse proxy and also handles SSL termination.
Ingress
4682
4683Ingress is exposed to the outside of the cluster via ClusterIP and Kubernetes proxy, NodePort, or LoadBalancer, and routes incoming traffic according to the configured rules.
Ingress behind LoadBalancer
4686
4687The main advantage of using an Ingress behind a LoadBalancer is the cost: you can have lots of services behind a single LoadBalancer.
4688Which one should I use?
4689
4690Well, that’s the one million dollar question, and one which will probably elicit a different response depending on who you ask!
4691
4692You could go 100% LoadBalancer, getting an individual LoadBalancer for each service. Conceptually, it’s simple: every service is independent, with no extra configuration needed. The downside is the price (you will be paying for one LoadBalancer per service), and also the difficulty of managing lots of different IPs.
4693
4694You could also use only one LoadBalancer and an Ingress behind it. All your services would be under the same IP, each one in a different path. It’s a cheaper approach, as you only pay for one LoadBalancer, but if your services don’t have a logical relationship, it can quickly become chaotic.
4695
4696If you want my personal opinion, I would try to use a combination of the two…
4697
An approach I like is having a LoadBalancer for every related set of services, and then routing to those services using an Ingress behind the LoadBalancer. For example, let's say you have two different microservice-based APIs, each with around 10 services. I would put one LoadBalancer in front of one Ingress for each API, the LoadBalancer being the single public entry point, and the Ingress routing traffic to the API's different services.
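As a sketch of that layout, here is what one of those Ingress objects could look like, written with the Kubernetes Go types for the networking/v1 API. The host, paths and service names are placeholders, and this API version is more recent than the setup described in this post.

package example

import (
    networkingv1 "k8s.io/api/networking/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleIngress routes two paths of the same API, behind a single
// LoadBalancer, to two different backend Services.
func exampleIngress() *networkingv1.Ingress {
    pathType := networkingv1.PathTypePrefix
    backend := func(svc string) networkingv1.IngressBackend {
        return networkingv1.IngressBackend{
            Service: &networkingv1.IngressServiceBackend{
                Name: svc,
                Port: networkingv1.ServiceBackendPort{Number: 80},
            },
        }
    }
    return &networkingv1.Ingress{
        ObjectMeta: metav1.ObjectMeta{Name: "api-a"},
        Spec: networkingv1.IngressSpec{
            Rules: []networkingv1.IngressRule{{
                Host: "api-a.example.com",
                IngressRuleValue: networkingv1.IngressRuleValue{
                    HTTP: &networkingv1.HTTPIngressRuleValue{
                        Paths: []networkingv1.HTTPIngressPath{
                            {Path: "/users", PathType: &pathType, Backend: backend("users-svc")},
                            {Path: "/orders", PathType: &pathType, Backend: backend("orders-svc")},
                        },
                    },
                },
            }},
        },
    }
}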
4699
4700But if your architecture is quite complex (especially if you’re using microservices), you will soon find that manually managing everything with LoadBalancer and Ingress is rather cumbersome. If that’s the case, the answer could be to delegate those tasks to a service mesh…
4701What’s a service mesh?
4702
4703You may have heard of Istio or Linkerd, and how they make it easier to build microservice architectures on Kubernetes, adding nifty perks like A/B testing, canary releases, rate limiting, access control, and end-to-end authentication.
4704
4705Istio, Linkerd, and similar tools are service meshes, which allow you to build networks of microservices and define their interactions, while simultaneously adding some high-value features that make the setup and operation of microservice-based architectures easier.
4706
4707There’s a lot to talk about when it comes to using service meshes on Kubernetes, but as they say, that’s a story for another time…
4708<|endoftext|>
4709<|startoftext|>
4710
4711Deep Learning explained to my 8-year-old daughter
4712By Jean-Louis Queguiner / 2019-02-15 / Artificial Intelligence, Deep learning, GPU, Machine learning, Neural networks
4713
4714Machine Learning and especially Deep Learning are hot topics and you are sure to have come across the buzzword “Artificial Intelligence” in the media.
4715
4716Deep Learning: A new hype
4717
4718Yet these are not new concepts. The first Artificial Neural Network (ANN) was introduced in the 40s. So why all the recent interest around neural networks and Deep Learning?
4719
4720We will explore this and other concepts in a series of blog posts on GPUs and Machine Learning.
4721YABAIR – Yet Another Blog About Image Recognition
4722
4723In the 80s, I remember my father building character recognition for bank checks. He used primitives and derivatives around pixel darkness level. Examining so many different types of handwriting was a real pain because he needed one equation to apply to all the variations.
4724
In the last few years, it has become clear that the best way to deal with this type of problem is through Convolutional Neural Networks. Equations designed by humans are no longer fit to handle the infinite variety of handwriting patterns.
4726
4727Let’s take a look at one of the most classic examples: building a number recognition system, a neural network to recognise handwritten digits.
4728Fact 1: It’s as simple as counting
4729
4730We’ll start by counting how many times the small red shapes in the top row can be seen in each of the black, hand-written digits, (in the left-hand column).
Simplified matrix for handwritten numbers
4733
4734Now let’s try to recognise (infer) a new hand-written digit, by counting the number of matches with the same red shapes. We’ll then compare this to our previous table, in order to identify which number has the most correspondences:
Matching shapes for handwritten numbers
4737
4738
4739
4740Congratulations! You’ve just built the world’s simplest neural network system for recognising hand-written digits.
4741Fact 2: An image is just a matrix
4742
4743A computer views an image as a matrix. A black and white image is a 2D matrix.
4744
Let's consider an image. To keep it simple, let's take a small black and white image of an 8, 28 pixels square.

Every cell of the matrix represents the intensity of the pixel, from 0 (black) to 255 (pure white).

The image will therefore be represented as the following 28 x 28 pixel matrix.
Image of a handwritten 8 and the associated intensity matrix
4752
4753
4754Fact 3: Convolutional layers are just bat-signals
4755
To work out which pattern is displayed in a picture (in this case the handwritten 8), we will use a kind of bat-signal/flashlight. In machine learning, this flashlight is called a filter. A filter is used to perform a classic convolution matrix calculation, commonly used in image-processing software such as GIMP.
4757
4759
4760
4761
The filter will scan the picture in order to find the pattern in the image, and will trigger a positive feedback if a match is found. It works a bit like a toddler's shape-sorting box: the triangle filter matches the triangle hole, the square filter matches the square hole, and so on.
Image filters work like children's shape-sorting boxes
4765
4766
4767Fact 4: Filter matching is an embarrassingly parallel task
4768
To be more scientific, the image filtering process looks a bit like the animation below. As you can see, every step of the filter scan is independent, which means that this task can be highly parallelised.

It's important to note that tens of filters will be applied at the same time, in parallel, as none of them depend on each other.
4772
4773Convolution Filter over an input image
4774https://github.com/vdumoulin
4775
4776Fact 5: Just repeat the filtering operation (matrix convolution) as many times as possible
4777
4778We just saw that the input image/matrix is filtered using multiple matrix convolutions.
4779
4780To improve the accuracy of the image recognition just take the filtered image from the previous operation and filter again and again and again…
4781
4782Of course, we are oversimplifying things somewhat, but generally the more filters you apply, and the more you repeat this operation in sequence, the more precise your results will be.
4783
It's like creating new abstraction layers to get a clearer and clearer object filter description, going from primitive filters to filters that look like edges, wheels, squares, cubes, and so on.
4785
4786
4787
4788Fact 6: Matrix convolutions are just x and +
4789
4790An image is worth a thousand words: the following picture is a simplistic view of a source image (8×8) filtered with a convolution filter (3×3). The projection of the torch light (in this example a Sobel Gx Filter) provides one value.
4791
4792
4793Example of a convolution filter (Sobel Gx) applied to an input matrix (Source : https://datascience.stackexchange.com/questions/23183/why-convolutions-always-use-odd-numbers-as-filter-size/23186)
4794
This is where the magic happens: simple matrix operations that are highly parallelisable, which fits perfectly with the use case of a General-Purpose GPU.
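To make this concrete, here is a small, self-contained sketch in Go of that filtering operation: a plain 2D convolution of a greyscale image with the Sobel Gx kernel shown above, using nothing but multiplications and additions (an illustration, not a production implementation).

package cnn

// Sobel Gx kernel: the "torch light" used in the example above.
var sobelGx = [3][3]float64{
    {-1, 0, 1},
    {-2, 0, 2},
    {-1, 0, 1},
}

// convolve2D slides a 3x3 kernel over a greyscale image (values 0-255) and
// returns the filtered image. Borders are skipped for simplicity, so the
// output is two rows and two columns smaller than the input.
func convolve2D(img [][]float64, kernel [3][3]float64) [][]float64 {
    h, w := len(img), len(img[0])
    out := make([][]float64, h-2)
    for y := 1; y < h-1; y++ {
        row := make([]float64, w-2)
        for x := 1; x < w-1; x++ {
            var sum float64
            for ky := -1; ky <= 1; ky++ {
                for kx := -1; kx <= 1; kx++ {
                    // multiply each pixel under the kernel by its weight, then add
                    sum += img[y+ky][x+kx] * kernel[ky+1][kx+1]
                }
            }
            row[x-1] = sum
        }
        out[y-1] = row
    }
    return out
}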
4796Fact 7: Need to simplify and summarise what’s been detected? Just use max()
4797
4798We need to summarise what’s been detected by the filters in order to generalise the knowledge.
4799
4800To do so, we will sample the output of the previous filtering operation.
4801
This operation is called pooling or downsampling, but in essence it's about reducing the size of the matrix.
4803
4804You can use any reducing operation such as: max, min, average, count, median, sum and so on.
4805Example of a max pooling layer (Source : Stanford’s CS231n)
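Continuing the same illustrative Go sketch, max pooling is just a few more loops: keep the largest value of each 2×2 block.

// maxPool2x2 reduces a feature map by keeping only the maximum value of each
// non-overlapping 2x2 block, halving both dimensions (odd borders are dropped).
func maxPool2x2(in [][]float64) [][]float64 {
    h, w := len(in)/2, len(in[0])/2
    out := make([][]float64, h)
    for y := 0; y < h; y++ {
        out[y] = make([]float64, w)
        for x := 0; x < w; x++ {
            m := in[2*y][2*x]
            for dy := 0; dy < 2; dy++ {
                for dx := 0; dx < 2; dx++ {
                    if v := in[2*y+dy][2*x+dx]; v > m {
                        m = v
                    }
                }
            }
            out[y][x] = m
        }
    }
    return out
}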
4806Fact 8: Flatten everything to get on your feet
4807
4808Let’s not forget the main purpose of the neural network we are working on: building an image recognition system, also called image classification.
4809
4810If the purpose of the neural network is to detect hand-written digits there will be 10 classes at the end to map the input image to : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4811
To map this input to a class after passing through all those filters and downsampling layers, we will have just 10 neurons (each of them representing a class), and each one will connect to the last sub-sampled layer.

Below is an overview of the original LeNet-5 Convolutional Neural Network designed by Yann LeCun, one of the few early adopters of this technology for image recognition.
4816LeNet-5 architecture published in the original paper (source : http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf).
4817Fact 9: Deep Learning is just LEAN – continuous improvement based on a feedback loop
4818
The beauty of the technology comes not only from the convolution, but from the network's capacity to learn and adapt by itself. By implementing a feedback loop called backpropagation, the network will dampen or inhibit some "neurons" in the different layers using weights.

Let's KISS (keep it simple): we look at the output of the network, and if the guess (the output 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9) is wrong, we look at which filter(s) "made a mistake" and give that filter or filters a smaller weight, so they will not make the same mistake next time. And voilà! The system learns and keeps improving itself.
4822Fact 10: It all amounts to the fact that Deep Learning is embarrassingly parallel
4823
Ingesting thousands of images, running tens of filters, applying downsampling, flattening the output… all of these steps can be done in parallel, which makes the system embarrassingly parallel. "Embarrassingly parallel" really means a perfectly parallel problem, and it's an ideal use case for GPGPUs (General-Purpose Graphics Processing Units), which are built for massively parallel computing.
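As a tiny illustration of that parallelism, applying many filters to the same image (reusing the convolve2D sketch from Fact 6) is just one goroutine per filter:

package cnn

import "sync"

// applyFiltersParallel runs one goroutine per kernel: every filter scans the
// same input independently, so the only synchronisation needed is waiting for
// all of them to finish.
func applyFiltersParallel(img [][]float64, kernels [][3][3]float64) [][][]float64 {
    featureMaps := make([][][]float64, len(kernels))
    var wg sync.WaitGroup
    for i, k := range kernels {
        wg.Add(1)
        go func(i int, k [3][3]float64) {
            defer wg.Done()
            featureMaps[i] = convolve2D(img, k) // same convolution as in Fact 6
        }(i, k)
    }
    wg.Wait()
    return featureMaps
}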
4825Fact 11: Need more precision? Just go deeper
4826
4827Of course it is a bit of an oversimplification, but if we look at the main “image recognition competition”, known as the ImageNet challenge, we can see that the error rate has decreased with the depth of the neural network. It is generally acknowledged that, among other elements, the depth of the network will lead to a better capacity for generalisation and precision.
4828Imagenet competition winner error rates VS number of layers in the network (source : https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5)
4829In conclusion
4830
4831We have taken a brief look at the concept of Deep Learning as applied to image recognition. It’s worth noting that almost every new architecture for image recognition (medical, satellite, autonomous driving, …) uses these same principles with a different number of layers, different types of filters, different initialisation points, different matrix sizes, different tricks (like image augmentation, dropouts, weight compression, …). The concepts remain the same:
4832Number detection process
4833Number detection process
4834
4835
4836
4837
In other words, we saw that the training and inference of deep learning models come down to lots and lots of basic matrix operations that can be done in parallel, and this is exactly what our good old graphics processors (GPUs) are made for.
In the next post, we will discuss how a GPU actually works, and how deep learning is technically implemented on it.
4840<|endoftext|>
4841<|startoftext|>
4842
4843How does OVH manage the CI/CD at scale?
4844By Yvonnick Esnault / 2019-02-14 / Automation, CDS, Devops, Industrialization
4845
The delivery process is the set of steps – from git commit to production – that takes place to deliver your service to your customers. Drawing on agile values, Continuous Integration and Continuous Delivery (CI/CD) are practices that aim to automate this process as much as possible.
4847
4848From git to production
4849
4850The Continuous Delivery Team at OVH has one fundamental mission: to help the OVH developers industrialise and automate their delivery processes. The CD team is here to advocate CI/CD best practices and maintain our ecosystem tools, with the maximum focus on as-a-service solutions.
4851
4852CDS
4853
4854The centre of this ecosystem is a tool called CDS, developed in-house at OVH.
4855CDS is an open-source software solution that can be found at https://github.com/ovh/cds, with documentation at https://ovh.github.io/cds.
4856
CDS is the third generation of CI/CD tools at OVH, following two previous solutions based on Bash, Jenkins, Gitlab and Bamboo. It is the end result of 12 years' experience in the field of CI/CD. Being familiar with most of the standard tools of the industry, we found that none completely matched our expectations regarding the four key aspects we had identified. That is what CDS tries to solve.
4858
4859These four aspects are:
4860Elastic
4861
4862CDS resources/workers are launched on demand, to guarantee low waiting times for users, with no over-consumption of idle resources.
4863Extensible
4864
4865In CDS, any kind of action (Kubernetes and OpenStack deployments, pushing to Kafka, testing for CVEs…) can be captured in high-level plugins, to be used as building blocks by users. These plugins are straightforward to write and use, so it’s easy to meet the most exotic needs in an effective and stress-free way.
4866Flexible, but easy
4867
4868CDS can run complex workflows, with all sorts of intermediary steps, including build, test, deploy 1/10/100, manual or automatic gates, rollback, conditional branches… These workflows can be stored as code in the git repository. CDS provides basic workflow templates for the Core team’s most common scenarios, in order to ease the adoption process. This way, building a functional CI/CD chain from nothing can be quick and easy.
4869Self-service
4870
4871Finally, a key aspect is the idea of self-service. Once a CDS project is created by users, they are completely autonomous within that space, with the freedom to manage pipelines, delegate access rights etc. All users are free to customise their space as they see fit, and build on what is provided out-of-the-box. Personalising workflow templates, plugins, running build and tests on custom VM flavors or custom hardware… all this can be done without any intervention from the CDS administrators.
4872CI/CD in 2018 – Five million workers!
4873
4874 About 5.7M workers started and deleted on demand.
4875 3.7M containers
4876 2M Virtual Machines
4877
4878How is it possible?
4879
4880One of the initial CDS objectives at OVH was to build and deploy 150 applications as a container in less than seven minutes. This has been a reality since 2015. So what’s the secret? Auto-Scale on Demand!
4881
4882With this approach, you can have hundreds of worker models that CDS will launch via hatcheries whenever necessary.
CDS Hatchery
4885
4886
4887
4888A hatchery is like an incubator: it gives birth to the CDS workers and maintains the power of life and death over them.
CDS Hatcheries – Worker @Scale
4891
4892
4893
Each hatchery is dedicated to an orchestrator. Furthermore, one CDS instance can create workers over many cloud platforms:
– The Kubernetes hatchery starts workers in pods
– The OpenStack hatchery starts virtual machines
– The Swarm hatchery starts Docker containers
– The Marathon hatchery starts Docker containers
– The VSphere hatchery starts virtual machines
– The local hatchery starts processes on a host
CDS Hatcheries
4903What’s next?
4904
4905This is all just a preview of CDS… we have lots more to tell you about! The CI/CD tool offers a wide range of features that we will explore in depth in our upcoming articles. We promise, before 2019 is done, you will not look at your CI/CD tool the same way again…
4906<|endoftext|>
4907<|startoftext|>
4908
4909Understanding CI/CD for Big Data and Machine Learning
4910By Yvonnick Esnault / 2019-02-14 / Automation, Big Data, CDS, DataBuzzWord, Devops, Docker, Kubernetes, Machine learning, OpenStack, Podcast
4911
4912This week, the OVH Integration and Continuous Deployment team was invited to the DataBuzzWord podcast.
4913
4914
4915
4916CDS
4917
4918Together, we explored the topic of continuous deployment in the context of machine learning and big data. We also discussed continuous deployment for environments like Kubernetes, Docker, OpenStack and VMware VSphere.
4919
4920If you missed it, or would like to review everything that was discussed, you can listen to it again here. We hope to return soon, to continue sharing our passion for testing, integration and continuous deployment.
4921
4922Although the podcast was recorded in French, starting from tomorrow, we’ll be delving further into the key points of our discussion in a series of articles on this blog.
4923
4924Find CDS on GitHub:
4925
4926 https://github.com/ovh/cds
4927
4928…. and follow us on Twitter:
4929
4930 https://twitter.com/yesnault
4931 https://twitter.com/francoissamin
4932
4933Come chat about these subjects with us on our Gitter channel: https://gitter.im/ovh-cds/
4934<|endoftext|>
4935<|startoftext|>
4936
4937TSL: a developer-friendly Time Series query language for all our metrics
4938By Aurélien Hébert / 2019-02-13 / Language, Metrics, Time series, TSL
4939
At the Metrics team, we have been working on Time Series for several years now. In our experience, the data analytics capabilities of a Time Series Database (TSDB) platform are a key factor in creating value from your metrics. These analytics capabilities are mostly defined by the query languages they support.
4941
4942TSL stands for Time Series Language. In simple terms, TSL is an abstracted way of generating queries for different TSDB backends, in the form of an HTTP proxy. It currently supports Warp 10’s WarpScript and Prometheus’ PromQL query languages, but we aim to extend the support to other major TSDBs.
4943TSL - Time Series Language
4944
To provide some context around why we created TSL: it began with a review of the TSDB query languages supported on the OVH Metrics Data Platform. When implementing them, we learned the good, the bad and the ugly of each one. In the end, we decided to build TSL to simplify querying on our platform, before open-sourcing it so it can be used on any TSDB solution.
4946
4947So why did we decide to invest our time in developing such a proxy? Well, let me tell you the story of the OVH Metrics protocol!
4948From OpenTSDB…
4949OpenTSDB
4950
The first aim of our platform is to be able to support OVH's infrastructure and application monitoring. When this project started, a lot of people were using OpenTSDB and were familiar with its query syntax. OpenTSDB is a scalable Time Series database. The OpenTSDB query syntax is easy to read, as you send a JSON document describing the request. The document below will load all sys.cpu.0 metrics of the test datacentre, summing them between the start and end dates:
4952
4953{
4954 "start": 1356998400,
4955 "end": 1356998460,
4956 "queries": [
4957 {
4958 "aggregator": "sum",
4959 "metric": "sys.cpu.0",
4960 "tags": {
4961 "host": "*",
4962 "dc": "test"
4963 }
4964 }
4965 ]
4966}
4967
This enables the quick retrieval of specific data in a specific time range. At OVH, this was used for graphing purposes, in conjunction with Grafana, and helped us to spot potential issues in real time, as well as investigate past events. OpenTSDB integrates simple queries, where you can define your own sampling and deal with counter data, as well as filtered and aggregated raw data.
4969
OpenTSDB was the first protocol supported by the Metrics team, and it is still widely used today. Internal statistics show that 30-40% of our traffic is based on OpenTSDB queries. A lot of internal use cases can still be entirely resolved with this protocol, and the queries are easy to write and understand.
4971
For example, a query with OpenTSDB to get the max value of usage_system for the cpus 0 to 9, sampled in 2-minute spans using the average of their values, looks like this:
4973
4974{
4975 "start": 1535797890,
4976 "end": 1535818770,
4977 "queries": [{
4978 "metric":"cpu.usage_system",
4979 "aggregator":"max",
4980 "downsample":"2m-avg",
4981 "filters": [{
4982 "type":"regexp",
4983 "tagk":"cpu",
4984 "filter":"cpu[0–9]+",
4985 "groupBy":false
4986 }]
4987 }]
4988}
4989
However, OpenTSDB quickly shows its limitations, and some specific use cases can't be resolved with it. For example, you can't apply any operations directly on the backend. You have to load the data into an external tool and use it to apply any analytics.

One of the main areas where OpenTSDB (version 2.3) is lacking is multiple Time Series set operators, which allow actions like dividing one series by another. Those operators can be a useful way to compute the individual query time per request, when you have (for example) a series of total time spent in requests and a series of total request counts. That's one of the reasons why the OVH Metrics Data Platform supports other protocols.
4993… to PromQL
4994Prometheus
4995
The second protocol we worked on was PromQL, the query language of the Prometheus Time Series database. When we made that choice in 2015, Prometheus was gaining some traction, and it still has an impressive adoption rate. But if Prometheus is a success, it isn't because of its query language, PromQL. This language never took off internally, although it has started to gain some adoption recently, mainly due to the arrival of people who worked with Prometheus in their previous companies. Internally, PromQL queries represent about 1-2% of our daily traffic. The main reasons are that a lot of simple use cases can be solved quickly, and with more control over the raw data, with OpenTSDB queries, while many of the more complex use cases can't be solved with PromQL. A similar request to the one defined in OpenTSDB would be:
4997
4998api/v1/query_range?
query=max(cpu.usage_system{cpu=~"cpu[0-9]%2B"})&
5000start=1535797890&
5001end=1535818770&
5002step=2m
5003
5004With PromQL, you lose control of how you sample the data, as the only operator is last. This means that if (for example) you want to downsample your series with a 5-minute duration, you are only able to keep the last value of each 5-minute series span. In contrast, all competitors include a range of operators. For example, with OpenTSDB, you can choose between several operators, including average, count, standard deviation, first, last, percentiles, minimal, maximal or summing all values inside your defined span.
5005
5006In the end, a lot of people choose to use a much more complex method: WarpScript, which is powered by the Warp10 Analytics Engine we use behind the scenes.
5007Our internal adoption of WarpScript
5008WarpScript by SenX
5009
5010WarpScript is the current Time Series language of Warp 10(R), our underlying backend. WarpScript will help for any complex Time Series use case, and solves numerous real-world problems, as you have full control of all your operations. You have dedicated frameworks of functions to sample raw data and fill missing values. You also have frameworks to apply operations on single-value or window operations. You can apply operations on multiple Time Series sets, and have dedicated functions to manipulate Time Series times, statistics, etc.
5011
It works with Reverse Polish Notation (like a good, old-fashioned HP48, for those who've got one!), and simple use cases are easy to express. But when it comes to analytics, while it certainly solves problems, it's still complex to learn. In particular, Time Series use cases are complex and require a specific way of thinking, and WarpScript helped to solve a lot of hard ones.
5013
This is why it's still the main query language used on the OVH Metrics platform, with nearly 60% of internal queries making use of it. The same request that we just expressed in OpenTSDB and PromQL would be as follows in WarpScript:
5015
5016[ "token" "cpu.average" { "cpu" "~cpu[0–9]+" } NOW 2 h ] FETCH
5017[ SWAP bucketizer.mean 0 2 m 0 ] BUCKETIZE
5018[ SWAP [ "host" ] reducer.max ] REDUCE
5019
5020A lot of users find it hard to learn WarpScript at first, but after solving their initial issues with some (sometimes a lot of) support, it becomes the first step of their Time Series adventure. Later, they figure out some new ideas about how they can gain knowledge from their metrics. They then come back with many demands and questions about their daily issues, some of which can be solved quickly, with their own knowledge and experience.
5021
What we learned from WarpScript is that it's a fantastic tool with which to build analytics for our Metrics data. We pushed many complex use cases with advanced signal-processing algorithms, like LTTB, outlier or pattern detection, and kernel smoothing, where it proved to be a real enabler. However, it proved quite expensive to support for basic requirements, and feedback indicated the syntax and overall complexity were big concerns.
5023
A WarpScript can involve dozens (or even hundreds) of lines, and a successful execution is often an accomplishment, with the special feeling that comes from having made full use of one's brainpower. In fact, an inside joke amongst our team is to manage to write a working WarpScript in a single day, or to earn a WarpScript Pro Gamer badge! That's why we've distributed Metrics t-shirts to users that have achieved significant successes with the Metrics Data Platform.
5025
We liked the WarpScript semantics, but we wanted them to have a significant impact on a broader range of use cases. This is why we started to write TSL, with a few simple goals:
5027
5028 Offer a clear Time Series analytics semantic
 Simplify the writing and make it developer-friendly
5030 Support data flow queries and ease debugging for complex queries
5031 Don’t try and be the ultimate tool box. Keep it simple.
5032
5033We know that users will probably have to switch back to WarpScript every so often. However, we hope that using TSL will simplify their learning curve. TSL is simply a new step in the Time Series adventure!
5034The path to TSL
5035TSL - Time Series Language
5036
5037TSL is the result of three years of Time Series analytics support, and offers a functional Time Series Language. The aim of TSL is to build a Time Series data flow as code.
5038
With TSL, native methods such as select and where exist to choose which metrics to work on. Then, as Time Series data is time-related, we have to use a time selector method on the selected metrics. The two available methods are from and last. The vast majority of the other TSL methods take Time Series sets as input and provide Time Series sets as the result. For example, you have methods that only select values above a specific threshold, compute rates, and so on. We have also included specific operations, such as additions or multiplications, that can be applied to multiple subsets of Time Series sets.
5040
5041Finally, for a more readable language, you can define variables to store Time Series queries and reuse them in your script any time you wish. For now, we support only a few native types, such as Numbers, Strings, Time durations, Lists, and Time Series (of course!).
5042
5043Finally, the same query used throughout this article will be as follows in TSL:
5044
5045select("cpu.usage_system")
.where("cpu~cpu[0-9]+")
5047.last(12h)
5048.sampleBy(2m,mean)
5049.groupBy(max)
5050
5051You can also write more complex queries. For example, we condensed our WarpScript hands-on, designed to detect exoplanets from NASA raw data, into a single TSL query:
5052
5053sample = select('sap.flux')
5054 .where('KEPLERID=6541920')
 .from("2009-05-02T00:56:10.000000Z", to="2013-05-11T12:02:06.000000Z")
 .timesplit(6h,100,"record")
 .filterByLabels('record~[2-5]')
5058 .sampleBy(2h, min, false, "none")
5059
5060trend = sample.window(mean, 5, 5)
5061
5062sub(sample,trend)
5063 .on('KEPLERID','record')
5064 .lessThan(-20.0)
5065
So what did we do here? First, we instantiated a sample variable, in which we loaded the 'sap.flux' raw data of one star, KEPLERID 6541920. We then cleaned the series using the timesplit function (to split the star's series when there is a hole in the data longer than 6 hours), keeping only four records. Finally, we sampled the result, keeping the minimum value of each 2-hour bucket.
5067
5068We then used this result to compute the series trend, using a moving average of 10 hours.
5069
To conclude, the query returns only the points below -20 in the result of subtracting the trend from the sample series.
5071TSL is Open Source
5072
5073Even if our first community of users was mostly inside OVH, we’re pretty confident that TSL can be used to solve a lot of Time Series use cases.
5074
5075We are currently beta testing TSL on our OVH Metrics public platform. Furthermore, TSL is open-sourced on Github, so you can also test it on your own platforms.
5076
5077We would love to get your feedback or comments on TSL, or Time Series in general. We’re available on the OVH Metrics gitter, and you can find out more about TSL in our Beta features documentation.
5078<|endoftext|>
5079<|startoftext|>
5080
5081Kubinception and etcd
5082By Horacio Gonzalez / 2019-02-08 / etcd, Kubernetes, Kubinception, OVH Managed Kubernetes
5083
5084 Running Kubernetes over Kubernetes was a good idea for the stateless components of the control plane… but what about the etcd?
5085
5086Kubinception & etcd
5087
In our previous post, we described the Kubinception architecture: how we run Kubernetes over Kubernetes for the stateless components of the customer clusters' control planes. But what about the stateful component, etcd?

The need is clear: each customer cluster needs access to an etcd in order to store and retrieve data. The whole question is where and how to deploy that etcd to make it available to each customer cluster.
The simplest idea is not always the right one
5092
The first approach would be to simply follow the Kubinception logic: for each customer cluster, deploy an etcd cluster as pods running on the admin cluster.
5094Full Kubinception for the etcd: deploying etcd cluster as pods
5095
This full Kubinception approach has the merit of being simple; it seems like an extension of what we are doing with the stateless components. But when looking at it in detail, it shows its flaws: deploying an etcd cluster is not as easy and straightforward as deploying stateless components, and as etcd is critical to cluster operation, we couldn't simply handle it manually. We needed an automated approach to manage it at a higher level.
5097Using the operator
5098
We weren't the only ones to think that the complexity of deploying and operating an etcd cluster on Kubernetes was excessive. The people at CoreOS had noticed it, and in 2016 they released an elegant solution to the problem: the etcd operator.
5100
5101An operator is a specific controller that extends the Kubernetes API to easily create, configure and operate instances of complex (often distributed) stateful applications on Kubernetes. For the record, the concept of operator was introduced by CoreOS with the etcd operator.
Using the etcd operator to deploy the etcd clusters
5103
5104The etcd operator manages etcd clusters deployed to Kubernetes and automates operational tasks: creation, destruction, resizing, failover, rolling upgrades, backups…
5105
As in the previous solution, the etcd cluster for each customer cluster is deployed as pods in the admin cluster. By default, the etcd operator deploys the etcd cluster using local, non-persistent storage for each etcd pod. That means that if all the pods die (unlikely), or are re-scheduled and spawned on another node (far more likely), we could lose the etcd data. And without it, the customer's Kubernetes cluster is bricked.
5107
The etcd operator can be configured to use persistent volumes (PV) to store the data, so theoretically the problem was solved. Theoretically, because the volume management wasn't mature enough when we tested it, and if an etcd pod was killed and re-scheduled, the new pod failed to retrieve its data from the PV. So the risk of total quorum loss, and the bricking of the customer cluster, was still there with the etcd operator.
5109
5110In brief, we worked quite a bit with the etcd operator, and we found it not mature enough for our use.
5111The StatefulSet
5112
Setting aside the operator, another solution was to use a StatefulSet, a kind of distributed Deployment well suited to managing distributed stateful applications.

There is an official etcd Helm chart that allows you to deploy etcd clusters as StatefulSets. It trades off some of the operator's flexibility and user-friendliness for more robust PV management, which guarantees that a re-scheduled etcd pod will retrieve its data.
5116Using StatefulSets for the etcd clusters
5117
The etcd StatefulSet is less convenient than the etcd operator, as it doesn't offer an easy API for operations such as scaling, failover, rolling upgrades or backups. In exchange, you get some real improvements in PV management. The StatefulSet maintains a sticky identity for each of the etcd pods, and that persistent identifier is maintained across any rescheduling, making it simple to pair each pod with its PV.

The system is so resilient that, even if we lose all the etcd pods, when Kubernetes re-schedules them they will find their data, and the cluster will continue working without problems.
Persistent Volumes, latency, and a simple cost calculation
5122
The etcd StatefulSet seemed like a good solution… until we began to use it in an intensive way. The etcd StatefulSet uses PVs, i.e. network storage volumes. And etcd is rather sensitive to network latency: its performance degrades heavily when faced with latency.
5124
Even if the latency could be kept under control (and that's a big if), the more we thought about the idea, the more it seemed an expensive solution. For each customer cluster, we would need to deploy three pods (effectively doubling the pod count) and three associated PVs; it doesn't scale well for a managed service.
5126
In the OVH Managed Kubernetes service, we bill our customers according to the number of worker nodes they consume, i.e. the control plane is free. That means that, for the service to be competitive, it's important to keep the resources consumed by the control planes under control, hence the need to avoid doubling the pod count because of etcd.
5128
With Kubinception, we had already tried to think outside the box; it seemed that for etcd we needed to get out of that box once again.
5130Multi-tenant etcd cluster
5131
If we didn't want to deploy etcd inside Kubernetes, the alternative was to deploy it outside. We chose to deploy a multi-tenant etcd cluster on dedicated servers. All the customer clusters would use the same etcd, with every API server getting its own space in this multi-tenant etcd cluster.
5133Multi-tenant bare-metal dedicated etcd cluster
5134
With this solution, the resiliency is guaranteed by the usual etcd mechanisms, there is no latency problem as the data is on the local disk of each etcd node, and the pod count remains under control, so it solves the main problems we had with the other solutions. The trade-off here is that we need to install and operate this external etcd cluster, and manage access control to make sure that every API server only accesses its own data.
5136What’s next?
5137
In the next posts in the Kubernetes series, we will dive into other aspects of the construction of OVH Managed Kubernetes, and we will hand the keyboard over to some of our beta customers to narrate their experience of using the service.

Next week, let's focus on another topic: we will deal with the TSL query language, and why we created and open-sourced it…
5141<|endoftext|>
5142<|startoftext|>
5143
5144How we’ve updated 850 vCenter in 4 weeks
5145By Antoine Ruelle / 2019-02-08 / PCC, Private Cloud, vCenter, VMware
5146
Release management for enterprise software isn't an easy job: updating infrastructures, coping with the fear of no longer being supported by the software vendor, upgrading licenses to be compatible with new versions, and taking all the precautions needed to roll back if something isn't working as expected…

With OVH Private Cloud, we take this complexity away from you. We manage this time-consuming and stressful aspect, allowing you to concentrate on your business and your production.
5150
5151But, this doesn’t mean it’s not a challenge for us.
5152Upgrading vSphere
5153Upgrading hundreds of vSphere 5.5 to 6.0
5154
vSphere is the flagship product of the Private Cloud offer; it's part of the SDDC suite provided by VMware. vSphere is software that lets users manage their hosts, storage and network through a client, and create clusters with these assets to host production workloads that are reliable, stable and highly available.

Since September 2018, vSphere (vCenter, ESXi…) version 5.5 has been out of support with VMware. Being responsible for the security and stability of Private Cloud infrastructures, we started the update process for all the vCenters.
5158vmware vSphere 5.5
5159
We had around 850 vCenters running version 5.5 in production, which represents a significant amount of work if everything is updated manually. But at OVH, we have a common leitmotiv: automate all human actions, for effectiveness and to avoid errors.

That's how we managed to update 850 vCenters from version 5.5 to 6.0 in 4 weeks. In other words, more than 210 vCenters per week, or 30 vCenters per day, with a team of 10 people following this maintenance in the background, without any impact on customers' production.
5163Migration status
5164
Years ago, our dev team designed and created a set of scripts (which we internally call the "robot") to automate vCenter upgrades. This robot has evolved a lot since the beginning of the Private Cloud product, and has followed us from version 4.1 to 6.5, which is a work in progress.
5166
We encountered lots of issues while setting up the automated actions, like database corruption, services not integrated with Single Sign-On (which was very hard to manage in versions 5.0 and 5.1), and thumbprints that weren't updated for all services, which were very hard to troubleshoot and reproduce. We even had some operating systems that blocked the upgrade software, bringing everything to an abrupt stop.
5168
Our operations teams worked a lot with the VMware support team to find workarounds for the issues encountered, and to automate them all with the dev team. This led to the creation of VMware KBs to notify customers about the issues we faced, which were recognised as bugs by VMware. The teams spent a lot of nights ensuring a minimal impact on vSphere availability for customers.
5170Upgrading the upgrader: a new version of the robot
5171
5172All of these issues convince us to act on two things. First, push a new version of the upgrade robot, creating less errors, having a faster execution from customer point of view, more reliable and trustful. Second, we abandoned the default upgrade process, using VMware software upgrade, for a solution where we start from a freshly installed vCenter updated stack, on an updated virtual machine, and then we reconnect every components (database, NSX …) to this new vCenter.
5173
This greatly improved our service stability, as we make sure we have a fresh, healthy and updated basis for the vCenter. All of this has drastically reduced the number of interventions by our SREs on Private Cloud infrastructures.
5175
To sum up our actions: we verify that the service is working before doing anything, then we prepare all our backups and snapshots for the upgrade. Once that's done, we deploy our automation to launch the upgrade. Every step includes an automatic verification to make sure all actions have been completed.
5177
We wrapped this upgrade robot in an orchestrator robot which, according to the parameters entered, creates upgrade tasks for each Private Cloud concerned by the maintenance and schedules them automatically, giving the customer a minimum of 72 hours' notice, while also taking into account the number of upgrades launched per hour and critical periods (such as Black Friday or the Winter Sales). Customers can reschedule their upgrades using the Manager, in the Operations section, to run the maintenance at a better time for their production.
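
To illustrate the kind of rules this orchestrator has to respect, here is a minimal, hypothetical Java sketch of the scheduling constraints described above: at least 72 hours' notice, a cap on the number of upgrades launched per hour, and no maintenance during critical periods. Every class name and value here is an assumption made for the example, not our actual orchestrator code.

import java.time.Duration;
import java.time.LocalDate;
import java.time.ZonedDateTime;
import java.util.List;

// Hypothetical sketch: finds the first maintenance slot that respects the
// 72-hour notice, the hourly upgrade cap and the critical periods.
public class UpgradeScheduler {

    private static final Duration MINIMUM_NOTICE = Duration.ofHours(72);
    private static final int MAX_UPGRADES_PER_HOUR = 10; // assumed limit

    private final List<CriticalPeriod> criticalPeriods;

    public UpgradeScheduler(List<CriticalPeriod> criticalPeriods) {
        this.criticalPeriods = criticalPeriods;
    }

    // Returns the first hourly slot that satisfies every constraint.
    public ZonedDateTime nextSlot(ZonedDateTime now, ScheduledUpgrades alreadyScheduled) {
        ZonedDateTime candidate = now.plus(MINIMUM_NOTICE);
        while (isCritical(candidate.toLocalDate())
                || alreadyScheduled.countInHour(candidate) >= MAX_UPGRADES_PER_HOUR) {
            candidate = candidate.plusHours(1);
        }
        return candidate;
    }

    private boolean isCritical(LocalDate day) {
        return criticalPeriods.stream().anyMatch(p -> p.contains(day));
    }

    // A period (e.g. Black Friday week or the Winter Sales) with no maintenance.
    public static class CriticalPeriod {
        private final LocalDate start;
        private final LocalDate end;

        public CriticalPeriod(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(LocalDate day) {
            return !day.isBefore(start) && !day.isAfter(end);
        }
    }

    // Minimal view of the upgrade tasks already planned.
    public interface ScheduledUpgrades {
        int countInHour(ZonedDateTime hour);
    }
}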
5180
Our SRE teams follow the robots and ensure that the maintenances run as expected, at the scheduled time.
5183
To sum up: we started with a manual vCenter upgrade operation that took at least 12 hours per vCenter. A first version of the automation brought this operation down to 4 hours, but with an error rate that was too high (20%) because of recurrent bugs, which had to be fixed manually by SREs. The second version is now solid, reliable and stable, avoiding the known issues, and producing only rare, one-off issues that are then fixed in the automation.
5185What’s next?
5186
5187In the next months, other maintenances will follow, host upgrades from 5.5 to 6.0 version, upgrades of our Veeam Backup option from 8.0 to 9.5 version, upgrade of our Zerto option from 5.0 to 5.5, and a lot of other upgrades of our internal machines to ensure the PCI-DSS audit routine.
5188
5189We will keep the same transparency and communication, while we are listening your feedbacks, and improving our maintenance system.
5190<|endoftext|>
5191<|startoftext|>
5192
DNS Flag Day: what does it change?
5194By Guillaume Marchand / 2019-01-31 / Conformity, DNS
5195
On February 1st, the DNS (Domain Name System [1]) protocol is going to undergo a new, big change…
5197A bit of Context
5198
The DNS protocol has been a key component of the Internet for the last 30 years, and still is. It maps domain names (e.g. www.ovh.com) to the numerical IP addresses (e.g. 198.27.92.1) needed to locate and identify websites and other computer services. This protocol is often described as the directory of the web.
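
To make this mapping concrete, here is a tiny Java example that asks the system resolver for the addresses behind a domain name (the output will of course depend on the resolver you use):

import java.net.InetAddress;
import java.net.UnknownHostException;

// Resolve a domain name into the IP addresses it points to.
public class ResolveExample {
    public static void main(String[] args) throws UnknownHostException {
        for (InetAddress address : InetAddress.getAllByName("www.ovh.com")) {
            System.out.println(address.getHostAddress());
        }
    }
}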
5200
As the Internet, and the technologies running it, evolve quickly, the DNS protocol has of course already evolved many times over its 30 years of existence.
5202
Today, we take a closer look at its first extension, called EDNS [2], which is at the heart of the so-called DNS Flag Day.
EDNS: what's this thing?
5205
This extension added new functionalities on top of the ones brought by the DNS protocol.
5207
Ten years ago, this extension was key to the birth of DNSSEC [3], which solves some of the security issues around the DNS protocol by securing certain kinds of information provided by the DNS through cryptographically signed responses.
5209
Unfortunately, many DNS servers around the world don't implement this EDNS extension. Sometimes, the extension doesn't correctly comply with the standards, or, even worse, is simply blocked!
5211
To guarantee the stability of domain name resolution (i.e. the translation of a domain name into an IP address), resolver infrastructures have had to pile up numerous modifications to manage all the known exceptions.
February 1st, 2019 – Day one
5214
These exceptions significantly degrade domain name resolution, and therefore directly affect the user experience. Moreover, it's complicated to maintain so many patches over time.
5216
For all these reasons, DNS Flag Day was created. From February 1st, 2019, the exceptions implemented in the resolvers will progressively be removed.
5218
5219You will probably not notice much difference on D-day, but as updates are made to the DNS servers, resolutions may be compromised.
5220Who will be impacted?
5221
OVH infrastructures are compatible with EDNS, so no impact is to be expected if you use the DNS services managed by OVH.
5223
If your DNS zone is not hosted on OVH DNS, we recommend checking that your service provider has made the necessary changes.
5225
If you're not able to be ready by February 1st, you still have the option of migrating your DNS zone to our infrastructure.
5227
5228Our guides:
5229
5230 Editing the DNS servers for an OVH domain name
5231 Editing an OVH DNS zone
5232
5233Am I being impacted?
5234
The easiest way to be sure is to check whether your domain name is compatible using the tools provided by DNS Flag Day. An online tool is available: DNS Flag Day is a cross-organisation effort and can be trusted.
5236To go further
5237
5238The .cz extension registry has put online a tool to scan any extension and check its compatibility with resolution using EDNS:
5239
5240The AFNIC has carried out a test for the .fr TLD. In their results, available here, we see that 3.49% of .fr domains will probably be impacted.
5241
5242 [1] DNS, RFC1034 & RFC1035
5243 [2] EDNS, RFC2671 & RFC6891
5244 [3] DNSSEC, RFC4033
5245<|endoftext|>
5246<|startoftext|>
5247
5248Handling OVH’s alerts with Apache Flink
5249By Pierre Zemb / 2019-01-31 / Alerting, Apache Flink, Metrics, Omni
5250
OVH relies extensively on metrics to effectively monitor its entire stack. Whether they are low-level or business-centric, they allow teams to gain insight into how our services operate on a daily basis. The need to store millions of datapoints per second produced the need for a dedicated team to build and operate a product able to handle that load: the Metrics Data Platform. By relying on Apache HBase, Apache Kafka and Warp 10, we succeeded in creating a fully distributed platform that handles all our metrics… and yours!
5252
After building the platform to deal with all those metrics, our next challenge was to build one of the most needed features for Metrics: alerting.
5254OVH & Apache Flink
5255Meet OMNI, our alerting layer
5256
OMNI is our code name for a fully distributed, as-code alerting system that we developed on top of Metrics. It is split into two components:
5258
 The management part, which takes your alert definitions (stored in a Git repository) and represents them as continuous queries,
 The query executor, which schedules your queries in a distributed way.
5261
The query executor pushes the query results into Kafka, ready to be handled! We then need to perform all the tasks that an alerting system does:
5263
 Handling alert deduplication and grouping, to avoid alert fatigue.
 Handling escalation steps, acknowledgement and snooze.
 Notifying the end user through different channels: SMS, mail, push notifications…
5267
To handle all that, we looked at open-source projects such as Prometheus AlertManager and LinkedIn Iris, and we discovered the hidden truth:
5269
 Handling alerts means handling streams of data,
 moving from one operator to another.
5272
5273We embraced it, and decided to leverage Apache Flink to create Beacon. In the next section we are going to describe the architecture of Beacon, and how we built and operate it.
5274
If you want more information on Apache Flink, we suggest reading the introduction article on the official website: What is Apache Flink?
5276Beacon architecture
5277
At its core, Beacon reads events from Kafka. Everything is represented as a message: alerts, aggregation rules, snooze orders and so on. The pipeline is divided into two branches:
5279
 One that runs the aggregations and triggers notifications based on customers' rules.
 One that handles the escalation steps.
5282
Then everything is merged to generate a notification, which is forwarded to the right person. The notification message is pushed into Kafka, where it is consumed by another component called beacon-notifier.
5284Beacon architecture
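
To give a feel for this kind of pipeline, here is a minimal sketch of a Kafka-in / Kafka-out Flink job with two branches that are merged before the sink. The topic names, filters and schemas are illustrative assumptions, not Beacon's actual code.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class PipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");
        props.setProperty("group.id", "beacon-sketch");

        // Every event (alerts, aggregation rules, snooze orders…) arrives as a message.
        DataStream<String> events =
                env.addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props));

        // Branch 1: aggregations and notification triggering (placeholder logic).
        DataStream<String> aggregations = events.filter(msg -> msg.contains("alert"));

        // Branch 2: escalation handling (placeholder logic).
        DataStream<String> escalations = events.filter(msg -> msg.contains("escalation"));

        // Merge both branches and push the notifications back to Kafka, where a
        // downstream component (such as beacon-notifier) can consume them.
        aggregations.union(escalations)
                .addSink(new FlinkKafkaProducer<>("notifications", new SimpleStringSchema(), props));

        env.execute("pipeline-sketch");
    }
}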
5285
If you are new to streaming architectures, I recommend reading the Dataflow Programming Model page from the official Flink documentation.
5287Handling state
5288
Everything is merged into a DataStream, partitioned (keyed by, in the Flink API) per user. Here's an example:
5290
5291final DataStream<Tuple4<PlanIdentifier, Alert, Plan, Operation>> alertStream =
5292
5293 // Partitioning Stream per AlertIdentifier
5294 cleanedAlertsStream.keyBy(0)
5295 // Applying a Map Operation which is setting since when an alert is triggered
5296 .map(new SetSinceOnSelector())
5297 .name("setting-since-on-selector").uid("setting-since-on-selector")
5298
5299 // Partitioning again Stream per AlertIdentifier
5300 .keyBy(0)
5301 // Applying another Map Operation which is setting State and Trend
5302 .map(new SetStateAndTrend())
5303 .name("setting-state").uid("setting-state");
5304
 SetSinceOnSelector, which sets since when the alert has been triggered,
 SetStateAndTrend, which sets the state (ONGOING, RECOVERY or OK) and the trend (whether we have more or fewer metrics in error).
5307
Each of these classes is under 120 lines of code, because Flink handles all the difficulties. Most of the pipeline is composed only of classic transformations such as Map, FlatMap and Reduce, including their Rich and Keyed versions. We also have a few Process Functions, which are very handy to develop: for example, the escalation timer.
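
As an illustration of how convenient Process Functions are, here is a hypothetical sketch of an escalation timer written as a KeyedProcessFunction: when an alert arrives, it registers a processing-time timer, and if the alert is still unacknowledged when the timer fires, it emits an escalation. The event format, state name and delay are assumptions, not our production code.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class EscalationTimer extends KeyedProcessFunction<String, String, String> {

    private static final long ESCALATION_DELAY_MS = 15 * 60 * 1000; // assumed: 15 minutes

    private transient ValueState<Boolean> acknowledged;

    @Override
    public void open(Configuration parameters) {
        acknowledged = getRuntimeContext().getState(
                new ValueStateDescriptor<>("acknowledged", Boolean.class));
    }

    @Override
    public void processElement(String event, Context ctx, Collector<String> out) throws Exception {
        if (event.startsWith("ack:")) {
            // The alert has been acknowledged: the timer will fire but do nothing.
            acknowledged.update(true);
        } else {
            // A new (or still ongoing) alert: schedule an escalation check.
            acknowledged.update(false);
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + ESCALATION_DELAY_MS);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        if (acknowledged.value() == null || !acknowledged.value()) {
            out.collect("escalate:" + ctx.getCurrentKey());
        }
    }
}

Such a function would typically be applied to a keyed stream, for instance alertStream.keyBy(...).process(new EscalationTimer()).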
5309Integration tests
5310
As the number of classes grew, we needed to test our pipeline. Because it is only wired to Kafka, we wrapped the consumer and producer to create what we call scenari: a series of integration tests running different scenarios.
5312Queryable state
5313
One killer feature of Apache Flink is the capability to query the internal state of an operator. Even though it is a beta feature, it allows us to get the current state of the different parts of the job:
5315
 which escalation step we are at,
 whether an alert is snoozed or acknowledged,
 which alerts are ongoing,
 and so on.
5320
Queryable state overview
5322
Thanks to this, we easily developed an API over the queryable state, which powers our alerting view in Metrics Studio, our code name for the web UI of the Metrics Data Platform.
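
To show what this looks like from the client side, here is a hypothetical sketch using Flink's QueryableStateClient. It assumes the state was registered as queryable under the name "acknowledged" (for example via ValueStateDescriptor#setQueryable), that the queryable state proxy listens on its default port (9069), and that the job id and key are passed as arguments; none of this is our actual API code.

import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class QueryStateSketch {
    public static void main(String[] args) throws Exception {
        // Host of a task manager running the queryable state proxy.
        QueryableStateClient client = new QueryableStateClient("flink-taskmanager-host", 9069);

        JobID jobId = JobID.fromHexString(args[0]); // the running job's id
        String key = args[1];                       // e.g. an alert or customer identifier

        CompletableFuture<ValueState<Boolean>> future = client.getKvState(
                jobId,
                "acknowledged",
                key,
                BasicTypeInfo.STRING_TYPE_INFO,
                new ValueStateDescriptor<>("acknowledged", Boolean.class));

        ValueState<Boolean> state = future.get();
        System.out.println("acknowledged = " + state.value());
    }
}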
5324Apache Flink deployment
5325
We deployed the latest version of Flink (1.7.1 at the time of writing) directly on bare-metal servers, with a dedicated ZooKeeper cluster, using Ansible. Operating Flink has been a really nice surprise for us, with clear documentation and configuration, and impressive resilience. We are able to reboot the whole Flink cluster, and the job restarts at its last saved state, as if nothing had happened.
5327
We are using RocksDB as our state backend, backed by OpenStack Swift object storage provided by OVH Public Cloud.
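
A minimal sketch of that wiring, assuming a Swift-compatible file system is configured for Flink and using an invented checkpoint URI, could look like this:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep the working state in RocksDB and checkpoint it (incrementally)
        // to the object store every minute.
        env.setStateBackend(new RocksDBStateBackend("swift://flink-checkpoints.region/beacon", true));
        env.enableCheckpointing(60_000);
    }
}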
5329
For monitoring, we rely on the Prometheus exporter, together with Beamium, to gain observability over the job's health.
5331In short, we love Apache Flink!
5332
5333If you are used to work with stream related software, you may have realized that we did not used any rocket science or tricks. We may be relying on basics streaming features offered by Apache Flink, but they allowed us to tackle many business and scalability problems with ease.
5334Apache Flink
5335
5336As such, we highly recommend that any developers should have a look to Apache Flink. I encourage you to go through Apache Flink Training, written by Data Artisans. Furthermore, the community has put a lot of effort to easily deploy Apache Flink to Kubernetes, so you can easily try Flink using our Managed Kubernetes!
5337What’s next?
5338
5339Next week we come back to Kubernetes, as we will expose how we deal with ETCD in our OVH Managed Kubernetes service.
5340<|endoftext|>
5341<|startoftext|>
5342
5343Kubinception: using Kubernetes to run Kubernetes
5344By Horacio Gonzalez / 2019-01-25 / Kubernetes, Kubinception, OVH Managed Kubernetes
5345
 When faced with the challenge of building a managed Kubernetes service at OVH, fully based on open-source tools, we had to make some tough design decisions. Today we review one of them…
5347
5348Kubinception
5349
One of the most structural choices we made while building the OVH Managed Kubernetes service was to deploy our customers' clusters on top of our own. Kubinception indeed…
5351
In this post we relate our experience of running Kubernetes over Kubernetes, with hundreds of customer clusters. Why did we choose this architecture? What are the main stakes of such a design? What problems did we encounter? How did we deal with those issues? And, even more importantly, if we had to make the decision today, would we choose Kubinception again?
5353How does a Kubernetes cluster work?
5354
To fully understand why we run Kubernetes on Kubernetes, we need at least a basic understanding of how a Kubernetes cluster works. A full explanation of this topic is out of the scope of this post, but let's do a quick summary:
5356
5357A working Kubernetes cluster is composed of:
5358
5359 A control plane that makes global decisions about the cluster, and detects and responds to cluster events. This control plane is composed of several master components.
5360 A set of nodes, worker instances containing the services necessary to run pods, with some node components running on every node, maintaining running pods and providing the Kubernetes runtime environment.
5361
Simplified Kubernetes architecture
5363Master components
5364
5365In this category of components we have:
5366
 API Server: exposes the Kubernetes API. It is the entry point for the Kubernetes control plane.
 Scheduler: watches newly created pods and selects a node for them to run on, managing resource allocation.
 Controller-manager: runs controllers, the control loops that watch the state of the cluster and move it towards the desired state.
 ETCD: a consistent and highly available key-value store used as Kubernetes' backing store for all cluster data. This topic deserves its own blog post, so we will speak about it in the coming weeks.
5371
5372Node components
5373
5374In every node we have:
5375
5376 Kubelet: agent that makes sure that containers described in PodSpecs are running and healthy. It’s the link between the node and the control plane.
5377 Kube-proxy: network proxy running in each node, enabling the Kubernetes service abstraction by maintaining network rules and performing connection forwarding.
5378
5379Our goal: quick and painless cluster deployment
5380
How did we go from this simple Kubernetes architecture to a Kubernetes-over-Kubernetes one? The answer lies in one of our main goals when building the OVH Managed Kubernetes service: to be able to deploy clusters in the simplest and most automated way possible.
5382
5383And we didn’t only want to deploy clusters, we wanted the deployed clusters to be:
5384
5385 Resilient
5386 Isolated
5387 Cost-optimized
5388
5389Kubinception: running Kubernetes over Kubernetes
5390
The idea is to use a Kubernetes cluster, which we call the admin cluster, to deploy customer clusters.
5392
Like every Kubernetes cluster, a customer cluster has a set of nodes and a control plane composed of several master components (API server, scheduler…).
5394
What we do is deploy those customer cluster master components as pods on the admin cluster nodes.
Simplified Kubinception architecture
5397
So now we have the stateless components of the customer cluster control plane running as pods on the admin cluster nodes. We haven't spoken about ETCD, as we will cover it in a future post; for the moment, let's just say that it is a dedicated component, living outside Kubernetes.
5399
5400And the customer cluster worker nodes? They are normal Kubernetes nodes: OVH public cloud instances connecting to the customer cluster API server running in an admin cluster pod.
Customer cluster with nodes and ETCD
5402
Our goal is to manage lots of clusters, not just one, so how do we add another customer cluster? As you might expect, we deploy the new customer cluster's control plane on the admin cluster nodes.
Two customer clusters on Kubinception
5405
From the admin cluster's point of view, we have simply deployed three new pods. Then we spawn some new node instances, connect an ETCD, and the cluster is up.
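
To make that point of view tangible, here is a small, hypothetical sketch using the fabric8 Kubernetes client: seen from the admin cluster, a customer control plane is nothing more than a set of pods in a namespace. The namespace and label names are invented for the example and are not our actual conventions.

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class ListCustomerControlPlane {
    public static void main(String[] args) {
        // Connects to the admin cluster using the local kubeconfig.
        try (KubernetesClient admin = new DefaultKubernetesClient()) {
            admin.pods()
                 .inNamespace("customer-1234")
                 .withLabel("component", "control-plane")
                 .list()
                 .getItems()
                 .stream()
                 .map(Pod::getMetadata)
                 .forEach(meta -> System.out.println(meta.getName()));
        }
    }
}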
If something can fail, it will
5408
We now have an architecture that allows us to quickly deploy new clusters, but if we go back to our goals, quick deployment was only half of it: we also wanted the clusters to be resilient. Let's begin with resiliency.
5410
The customer cluster nodes are already resilient, as they are vanilla Kubernetes nodes, and ETCD resiliency will be detailed in a specific blog post, so let's look at control plane resiliency, as it's the part specific to our architecture.
5412
And that's the beauty of the Kubinception architecture: we deploy the customer clusters' control planes as simple, standard, vanilla pods in our admin cluster. That means they are as resilient as any other Kubernetes pod: if one of the customer cluster master components goes down, the controller-manager of the admin cluster detects it, and the pod is rescheduled and redeployed, without any manual action on our side.
What better way to be sure that our Kubernetes is solid enough…
5415
Basing our Managed Kubernetes service on Kubernetes made us stumble upon facets of Kubernetes we hadn't encountered before, and learn lots of things about installing, deploying and operating Kubernetes. All that knowledge and tooling was directly applied to our customers' clusters, making the experience better for everybody.
And what about scaling?
5418
The whole system has been designed from the ground up with this idea of scaling. The Kubernetes-over-Kubernetes architecture allows easy horizontal scaling: when an admin cluster begins to get too big, we can simply spawn a new one and deploy the next customer control planes there.
5420What’s next?
5421
As this post is already long enough, I'll leave the explanation of the ETCD for the next post in the series, in two weeks.
5423
Next week, let's focus on another topic: we will deal with Apache Flink, and how we use it to handle alerts at OVH scale…
5425<|endoftext|>
5426<|startoftext|>
5427
5428Why OVH Managed Kubernetes?
5429By Horacio Gonzalez / 2019-01-17 / Kubernetes, OVH Managed Kubernetes, OVH Platform
5430
 Using Kubernetes is a great experience; operating it in production is way less simple. And building a managed Kubernetes platform is even worse…
5432
In November 2018 we released the beta version of our Managed Kubernetes service. It was the outcome of a journey that took us from being Kubernetes users to building a fully managed Kubernetes service, becoming a certified Kubernetes platform, and learning lots of things about building, operating and taming Kubernetes at scale…
5434
As the beta is now running, and the last issues are being worked out for the final release, we're taking some time to share some of the lessons we have learnt, the technological choices we have made, and the tooling we have built in the process.
5436
In today's post we will introduce our Managed Kubernetes, explaining why we built it. In the next posts we will look deeper at some aspects of the architecture, like scaling the ETCD, or how we run customers' Kubernetes masters inside our master Kubernetes' worker nodes…
5438
And of course, if you want to know more about our Managed Kubernetes, or would like to see a post about a particular topic, don't hesitate to leave a comment!
5440The Kubernetes journey
5441
The first time you play with Minikube is often astonishing. No more worrying about managing instances, no need to monitor whether the containers are running: you stop an instance and Kubernetes re-creates the containers on another one… It's a kind of magic!
5443
Then, as a new believer, you tell yourself that you should try to build a true cluster and deploy some bigger apps on it. You create some VMs, you learn to use kubeadm, and some time later you have spawned a fresh Kubernetes cluster to deploy your apps on. The magic is still there, but you begin to feel that, as in most tales, magic comes with a price…
5445Putting Kubernetes in production?
5446
And when you try to deploy your first production Kubernetes cluster on-premises, on a hypervisor or a bare-metal platform, you discover that the price can be a bit steep…
5448
Deploying the Kubernetes cluster is only the beginning; in order to consider it production-ready, you also need to ensure that:
5450
 The installation process is automatable and repeatable
 The upgrade/rollback process is safe
 A recovery procedure exists, and is documented and tested
 Performance is predictable and consistent, especially when using persistent volumes
 The cluster is operable, with enough traces, metrics and logs to detect and debug failures and problems
 The service is secure and highly available
5457
5458Our answer to this operational complexity
5459
Well, if you thought deploying your new Kubernetes cluster was going to give you this whole NoOps thing, it seems you were wrong. To keep with the magic metaphor, learning to master magic takes a long time, and it's not without risk…
5461
So, like many powerful technologies, Kubernetes' apparent simplicity and versatility on the Dev side comes with high complexity on the Ops side. No wonder most users look towards managed Kubernetes when they need to move from proof of concept to production.
5463
At OVH, as a user-focused company, we wanted to answer that demand by creating our managed Kubernetes solution: fully based on open source, without vendor lock-in, and fully compatible with any pure Kubernetes solution. Our objective was to give our users a fully managed, turnkey Kubernetes cluster, ready to use, without the hassle of installation or operation.
5465On the shoulders of giants…
5466
So we wanted to build a managed Kubernetes solution, but how? The first step was simple: we needed to be sure that the underlying infrastructure was rock solid, so we decided to base it on our own OpenStack-based Public Cloud offer.
5468Certified Kubernetes Hosting
5469
Building our platform on a mature, highly available, standards-based product like OVH Public Cloud allowed us to concentrate our efforts on the real problem at hand: creating a highly scalable, easy-to-operate, CNCF-certified, managed Kubernetes service.
5471What’s next?
5472
In the next posts in the series, we are going to dive into the architecture of the OVH Managed Kubernetes service, detailing some of our technological choices, explaining why we made them and how we made them work.
5474
5475We will begin with one of our boldest decisions: running Kubernetes over Kubernetes, or as we like to call it, the Kubinception.
5476<|endoftext|>