· 6 years ago · Feb 27, 2020, 06:36 PM
1I'm actually seriously looking up how to manage a multi-user database... I think that's a bit nuts, but I'm not sure how else to manage the records on our end of things
2I could easily set up a single-user database for it but who wants to be responsible for entering ALL of that data?
3...though, honestly, if I did a mass import of the spreadsheets I have and enter data as I move things into place, I'd have a pretty thorough set of info just with that
4the mass import would fill in so much data for all the groups, the only tricky bit would be matching up categories with group names, because categories would need their own table, and the primary key for that table would be the foreign key (or whatever it's called, I'm blanking on the name now) in the groups table
5and you'd also need a fandoms table, and that would take the most time, I think - but not impossible, as many categories would make it easy to fill out (I'm not sure whether I'd want to prefill any - the work it takes could be a bit, would be about as complicated as with categories - and we'd get some bad data, with Yahoo's miscategorization of some things - the mix-up with the one Christian music group / anime show with the same name comes to mind)
6and then there'd be a set of checkboxes for which data we have for the group - description, GMD, FPL, messages/topics (from the Python script), PGO, Other (such as the files donated for HPRecycleBin)...
7and a radio toggle for group status (at the time the group came on our radar - I don't care if they decided afterwards to go private or whatnot, lol), public/restricted/private/dead
8oh yeah, definitely can't prefill based on categories unless it's only specific ones
9for instance, I'm pretty sure that groups under JAG were actually for JAG
10but there's not much you can determine from a category A, for instance, lol
11or Other
12so if we mapped a list of categories to fandoms... that could save a good deal of work if it could be automated
13which I'm sure it can
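A minimal sketch of the automated prefill idea above, assuming a hand-built lookup table. The mapping contents, the sample rows, and the name `prefill_fandom` are all hypothetical; only "JAG", "A", and "Other" come from the conversation. Anything not explicitly mapped is left blank for manual review rather than guessed.

```python
# Hypothetical lookup: only categories that clearly name a single fandom.
CATEGORY_TO_FANDOM = {
    "JAG": "JAG",
}

# Categories that say nothing about fandom (examples from the chat).
AMBIGUOUS = {"A", "Other"}


def prefill_fandom(category):
    """Return a fandom guess for a Yahoo category, or None for manual review."""
    if category is None:
        return None
    category = category.strip()
    if category in AMBIGUOUS:
        return None
    # Unmapped categories also fall through to None.
    return CATEGORY_TO_FANDOM.get(category)


# Made-up rows standing in for spreadsheet data.
rows = [
    {"group": "example_jag_group", "category": "JAG"},
    {"group": "example_misc_group", "category": "Other"},
]
for row in rows:
    row["fandom_guess"] = prefill_fandom(row["category"])
```

Keeping the guess in its own column would let a volunteer confirm or overwrite it later without losing Yahoo's original category.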
14Doranwen02/18/2020
15I don't know, I just ran it past a friend of mine who knows databases, and they had some good thoughts but I'm too tired to make sense of them all
16so another day, lol
17
18ArcadianMaggie02/18/2020
19I will remind you that I have created very big complicated relational databases in the past. :D
20The thing is that we need to start with what we want the database for and what we want it to do. If it's simply to sort and store the groups, it doesn't need to be very complicated and could very well remain in sheets. If it's for future searching, then we need to decide how we want to deal with categories.
21we could simply have a "multifandom" category. Or we could have a primary fandom identified and a multifandom secondary category. And I need to talk to Morgan in more detail about how Fanlore will use the data.
22They've had preliminary discussions but we need to know the desired output before we start making structural decisions about the database itself. because we only want to do this stuff once! lol
23
24Doranwen02/18/2020
25lol yes
26I'm just brainstorming
27(and running out the door, lol)
28
29ArcadianMaggie02/18/2020
30lol. I know. And I know I put you off yesterday when you brought it up. I was just so tired (and still sort of sick).
31Doranwen02/18/2020
32it's fine
33my brain just doesn't want to quit now when it's getting to the interesting part, lol
34ArcadianMaggie02/18/2020
35I just don't want you to get started on something we'll have to change
36
37Doranwen02/18/2020
38yeah
39
40ArcadianMaggie02/18/2020
41I always start with output in my brainstorming. I decide what I need it to do then I make it do that
42
43Doranwen02/18/2020
44I know I'll want to be able to search stuff
45and the stats-interested person in me wants to be able to, say, generate a list of all LOTR-related groups
46or at least the numbers, lol
47anyway, out the door I go!
48
49ArcadianMaggie02/18/2020
50later!!
51but, you can do both of those things in sheets.
52
53Doranwen02/18/2020
54not with multifandom groups
55I don't want them all lumped as "multifandom"
56
57ArcadianMaggie02/18/2020
58the database is appealing because of the volume of data, but not many fields are one to many, tbh
59
60Doranwen02/18/2020
61anyway
62nods
63zooms
64
65ArcadianMaggie02/18/2020
66byeeeee
67
68Doranwen02/18/2020
69remoting in, lol - I think it would have to be a database just because of the volume of data
70I do not want to deal with a sluggish spreadsheet or crashing my office programs because of too many rows or whatnot
71
72ArcadianMaggie02/18/2020
73more than likely, especially once we add the descriptions in
74
75Doranwen02/18/2020
76yes
77
78ArcadianMaggie02/18/2020
79but I don't think it needs to be complicated.
80
81Doranwen02/18/2020
82I don't think it would be
83
84ArcadianMaggie02/18/2020
85The only 2 join tables would probably be status and category/fandom
86
87Doranwen02/18/2020
88category and fandom aren't the same things
89and while we can use the category to populate some of the fandoms
90definiely not all
91*definitely
92
93ArcadianMaggie02/18/2020
94true, but do we need category and fandom?
95
96Doranwen02/18/2020
97well, category is in the spreadsheet already
98
99ArcadianMaggie02/18/2020
100Yeah, but they're not very accurate
101
102Doranwen02/18/2020
103I'd like to import it - if for nothing else than to have a record of Yahoo's weird categories
104lol true
105but it's information on how it was categorized then
106which I'd like to have
107
108ArcadianMaggie02/18/2020
109OK, but I don't think it's really useable for our purposes other than historical value
110
111Doranwen02/18/2020
112but I'm not sure whether it needs to be its own table or not
113nods
114
115ArcadianMaggie02/18/2020
116well, it's a one to many so it probably should be
117
118Doranwen02/18/2020
119yes, that's what I thought
120but how to populate that from the spreadsheets....
121
122ArcadianMaggie02/18/2020
123ideally, a database would only have the information once, which is one of the benefits
124
125Doranwen02/18/2020
126right
127that's why I came up with the structure I did - and then went "oof, how do you get the info from the spreadsheets in that way"
128though you have a good point about the status needing to be its own table
129
130ArcadianMaggie02/18/2020
131Well, I have a lot of experience going from spreadsheets to databases
132
133Doranwen02/18/2020
134good :)
135
136ArcadianMaggie02/18/2020
137lol
138basically, we have a ton of manual work to do first to get things into the proper fandom.
139
140Doranwen02/18/2020
141I was looking at the spreadsheet and trying to figure out what data we'd actually want to keep - and for some reason I did not import URLs into all the full spreadsheets, I have the prefixes and intlcode and all that, but not the URL
142how are we going to do that beforehand?
143
144ArcadianMaggie02/18/2020
145well, I have the full master of the groups we actually have.
146
147Doranwen02/18/2020
148a spreadsheet doesn't let us mark a group as multiple fandoms
149
150ArcadianMaggie02/18/2020
151And their URLs already
152
153Doranwen02/18/2020
154other than Multifandom which makes it impossible to find under any of the individual ones
155
156ArcadianMaggie02/18/2020
157you just create columns for fandom1 fandom2 fandom3 and decide if you're going to limit or not
158or have multifandom after, say, 3 or 4 individual ones
159
160Doranwen02/18/2020
161there may not be a ton of groups like that, but as one of them has one of my fandoms in it, I realize I would def. want to be able to include said group with a roundup of that fandom's groups
162hmm... my spreadsheet has all groups, not just ones we have
163
164ArcadianMaggie02/18/2020
165right. so while compiling the data, do it in columns and then those columns will turn into the join table for fandom
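A rough sketch of how fandom1/fandom2/fandom3 columns could turn into a join table, using Python's built-in `sqlite3`. The table names (`yahoo_groups`, `fandoms`, `group_fandoms`), the column names, and the sample sheet are assumptions made up for illustration, not a decided schema.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for an exported sheet; column names are assumptions.
SHEET = """group_name,fandom1,fandom2,fandom3
example_multi,The Pretender,Stargate,
example_single,Tolkien,,
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yahoo_groups (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE fandoms      (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE group_fandoms (
    group_id  INTEGER NOT NULL REFERENCES yahoo_groups(id),
    fandom_id INTEGER NOT NULL REFERENCES fandoms(id),
    PRIMARY KEY (group_id, fandom_id)
);
""")


def get_or_create(table, name):
    """Return the id for name, inserting it first if needed."""
    # table is always one of our own fixed names, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    return row[0]


# Each non-empty fandomN cell becomes one row in the join table.
for row in csv.DictReader(io.StringIO(SHEET)):
    group_id = get_or_create("yahoo_groups", row["group_name"])
    for col in ("fandom1", "fandom2", "fandom3"):
        fandom = (row[col] or "").strip()
        if fandom:
            fandom_id = get_or_create("fandoms", fandom)
            conn.execute(
                "INSERT OR IGNORE INTO group_fandoms VALUES (?, ?)",
                (group_id, fandom_id),
            )


def groups_in(fandom):
    """List group names tagged with the given fandom."""
    sql = """SELECT g.name FROM yahoo_groups g
             JOIN group_fandoms gf ON gf.group_id = g.id
             JOIN fandoms f ON f.id = gf.fandom_id
             WHERE f.name = ? ORDER BY g.name"""
    return [r[0] for r in conn.execute(sql, (fandom,))]
```

With this shape, a group listed under several fandoms shows up in a roundup for each of them, and the number of fandomN columns in the sheet only limits data entry, not the database.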
166
167Doranwen02/18/2020
168including the dead ones
169
170ArcadianMaggie02/18/2020
171OK, I have the dead too but I guess how much work are we going to ask people to do with the dead groups?
172
173Doranwen02/18/2020
174that's why the last spreadsheet is well over 200k groups
175
176ArcadianMaggie02/18/2020
177I am thinking volunteer hours
178
179Doranwen02/18/2020
180well, dead is easy to import
181you don't have to do anything
182you have group name and status
183
184ArcadianMaggie02/18/2020
185and what we can ask people to do
186
187Doranwen02/18/2020
188and it's done automatically upon import
189and that's it
190
191ArcadianMaggie02/18/2020
192yes, but if we care about fandom?
193
194Doranwen02/18/2020
195you can only guess on those
196
197ArcadianMaggie02/18/2020
198all fandoms are going to have to be manually checked
199
200Doranwen02/18/2020
201some are obvious but some you won't know at all
202yeah
203I'd de-prioritize the dead stuff, but I'd like to have it in there
204if we have time to enter data for it
205like some of them are in the big jsons Pink and I were searching
206and they went dead after that
207so we can get descriptions and category and fandom for them
208the ones that came in the GMD links, one would have to hunt down which they came from and the way I processed them... it would take ages to figure out which folder to even look at, much less search within
209
210ArcadianMaggie02/18/2020
211OK. well, I have a dead list (which I should compare with yours), and I have a full master of groups we do have, broken down by ID, and I have a missing list
212(and a non-fandom, DNJ list too).
213
214Doranwen02/18/2020
215might have better luck with some command-based searching, but isn't an expert at that yet
216
217ArcadianMaggie02/18/2020
218For volunteers, I'd probably want to ask them to work by ID since most of those are grouped by tabs, which have some sort of structure to them
219
220Doranwen02/18/2020
221basically, every group that crossed my path is in the spreadsheets I have
222other than the last sets from our latest IDs
223...and some are probably dupes
224
225ArcadianMaggie02/18/2020
226Right. And we need to decide on a list of fandoms and consistent spellings so when different volunteers are filling them in, they're choosing from the same list
227
228Doranwen02/18/2020
229as long as we require unique group names somehow (without making that the primary key because I don't think that's a good idea, lol)
230yeah
231like Tolkien vs. LOTR
232prefers Tolkien
233
234ArcadianMaggie02/18/2020
235Yes
236
237Doranwen02/18/2020
238the Silmarillion stuff lumps in better that way
239
240ArcadianMaggie02/18/2020
241So that is months of work, most likely
242
243Doranwen02/18/2020
244a lot of book stuff will be better grouped by author than by world
245yeah
246but I can start organizing the digital hdd data, right?
247lol
248
249ArcadianMaggie02/18/2020
250And do the actors go in Tolkien or Tolkien RPF, etc.? We might want to model after AO3 fandom categories as much as possible
251
252Doranwen02/18/2020
253like at least having letters of the alphabet and putting groups under that
254sighs at remote access and its sluggishness when typing, making it difficult to correct typos in time
255I don't think we're going to want to put anything under RPF categories specifically - because most of the groups weren't RPFish anyway
256
257ArcadianMaggie02/18/2020
258Oh, I've got to run. I've got an appointment
259
260Doranwen02/18/2020
261a lot of fan groups, but not necessarily fic
262
263ArcadianMaggie02/18/2020
264actors are all RPF
265
266Doranwen02/18/2020
267k, I need to work anyhow
268but the F = Fiction
269there isn't any for most of these
270
271ArcadianMaggie02/18/2020
272???
273
274Doranwen02/18/2020
275I think that's where it breaks down, this isn't AO3 and many of the groups aren't necessarily full of transformative works for actors
276thinks Celebrities works better as a broad category
277
278ArcadianMaggie02/18/2020
279I think that's far too broad
280
281Doranwen02/18/2020
282yeah, I'd want to narrow down
283but Adult_Viggo is a rarity in that it actually is RPF
284most weren't
285
286ArcadianMaggie02/18/2020
287there is a lot of RPF in there
288Tons
289
290Doranwen02/18/2020
291hmm
292
293ArcadianMaggie02/18/2020
294all the boybands, etc
295
296Doranwen02/18/2020
297I just don't like a category name that doesn't work for all the groups
298even if most fit
299we'll think more
300go to your appointment!
301will go to grading papers
302maybe just Actors, Musicians, Athletes, etc.
303leave off the RPF bit
304that would encompass RPF but not be exclusively so
305(especially given the number of people who don't know what RPF is or get mixed up about it - soooooo many times people tag for RPF when it isn't, lol)
306(but someone would easily go "oh, that's a musician" and mark that fandom)
307
308ArcadianMaggie02/18/2020
309well, I definitely think musicians would be fandom. :D
310So I'm talking about fandom, not category, though we might want to have our own category and fandom. But say, I write a 1D fic (which, I have; don't judge).
311Then I might want to go find all the yahoo groups that talk about Harry Styles. So that's why I think the AO3 categories might be useful for our purposes.
312They have specific RPF fandoms for bands and musicians and then a broader Music RPF, US Actor RPF for randos who aren't big enough for their own fandoms
313Anyway, I think Morgan definitely needs to be in on these discussions of how we're organizing and categorizing the data and I do want to have all this ironed out before we start recruiting volunteers to help deal with this most massive project
314
315Doranwen02/18/2020
316definitely
317I'm just the kind of person that the more I talk about it the better I can think about it, lol
318and my brain keeps trying to think about it
319
320ArcadianMaggie02/18/2020
321and just as an aside on AO3, works don't have to be actual fic to go in the RPF categories. that's just a broad umbrella they use to differentiate between real people and fictional
322
323Doranwen02/18/2020
324but I think I need to organize digital files first - group name folder, then under that a folder for each type of info on the group, I'm thinking?
325ahh
326all under the alphabet letter for the group
327and maybe even further divisions if it gets too unwieldy for file exploring programs
328which it will
329because 200,000 groups isn't a pittance, lol
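One possible way to compute the alphabetical folder layout described above, sketched in Python. The function name `bucket_path`, the two-character second level, and the use of "_" for non-alphanumeric characters are all assumptions; the actual division scheme has not been decided in this conversation.

```python
from pathlib import PurePosixPath


def bucket_path(group_name, depth=2):
    """Return letter/prefix/group_name for a group, e.g. a/ad/Adult_Viggo."""
    if not group_name:
        raise ValueError("group name must not be empty")
    key = group_name.lower()
    # Characters that are not letters or digits are replaced with "_"
    # so every bucket name is safe to use as a folder name.
    first = key[0] if key[0].isalnum() else "_"
    prefix = "".join(c if c.isalnum() else "_" for c in key[:depth])
    return PurePosixPath(first, prefix, group_name)
```

Raising `depth` would split crowded prefixes further, at the cost of more folders to click through.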
330
331ArcadianMaggie02/18/2020
332it's going to be more than 200K probably because we sent a lot to the AT to run.
333
334Doranwen02/18/2020
335and my file explorer gets unhappy with much more than 1000 files or folders selected at once
336yeah
337
338ArcadianMaggie02/18/2020
339there were a lot of tabs on v2 that we don't have, but were in the AT's list
340
341Doranwen02/18/2020
342though will we get them from them? that's the one thing I'm concerned about
343our data is flowing to them, but I haven't had confirmation that data will flow the opposite way
344
345ArcadianMaggie02/18/2020
346didn't you download a pile of groups?
347
348Doranwen02/18/2020
349and they're going to put it all up on the IA but that may take a bunch of time
350
351ArcadianMaggie02/18/2020
352IDs I mean
353
354Doranwen02/18/2020
355yeah, but only a fraction
356like, they have way more we don't have
357and we can't possibly d/l all of their stuff
358the sheer volume of dat...
359*data
360
361ArcadianMaggie02/18/2020
362OK, well, we need to see if they will share all their GMDs.
363So far it's about the same size as ours
364
365Doranwen02/18/2020
366my 4 TB drive isn't big enough, I don't think
367really?
368
369ArcadianMaggie02/18/2020
370well, I think we're over 500K total now.
371So 1.5 times ours
372
373Doranwen02/18/2020
374nods
375so possibly doable, then
376
377ArcadianMaggie02/18/2020
378Yeah.
379
380Doranwen02/18/2020
381they probably will
382
383ArcadianMaggie02/18/2020
384I hope so
385
386Doranwen02/18/2020
387we've been working so closely together lately
388
389ArcadianMaggie02/18/2020
390and atphoenix is local. I could always let him borrow my HDD for a day. maybe.
391
392Doranwen02/18/2020
393that's right, he mentioned that the other day and I remembered you saying something about that
394
395ArcadianMaggie02/18/2020
396It would be better to have full sets in multiple places, tbh
397
398Doranwen02/18/2020
399yes
400
401ArcadianMaggie02/18/2020
402meanwhile, I'm still joining groups. LOL
403has anything happened in the yahoosuckz room? I never got an invite
404
405Doranwen02/18/2020
406they said you can join
407did you try?
408[16:58] <atphoenix> hook54321, can you see if Maggie is registered correctly so you can invite her?
409[17:02] <JAA> atphoenix: Maggie can join already.
410[17:09] <atphoenix> okay, wasn't sure.
411
412ArcadianMaggie02/18/2020
413It said I needed an invitation and now it's not visible to me anymore
414
415Doranwen02/18/2020
416just type /join #yahoosuckz
417and see if it lets you
418that's how you join any channel you want to
419
420ArcadianMaggie02/18/2020
421OH. Yes, it did. LOL
422I am soooo out of practice
423
424Doranwen02/18/2020
425including ones that don't exist, yet
426lol
427you can create them that way :)
428
429ArcadianMaggie02/18/2020
430I don't remember any of the IRC commands
431
432Doranwen02/18/2020
433well, now you have one back
434
435ArcadianMaggie02/18/2020
436lol
437
438Doranwen02/18/2020
439that was literally the last thing said, though
440yesterday
441
442ArcadianMaggie02/18/2020
443OK
444
445Doranwen02/18/2020
446so you didn't miss anything besides Morgan greeting me and that
447they like to wait till everyone is there before talking
448I don't see lennier1 yet
449so I suspect we'll wait to discuss till he gets in there
450lennier1 was all "I'd like to share everything"
451so I suspect that's how it'll go
452
453ArcadianMaggie02/18/2020
454ok, good
455
456Doranwen02/18/2020
457"Do you want access to the original zips? Or we've been talking about writing a program to split them into groups."
458
459ArcadianMaggie02/18/2020
460original zips, I think
461
462Doranwen02/18/2020
463I'm torn - it's easier to transfer big zips around than lots of little ones
464yeah
465we'll extract and move our own
466shall we ask Morgan to confirm on that so we're all in agreement?
467
468ArcadianMaggie02/18/2020
469Yes
470
471Doranwen02/18/2020
472still waiting on her response - and lennier1 I've asked to give us an estimate when he knows roughly how much the AT data is in all
473I don't have any idea how big a hdd I'd need, otherwise
4746 tb? 8 tb?
475
476
477
478
479DoranwenYesterday
480...I just had an idea
481you know how I got a 12 TB drive but the fandom is probably only 4 TB...
482what if I do it both ways on that drive?
483the sorting one
484
485ArcadianMaggieYesterday at 5:06 PM
486you mean alphabetically and by fandom?
487
488DoranwenYesterday at 5:06 PM
489start with the descriptions, get those synced, then don't sync anything until we get to the fandom stage
490yes
491it has to start with alphabetical
492that's the only thing I can rely on at the beginning
493
494ArcadianMaggieYesterday at 5:07 PM
495Ah. yeah.
496
497DoranwenYesterday at 5:07 PM
498but when all the data is in place, then copy it to the correct fandom
499that drive would hold both
500and as long as you get your descriptions early on, then you don't need to sync till it's done
501as I'll just be running regular backups with my local one
502once the alphabetical is done, then I can start going through
503but I think the work plans for going through stuff will need to be alphabetical as a result
504like, "take this list of 200 groups that all start with 'abi' and identify which are fandom (and which fandom)"
505
506ArcadianMaggieYesterday at 5:09 PM
507what do you mean work plans for going through stuff?
508
509DoranwenYesterday at 5:09 PM
510if you farm any of it out to others
511that is doable, actually
512
513ArcadianMaggieYesterday at 5:09 PM
514Ah, well, I thought we'd go by IDs since so many of them were fandom-grouped already
515
516DoranwenYesterday at 5:10 PM
517that makes sense from that point of view - but not from the physical "moving the data" pov, lol
518would be very difficult to keep track of what's been gone through without some fancy spreadsheeting
519
520ArcadianMaggieYesterday at 5:10 PM
521sure, which is why I said I thought the spreadsheet stuff should be done before the actual physical sorting
522that's what I meant yesterday
523
524DoranwenYesterday at 5:11 PM
525ah
526the descriptions aren't ID-sorted, though
527so that's the one sticking point
528the only way I can organize those is alphabetical
529
530ArcadianMaggieYesterday at 5:12 PM
531Yeah, but if we get those in a DB we can attach them to the right group
532
533DoranwenYesterday at 5:12 PM
534in the database, yes, but it still physically has to be moved there
535will probably be spending a few hours just running commands to move blocks of descriptions at a time
536
537ArcadianMaggieYesterday at 5:13 PM
538what format are the descriptions in?
539
540DoranwenYesterday at 5:13 PM
541I'm going to have to go at least two letters deep just to reduce it to manageable numbers per folder
542every group description has the main group folder (named with the group name), an archive.log under it (which has the record of retrieving it), and a subfolder called "about" that actually has the about.json and statistics.json in it
543so it's tons and tons of folders with identically-named jsons underneath them
544
545ArcadianMaggieYesterday at 5:15 PM
546what is in the statistics.json?
547
548DoranwenYesterday at 5:15 PM
549lemme go look again
550(fortunately I have some descriptions much easier to access than those folders!)
551mm, hard to describe, why don't I just do a paste of an about.json and a statistics.json so you can see
552it's basically like HTML code
553but easier to read in some ways
554see for yourself:
555
556statistics: https://paste.ee/p/cY67d
557about: https://paste.ee/p/OaPAO
565
566the statistics has the member numbers
567the about does not
568thuban's checker talks to the about API
569that's why it doesn't have member numbers
570
571ArcadianMaggieYesterday at 5:19 PM
572OK, great.
573
574DoranwenYesterday at 5:20 PM
575all of the descriptions are currently going into one very very stuffed folder on my hard drive, lol
576is up to nearly 685,000 folders of group descriptions at this point
577
578ArcadianMaggieYesterday at 5:20 PM
579so when you say description, you mean the about json?
580
581DoranwenYesterday at 5:20 PM
582I mean I'm taking the entire folder
583it's data
584even if only some of it goes into the database
585and I want it to go with the rest of the group data
586the folder is generated by the script
587with both jsons in it
588so I'm referring to the whole folder
589
590ArcadianMaggieYesterday at 5:21 PM
591OK, so it's both.
592
593DoranwenYesterday at 5:21 PM
594yes
595and I have to physically move those folders
596and to even be able to work with a gui, I'm going to be moving them out in alphabetical blocks
597using commands I don't know yet, lol
598(but I'll get good at them)
599
600ArcadianMaggieYesterday at 5:22 PM
601we'll probably just want to extract text description and membership number for the database. I'm not up for looking at it too thoroughly right now, but we might want a few more fields too
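A sketch of what pulling the description and membership number out of each group folder might look like, based on the layout described earlier (group folder, then an "about" subfolder holding about.json and statistics.json). The JSON key names below are placeholders, since the actual keys are not stated in this conversation and would need checking against a real file; the demo data is made up.

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: placeholder key names; check a real about.json / statistics.json.
DESCRIPTION_KEY = "description"
MEMBERS_KEY = "members"


def read_group(folder):
    """Read <group>/about/*.json; anything missing or unreadable stays None."""
    about_dir = Path(folder) / "about"
    result = {"group": Path(folder).name, "description": None, "members": None}
    for filename, key, field in (
        ("about.json", DESCRIPTION_KEY, "description"),
        ("statistics.json", MEMBERS_KEY, "members"),
    ):
        path = about_dir / filename
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # leave the field as None and move on
        if isinstance(data, dict):
            result[field] = data.get(key)
    return result


def read_all(root):
    """Process every group folder directly under root, in name order."""
    return [read_group(p) for p in sorted(Path(root).iterdir()) if p.is_dir()]


# Demo against a throwaway folder tree with invented contents.
with tempfile.TemporaryDirectory() as tmp:
    about = Path(tmp) / "example_group" / "about"
    about.mkdir(parents=True)
    (about / "about.json").write_text(
        json.dumps({DESCRIPTION_KEY: "A sample description"}), encoding="utf-8"
    )
    (about / "statistics.json").write_text(
        json.dumps({MEMBERS_KEY: 42}), encoding="utf-8"
    )
    (Path(tmp) / "example_empty").mkdir()
    demo_rows = read_all(tmp)
```

The resulting list of dicts could be written out with `csv.DictWriter` for the spreadsheets, or inserted straight into a database table.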
602
603DoranwenYesterday at 5:22 PM
604nods
605
606ArcadianMaggieYesterday at 5:23 PM
607aw, website: http://www.geocities.com/lydiabura
608
609DoranwenYesterday at 5:31 PM
610?
611
612ArcadianMaggieYesterday at 5:41 PM
613that was in the about example you gave me, the group's web site. lol
614
615DoranwenYesterday at 5:43 PM
616ah, lol
617yes
618so much geocities :(
619
620ArcadianMaggieYesterday at 5:44 PM
621indeed
622
623DoranwenYesterday at 5:44 PM
624description, membership, website, and summary are the ones I can think of for now
625
626ArcadianMaggieYesterday at 5:45 PM
627date created, we pulled that for the first batch of stuff with Garfield's script
628
629DoranwenYesterday at 6:29 PM
630that's in all the spreadsheets already
631except it's weird numbers
632but the jsons have the same weird numbers
633
634ArcadianMaggieYesterday at 6:29 PM
635Oh yeah. We'll have to figure out how to convert it
636
637DoranwenYesterday at 6:29 PM
638yeah
639but it's in the spreadsheets at least
640so that won't need pulling from the jsons
641
642ArcadianMaggieYesterday at 6:29 PM
643I'll see if Morgan can ask Garfield since his obviously converted the info into an actual date
644
645DoranwenYesterday at 6:30 PM
646yeah
647there's got to be some formula
648though... maybe his scraped the actual page
649I remember seeing some of the month names in other languages
650we ought to get a few samples of the numbers vs. the actual dates on their pages, if that's still visible
651(if it isn't, I think we at least have a few WBM links we could reference for comparison)
652and see if anyone on the AT has any ideas
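One guess worth testing against those samples: the "weird numbers" may be Unix timestamps (seconds since 1970-01-01 UTC). That is an assumption, not something confirmed here. The sketch below converts under that assumption and checks a number against a date read off a group page; the function names are made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_date(value):
    """Interpret value as seconds since 1970-01-01 UTC (an unconfirmed guess)."""
    return (EPOCH + timedelta(seconds=int(value))).date().isoformat()


def matches_page_date(value, page_date):
    """Check one sample: does the converted number equal the YYYY-MM-DD seen on the page?"""
    return epoch_to_date(value) == page_date
```

If a handful of known pairs all match, the guess is probably right; if the results land in the wrong year entirely, the numbers might be milliseconds or some other encoding.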
653
654ArcadianMaggieYesterday at 6:34 PM
655well there seem to be a million responses in a google search but today is not the day for me to look them over. lol
656
657DoranwenYesterday at 6:36 PM
658lol no
659that is definitely not urgent
660
661ArcadianMaggieYesterday at 6:37 PM
662but it is an obviously common question, looking at how many search results pop up
663
664DoranwenYesterday at 6:37 PM
665interesting
666
667
668
669DoranwenYesterday at 10:26 PM
670I think I figured out our respective perspectives
671I'm assuming that I will be sorting starting with the descriptions
672and you're assuming sorting starting with the GMDs
673(or FPL, maybe)
674the latter allows for fandom grouping or something like that
675the former, only alphabetical makes sense
676
677ArcadianMaggieYesterday at 10:27 PM
678well, I was thinking we'd be starting with the category and using the description to refine
679but assigning the different ID GMDs to different people to help
680
681DoranwenYesterday at 10:28 PM
682ah
683I still don't like the idea of multifandom groups being lumped into a "multifandom" category
684
685ArcadianMaggieYesterday at 10:28 PM
686it doesn't have to be
687
688DoranwenYesterday at 10:28 PM
689you'd copy the data three ways to start?
690like for that Pretender/Stargate/otherfandomIdon'tremember group
691
692ArcadianMaggieYesterday at 10:29 PM
693I don't follow what you're asking there
694
695DoranwenYesterday at 10:29 PM
696like, I get the GMD for that - I'd have to copy one set into the Pretender folder, another into Stargate folder, and the third into the folder for the third fandom
697and then I'd have to do the same for the FPL
698and the description
699MUCH more work for me
700for those multifandom ones
701
702ArcadianMaggieYesterday at 10:30 PM
703as I keep saying, I'd prefer to do the spreadsheet stuff first before doing any physical moving of anything
704
705DoranwenYesterday at 10:30 PM
706because they're already split into the same folder
707but the physical stuff is all in multiple places per source
708so it has to be moved in pieces unless it's grouped together first
709like, I'm not going to process the GMD at the same time as the FPL for that same ID
710even if I did them one after another, I'd still be dipping in and out of multiple folders
711if that makes sense?
712GMD might be the vast majority of what we have but it's not all of it
713and we still have GMD + description to match up
714
715ArcadianMaggieYesterday at 10:32 PM
716I am really not following. I'd rather make decisions about the endgoal of storage and actual uses of the data before we decide how it's going to be organized
717
718DoranwenYesterday at 10:32 PM
719so if we have multifandom groups, I'd be doing multiple copies of each folder
720ah
721I just know that eventually I have to get the GMD and the FPL and the description - and in some cases also the PGO - matched up to a single group in one folder
722and that matching up should happen before it ever gets copied more than one place
723
724ArcadianMaggieYesterday at 10:33 PM
725we need to get everything first.
726
727DoranwenYesterday at 10:33 PM
728yes, I'm just thinking through
729
730ArcadianMaggieYesterday at 10:35 PM
731So if it's basically just being uploaded onto OTW for dark storage for 3 years, I don't see much use in a lot of work sorting everything other than listing what is in what zip. But for database purposes and whatever we decide to do with the collection outside of the OTW, we need to discuss what the end goals are
732
733DoranwenYesterday at 10:35 PM
734nods
735I know I want to be able to report stats
736number of LOTR groups, number of SV groups, etc.
737and obviously read stuff from my fandoms, lol
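A small sketch of the stats side, again with `sqlite3`: per-fandom group counts where a multifandom group counts under each of its fandoms. The schema, table names, and sample rows are invented for illustration; the status values are the public/restricted/private/dead set mentioned earlier, and the group name is unique without being the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statuses (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE yahoo_groups (
    id        INTEGER PRIMARY KEY,      -- surrogate key
    name      TEXT NOT NULL UNIQUE,     -- unique, but not the primary key
    status_id INTEGER NOT NULL REFERENCES statuses(id)
);
CREATE TABLE fandoms (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE group_fandoms (
    group_id  INTEGER NOT NULL REFERENCES yahoo_groups(id),
    fandom_id INTEGER NOT NULL REFERENCES fandoms(id),
    PRIMARY KEY (group_id, fandom_id)
);
INSERT INTO statuses (id, name) VALUES
    (1, 'public'), (2, 'restricted'), (3, 'private'), (4, 'dead');
INSERT INTO fandoms (id, name) VALUES (1, 'Tolkien'), (2, 'Stargate');
-- Made-up groups: two Tolkien-only (one dead) and one crossover.
INSERT INTO yahoo_groups (id, name, status_id) VALUES
    (1, 'example_tolkien_a', 1),
    (2, 'example_tolkien_b', 4),
    (3, 'example_crossover', 1);
INSERT INTO group_fandoms VALUES (1, 1), (2, 1), (3, 1), (3, 2);
""")


def fandom_counts(include_dead=True):
    """Return {fandom: number of groups}, optionally skipping dead groups."""
    sql = """
        SELECT f.name, COUNT(*)
        FROM fandoms f
        JOIN group_fandoms gf ON gf.fandom_id = f.id
        JOIN yahoo_groups g ON g.id = gf.group_id
        JOIN statuses s ON s.id = g.status_id
        WHERE ? OR s.name <> 'dead'
        GROUP BY f.name
    """
    return dict(conn.execute(sql, (include_dead,)).fetchall())
```

The same join, with `g.name` selected instead of `COUNT(*)`, would give the list of groups for one fandom rather than just the numbers.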
738
739ArcadianMaggieYesterday at 10:36 PM
740So if we have it stored somewhere searchable or on the IA, we might want a database that links to the collection by fandom, group, whatever. And we need to discuss with Morgan what Fanlore needs
741So those are the sorts of decisions I think we need to make first
742
743DoranwenYesterday at 10:37 PM
744nods
745
746ArcadianMaggieYesterday at 10:37 PM
747And we need to sort out what CAN go to the IA and what can't.
748
749DoranwenYesterday at 10:38 PM
750I would like to not wait too terribly long to discuss some of that because I really do want to get matching some of this up - the Sims people are eager to see what I have (though they understand it will be a bit with the amount of data to process!)
751So that is the work that needs to be done first
752yes
753
754ArcadianMaggieYesterday at 10:38 PM
755all right but seriously, I need a break. I've been doing this every night for 4 months now. lol
756
757DoranwenYesterday at 10:38 PM
758lol yes
759sorry!
760I've taken breaks
761
762ArcadianMaggieYesterday at 10:39 PM
763I really haven't
764
765DoranwenYesterday at 10:39 PM
766yeah :/
767this is my OCD "I'm tired of watching my hdds fill up with clutter that I can't sort" bit coming out
768it's like a compulsion at this point, I need to start sorting something
769it will make my soul feel better, lol
770I'm holding off for lack of hdds atm and because we do need to talk
771
772ArcadianMaggieYesterday at 10:39 PM
773I get it. My brain works differently. Do it once, do it right. lol
774So I see too many decisions that need to be made first
775
776DoranwenYesterday at 10:40 PM
777lol well, I'm looking at it this way - no matter how we do it, I have to match this up
778because if we're just shipping off to the OTW, we don't need the sorting drive for that
779they can just get the zips as-is
780and the physical sorting can't really be done easily by fandom
781
782ArcadianMaggieYesterday at 10:41 PM
783Exactly. But we need to discuss what goes where.
784
785DoranwenYesterday at 10:41 PM
786and I have to sort the descriptions alphabetically anyhow
787so I could start just unzipping GMDs and putting the groups in folders with the descriptions
788alphabetically
789even if I keep the IDs together
790to start
791I have to have somewhere I can go to and go "that's where all the data for groupname is"
792and have it be easily findable
793
794ArcadianMaggieYesterday at 10:42 PM
795I'd like to extract the actual description into txt to put into the database
796
797DoranwenYesterday at 10:42 PM
798and not try to think "now what fandom is that" or whatever
799I imagine I can probably get a script written that would do that for all of them
800the folder structure is consistent
801
802ArcadianMaggieYesterday at 10:42 PM
803That's what I figured
804
805DoranwenYesterday at 10:42 PM
806I'll work on talking to AT people about that
807because I'd be the one running it and pulling results to hand you
808
809ArcadianMaggieYesterday at 10:43 PM
810membership too
811
812DoranwenYesterday at 10:43 PM
813and see if I can't get that into the spreadsheets, even
814
815ArcadianMaggieYesterday at 10:43 PM
816Right. That's the data we'll need to refine fandom
817because not everything will be online anymore
818
819DoranwenYesterday at 10:43 PM
820though I might need to remove a few columns we don't need - the URL prefix
821I think I'd want to keep the intlcode
822but the URL prefix is part of the URL
823it was only in the spreadsheets so I could generate the URL automatically
824
825ArcadianMaggieYesterday at 10:44 PM
826right
827
828DoranwenYesterday at 10:44 PM
829that'd save some load time
830those spreadsheets are nearly all multiple mbs apiece as it is
831and if I include the data for all the new AT groups... eep
832
833ArcadianMaggieYesterday at 10:45 PM
834Yeah, my master is 25MB I think
835And the 2nd one is probably 16.
836
837DoranwenYesterday at 10:45 PM
838oh, they're 93 mb altogether right now
839but that includes all the dead groups
840they're just name + 404
841
842ArcadianMaggieYesterday at 10:46 PM
843But I'd rather work on organizing that stuff first, after we discuss end goal
844
845DoranwenYesterday at 10:46 PM
846I'm at a holding pattern on those till I get all the group descriptions
847
848ArcadianMaggieYesterday at 10:46 PM
849The AT just looks like they're throwing up blocks of groups without any sort of structure or organization at all
850
851DoranwenYesterday at 10:47 PM
852probably - it's there, but not easy to find specific stuff
853
854ArcadianMaggieYesterday at 10:47 PM
855at least from what they linked earlier, but it looked to me like some of that was from 2018
856For fandom purposes, I'd assume we want something more organized and representative of the different fandoms
857
858DoranwenYesterday at 10:47 PM
859nods
860
861ArcadianMaggieYesterday at 10:49 PM
862I really need to get to bed now but we should maybe copy some of these discussions over to Morgan and set up a time to talk about it. I've been super busy at work this week so haven't had much time to do anything with YG
863
864DoranwenYesterday at 10:49 PM
865no worries
866my brain just feels better to be discussing a little
867sleep well!
868
869ArcadianMaggieYesterday at 10:49 PM
870But I'd also rather have time to think through database structure ahead of time so I have more concrete ideas to discuss and argue about!
871
872DoranwenYesterday at 10:49 PM
873lol
874
875ArcadianMaggieYesterday at 10:50 PM
876hehe