On this episode, we dive into Python for Lawyers and a special tool for conducting legal interviews. Imagine you have to collect details for 20,000 participants in a class action lawsuit. DocAssemble, a sweet Python web app, can do it for you with ease. Now, you may be thinking, I'm not a lawyer, so this isn't for me. Hang on for a sec.
DocAssemble is actually a general purpose tool. If you ever have done anything like run a survey on somewhere like SurveyMonkey or created a Google Forms to gather a bunch of information, you could do something way more advanced with DocAssemble and control the workflow with Python in a really creative and unique way. Join me as I talk with Jonathan Pyle, creator and maintainer of DocAssemble. This is Talk Python To Me, episode 229, recorded August 27, 2019.
Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and Datadog. Be sure to check out their offers during their segments. It really helps support the show.
Hey folks, before we get to the interview, I have some exciting news. We've teamed up with Humble Bundle to launch a great bundle of Python educational goodness. For a couple of weeks, you can get three of our courses along with great content from Real Python, PyBites and many others for as little as just $1. If you've been on the fence about trying one of our courses, here's a chance to get three of them along with a bunch of other great stuff. Just visit talkpython.fm/HB2019.
That's HB2019. And be sure to check it out before time runs out. Now let's get to that interview. Jonathan, welcome to Talk Python To Me. Thanks for having me. Yeah, it's great to have you here. We're going to cover a topic that we have, I believe, yet to cover really at all. And that is Python for lawyers or writing code that helps practicing lawyers do their work. That should be a lot of fun to talk about.
Yeah, the intersection of law and coding is not very big, but there actually are a fair number of people doing it. Yeah, there are definitely some people doing it. And it's like I was saying before we hit record, that a good friend of mine is a lawyer who does a lot of Python. And so I know that there's some areas where there's really interesting stuff happening, but it's maybe not as easy to apply
programming to the problems as say, like chemistry or something like that, right? But it's, I think there's still a lot of really interesting things going on there. And I'm looking forward to diving into them with you. But before we get to those, let's start with your story. How'd you get into programming in Python? I got into programming longer than I can even remember. My dad had an IBM PC with a monochrome
monitor green screen and I was just magnetically attracted to the thing. It had like BASIC in ROM. And so I would just go over and type programs into it, which I found by reading like Compute! magazine and Byte magazine, which in the eighties you could buy in the supermarket and they had source code listings and you could type them in.
Do you remember when you would go to like bookstores and stuff and there was a whole computer section of these magazines and these books and like Computer Shopper for building computers out of parts? And yeah, it was a different time, right? Yeah. I think the books were just kind of more interesting back then. And a lot of real do-it-yourself stuff. Yeah, for sure. Okay. So you started typing these in, like a fair number of folks have, to kind of
get started programming. Like, okay, well, it says if I type this, I get a little game or a little something on my computer, because there was no internet, right? Like this human transcription was how programs got transferred. And it was great in BASIC because you could do interactive stuff very easily. There's this command called INPUT, which got you a line of input. And that's kind of hard to do in a lot
of languages these days because the interfaces are more complicated. So I started with BASIC, but then as I got older, I taught myself C and assembly language and tried to do some real low-level stuff on the IBM PC. But then I went off to college, majored in physics, and then went to law school and became a lawyer. But I never really got away from computer programming because wherever I went, I just found applications for it because people were doing stupid things. And I thought,
hey, just write some code that gets around this problem. So I was like using OCR text and converting it into spreadsheets using regular expressions in Perl and saved a client like $500,000 by doing that. It's kind of weird. So, so I just sort of randomly found myself coding, you know, a good percentage
of my time as a lawyer just because I could do it. And so I've been using Linux and Perl and like just random scripts for about 20 years, but I didn't start learning Python until I decided to start with DocAssemble, which was about four years ago. Yeah. DocAssemble is a really cool project, but before we get into any of that, you know, you talked
about converting OCR text into spreadsheets using regular expressions. You know, that's not super easy to do, but it feels like you can kind of piece together some of these libraries and these tools to make it happen. And it's one of these examples of, you know, this rant that I'm always on is that we don't necessarily need 10 times more programmers, but all the people out there with their specialty, they could have a little bit of programming skill. They could solve like major
problems in their area like law or biology or whatever. If you just say, you know what, this problem actually is really easily solved or pretty easily solved with a little programming if you just knew what to do, right? It sounds like that's a great example of it.
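As a rough illustration of the kind of script being described, here is a minimal sketch in Python (Jonathan did his version in Perl). The OCR text, the invoice layout, and the names LINE_PATTERN, ocr_to_rows and rows_to_csv are all invented for this example; real OCR output would need a pattern tuned to the actual documents.

```python
import csv
import io
import re

# Hypothetical OCR output: one record per line, uneven spacing, plus noise.
OCR_TEXT = """\
Invoice 1001   2019-03-04   $1,250.00
Invoice 1002  2019-03-11  $87.50
Page 3 of 12
Invoice 1003 2019-04-02 $12,000.00
"""

# Invented pattern for this sample: invoice number, ISO date, dollar amount.
LINE_PATTERN = re.compile(
    r"Invoice\s+(?P<number>\d+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"\$(?P<amount>[\d,]+\.\d{2})"
)


def ocr_to_rows(text):
    """Return [number, date, amount] for every line that matches the pattern."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)
        if match is None:
            continue  # skip page footers and other lines that are not records
        amount = match.group("amount").replace(",", "")
        rows.append([match.group("number"), match.group("date"), amount])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["number", "date", "amount"])
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(ocr_to_rows(OCR_TEXT)))
```

Lines that do not match are skipped rather than guessed at, so a person can review only the leftovers instead of retyping everything.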
Yeah. And I think just because of my background, I was the only person in the entire firm who knew what a regular expression was and even knew that there was a solution to this problem because everybody else would just be like, oh, we need to hire some low cost labor in Vietnam to just type everything in manually. And I was like, seriously, there's got to be a cheaper way. Well, and more reliable and faster, right? Like, so suppose like you're like, you know,
we actually have this firm in Vietnam and these folks are really good. We can get this done. If it's that much text that you're talking about, does it really make sense to wait the month for them to type it in? And then you're always a little worried maybe about like a mistake slipping in, right? Yeah. But I think the way that the legal field has adapted is that they kind of are not so worried about mistakes. They kind of understand that they are to be expected. But yeah, there are a lot
of people who want to hold you accountable for every single OCR being absolutely 100% correct. And that results in a lot of waste of money. Yeah, I can imagine. All right. So that's the background and whatnot. How about day to day? What do you do? Are you still working as a lawyer now? I'm still working as a lawyer. I don't work at law firms anymore. I've been in the nonprofit legal services world for about 10 years. So I work at this nonprofit called Philadelphia Legal Assistance,
which provides free legal services to low income people in civil matters. So nothing criminal, but things like child custody, how to get welfare benefits, how to get out of mortgage foreclosure. And I have a job there in management and administration. I'm responsible for compliance with government regulations. But because of my computer background, I actually have sort of changed the way that we do business here. And so I automate a lot of the stuff that would otherwise be done manually.
And there are a lot of grants available for tech work in the legal aid space to help low income people. And so I've worked on a number of grants trying to bring computer programming and data analysis into the legal aid world so that we can better advocate for low income people as a whole. That sounds like a really good use of programming and legal skills. Now, I don't know very much about nonprofit legal services. I certainly understand how for-profit legal agencies work, right?
But just give me a real quick sense of, you know, like how do businesses like that even run for folks who are not in the legal field and know about it? The biggest thing is government funding. And in the United States, the Legal Services Corporation is a federal government agency that hands out money to every county in the country. So the low income people have some law firm that they can go to and get help for free.
And they've been a big supporter of technology. In fact, they, I think the legal aid nonprofit world is more technologically forward thinking than the big law firms that have all the money in the world, because we understand that it has benefits to helping low income people get answers to their legal questions. Yeah. So we've been pretty innovative in the, in the legal aid nonprofit world.
That's cool. Of course, you know, thinking about it, right? The large legal firms, their incentives are not necessarily aligned with automating and massively speeding up some of these actions, right? Like if I can push a button and get an answer in a couple of milliseconds, they can't bill for that. Right. I mean, I think, very slowly, there are some price pressures on the big firms.
And clients say, I'm not willing to pay for legal research because I know that this can be done with computers and AI. And so I think they're eventually going to come along, but yeah, they don't have any big incentive to save the clients much money. Once they get the job, right? Like I can definitely see that in the competition. Like, Hey, we could do this for half price of those guys because we're not going to charge you for this
thing we automated. But yeah, I guess it's a mixed bag. Interesting. So let's talk about this project that you created called DocAssemble, which is a really nice project. What is it? And what is it used for? Let's start there. So it's a free and open source platform for developing guided interviews. And if you don't know what a guided interview is, think about TurboTax. It's something that asks you one question at a
time. It can get very detailed. It can be very long, but at the end of it, you get say a tax return, or in other contexts, you might get a legal document that you can file in court, or you could even get just like legal advice at the end, like legal information or be directed to a resource or do an application. There are a variety of guided interview software packages out there,
but none of them were free and open source or did what I wanted. So I created DocAssemble because it would serve my purposes and it'd be something that we could develop and crowdsource through the free open source software movement. Yeah. It's really nice. And it's a beautiful looking site. I don't know too much about the legal websites, but I know like some of these gather your signature
sites and stuff, you go to them and they just look so bad and so old school and sketchy. You're like, I really don't know if I want to put personal information into this thing. So it's nice that this is a good looking web app that, you know, inspires confidence, I guess. Yeah. I don't quite understand why people pay $30 a month for some service that gathers the signature. Cause I figured out how to do that with a canvas element in JavaScript. It was really easy.
I don't really, I don't really know what the big deal is. Well, see that's part of that programmer skill, right? Like, yeah. Yeah. So if people want to experience this and get a sense for what it's about, I guess the first thing is you can drop over at docassemble.org, right? And then there's a, at the very bottom, you can run a demo and then it just goes through and ask you questions. Like, what is your name?
And then you hit enter and it asks, what is your location? And then possibly based on what state you're in, it might ask you different questions, right? It can kind of flow different questions together
based on your answers, right? Yeah. And while you can do some of that stuff in like SurveyMonkey or Google Forms, this is kind of like the very advanced version of that, where you might have incredibly complicated logic that would be so very difficult to manage if you were trying to like hand code endpoints in a Flask app, for example, you'd just have so many endpoints and be
hard to keep track of them. But it's sort of a system for abstracting away the logic and making it easy and maintainable to go from what the law is or what the domain knowledge is to an interview that gathers information. Yeah. And I guess it's worth pointing out, and you do on your website or the GitHub repo, I can't remember which, I read somewhere in your documentation that this was developed in the context of a practicing lawyer, but it is not specific to law, right?
Yeah. I think one of my first users who found it on GitHub was in France and they were using it to diagnose problems in mechanical equipment. So, you know, anything that's amenable to asking one question at a time where you don't want to have to like hand code all of those screens, you could use this system for. Yeah. I can see this almost like tech support even, right? Like in that context, does the machine turn on? Yes or no? Yeah. Does smoke come from it? Yes or no?
First turn it off. Okay. Now what's the next question? Yeah. That's pretty interesting actually. And it definitely seems more flexible than SurveyMonkey and, you know, all those things are commercial services that are SaaS and you can just take it the way it is or leave it, right? And this is obviously written in Python, something you can download from GitHub and customize. And people do like to customize. So a lot of people don't like my standard Bootstrap front end. And so
they write their own CSS. So there's really no limit to your ability to customize if you know how to write JavaScript or know how to write Python. That's one of the nice things about open source software is that like, I have no problem with ultimate extensibility. Yeah. That's super cool. And it's, you know, a Python web app based on Bootstrap. So that probably means that all the fancy, nice Bootstrap themes that you can find over at like WrapBootstrap or Start Bootstrap, or,
you know, probably a bunch of others that you don't know about. You can go find all these either super cheap, like 10, $20 themes, or even free and open source ones. You can probably plug those in and get the look and feel you want, right? Yeah. And that's kind of why I picked Bootstrap because it's widely used and it's themable. And there are a lot of different options if you don't like the standard look and feel. Cool. So I definitely want to dig into the technical side
of things, but maybe just another quick question or two at the high level to set the stage. So you're working, helping out these folks at the legal nonprofit. How do you use this in your day-to-day job? Well, I'm so busy maintaining the platform and working my day job that I don't really have that much time to deploy stuff, but I have been working on a very complicated interview that asks all the questions necessary to help somebody file for bankruptcy. And that's primarily being done by
a nonprofit called Upsolve, but they're one of our sub-grantees. So if you check out Upsolve.org and see what they've done, they've kind of democratized Chapter 7 bankruptcy for the nation. Whereas before you would have to hire a lawyer for $2,000 or try to find a pro bono lawyer to do it for free, which is very difficult. Which sounds horrible when you're literally filing for bankruptcy. You're in a financially bad place and
then you've got to go pay to dig out from the hole, right? Which is rough. Yeah. It costs a lot of money to be poor. You can go on Upsolve's site and you can go through a very long guided interview that's using DocAssemble and it gathers all the information necessary for an 80 page bankruptcy petition. And then they have a lot of other custom code after that, but the questionnaire that they do is based on DocAssemble. So I do get to work on some of that during the day,
but I also use the system in legal aid to do things like gather retainer agreements from clients. I can send them a link. They can click on it with their smartphone, sign their name with their finger and the signature goes into the document. So stuff like that is pretty useful. That's really cool. And I'm, it seems like it really should be used a little bit more even outside of the legal space. Cause it seems, seems quite interesting. So maybe one of the things that
we could talk about is just why Python? It's not like you were afraid of other languages, right? You obviously did a bunch of C and assembly language, right? Like that's a pretty hardcore language. So why did you choose Python for it?
Well, when I started the system, I was sort of a Perl hacker and I loved Perl, but my idea for the system was I want to make this into this high level language so that you can basically code the law and that a lawyer could sit down with minimal knowledge of computer programming, like just to do if else statements that set true false variables, for example. I mean, that is not rocket science. And I wanted them
to be able to understand code and read it and work with it. Maybe they would get help from somebody to clean up the syntax. But, so I was looking around for something that was very clean and readable. Perl is great, but it has so many punctuation marks, whereas Python is like so neat and clean. And also, I saw these books in the bookstore where it was like teach Python to your kid and like integrate Python with Minecraft if you're, you know, 12 years old.
Yeah. So I thought, well, if this language is good enough for six year olds, then, you know, it's good enough for a lawyer who has an advanced degree. So I thought that would be a good general purpose programming language to base this on. And I definitely wanted a general purpose programming language. A lot of people argue with me and they say, oh, if you're encoding legal rules, you really should have a declarative programming language. But the problem is that all those
declarative programming languages were developed in academia. They don't really get used much in the real world and they don't have loads and loads of packages that you can just install if you want to integrate it with Slack, for example. Give us some examples of the declarative languages you were considering or people were suggesting.
Oh, I don't even remember them because I didn't give them too much thought, but every one that I found, like, first of all, it was not that easy to read because it would use like weird Greek notation or whatever. Yeah. I also read an article where they had tried to do some of the stuff in the eighties using one of these declarative languages. And they found that the attorneys just like first wrote out procedural
code and then converted their procedural ideas into this declarative syntax. So I thought, well, maybe this whole declarative stuff is really just something that's academically interesting, but the way that people think is more aligned with the way the general purpose programming languages that are procedural actually work. Yeah. Yeah. And Python has a really... But there's a big debate. Yeah. I can imagine. So Python has that joke, I guess. I don't know if you've seen it. It says it has
like a little paper or file or something with some pseudocode. It says, how do you convert the pseudocode into Python? Like they put .py on the end of the file. Yeah. Python is one of the languages that's closer to the pseudocode in the way that people might like sketch it out in words, not flow charts, but like little words, right? Or statements. So it's pretty nice as opposed to, I don't know, Java or C# where you're like, well,
now we go create a class and you put the public static main void in here to get started. Like, whoa, what is all this? Yeah. And I remember when I was a teenager, I was reading a lot of Donald Knuth who did LaTeX and stuff. And he had a whole book on literate programming. And I was really inspired by that. Like, can we make programming as much like English as possible? And I think Python really does that to a big extent. Yeah, for sure.
Some people want me to use JavaScript and JavaScript is very nice, but like iterating through an object or a dictionary, it takes like all this nonsense in JavaScript. Definitely takes nonsense. Whereas in Python, it's so simple. I think JavaScript is nice in that it is super executable in lots of places, right? Like we all have browsers, you can run it, but that doesn't necessarily mean it's a nice language. If you take away its execution story, right? Yeah. It's not horrible, but...
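For readers who have not seen it, the Python side of that comparison looks roughly like the sketch below. The answers dictionary and the describe helper are invented for illustration and are not taken from DocAssemble.

```python
# Hypothetical interview answers, invented for this example.
answers = {"name": "Ada", "state": "Oregon", "age": 36}


def describe(mapping):
    """Return one 'key: value' string per entry, in insertion order."""
    return [f"{key}: {value}" for key, value in mapping.items()]


for line in describe(answers):
    print(line)
```

Iterating with items() yields each key and value together, so no separate lookup or ownership check is needed.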
No, no, no. There's certainly things that are worse. But Python just has so many advantages. Yeah. Especially when your goal is to write, create a way for folks who are not programmers to write simple logic into it, right? This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast, simple, and incredibly affordable? Well, look past that bookstore and check out Linode at talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for
a dedicated server with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or where your users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server, or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24-7 friendly support, even on holidays, and a seven-day money-back guarantee. Need a little help with your infrastructure?
They even offer professional services to help you with architecture, migrations, and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/Linode. One of the things that I thought was interesting was your use of YAML and Markdown. Now, Markdown, I kind of expected, like that's not super surprising. But if you want to create one of these interviews, you might want to ask the question like, what is your name and what is your age? Oh, and the age has
to be an integer and things like that. It has to be a number. So maybe talk a little bit about how you're using YAML to let people create these interview flows. Well, I picked YAML in part for the same reasons I picked Python, because it was machine-readable and human-readable at the same time. Like rather than use JSON as a way to structure things like lists and dictionaries, I thought it
made sense to use YAML just because it had the minimum of punctuation. And also attorneys are used to doing outlines because when they go to law school, that's how you study is by creating an outline of the subject. And I just thought YAML looks so friendly because it's just like bullet points made of hyphens. So that's why I settled upon that. I needed something that wasn't just code. It was more
of a data structure. Yeah, it looks really nice and clean. And I can certainly see if you give like a little template example to somebody, they're like, oh yeah. Yeah, I try to teach by example. Yeah, that's definitely a good way to do it. Yeah. So you can have like a dropdown with the list and really, because that's pretty easy, right? And it's actually not that different from Markdown. Like the dash item syntax in YAML would also work in Markdown as well. That's pretty
interesting that they're kind of similar. Yeah. And I chose Markdown just because I didn't want a lot of HTML characters. And also it can convert into so many different forms. So Markdown is used sometimes for documents that get turned into PDF, but it's also used for stuff that appears on the screen. So it's a very flexible way to format stuff. Yeah. And the thing that I like about Markdown, like the reason I use it a lot is you can use other formats that are maybe richer, like
HTML fragments and stuff. But if the slightest thing goes wrong with it, everything is wrecked, right? Yeah. It's so bad. And you've also got the potential problem of user input that could be malicious, right? If you're accepting this like definition from someone else, right? But if it's Markdown, it's pretty safe. Yeah. Now, another thing that you do that I think is pretty interesting is you can define these questions like, what is your favorite number? And it defines a variable like
best_number potentially. And it has a data type and so on. But then you can write Python code that has conditionals, right? We talked about, so if I said I was in Oregon versus a different state, it might ask me a different question because the rules in Oregon are different than they are in Pennsylvania, for example, right? Right. And so you've got this little example here that says something like, if user.is_citizen or user.is_legal_permanent_resident, then user.is_eligible equals
True, else user.is_eligible equals False. Now, that alone will actually sort of trigger some of the questions that get shown or a flow in which the questions are asked. And so that sounds like magic. How is this simple Python conditional and little tests like that that I'm writing actually controlling the flow and the questions? Yeah. So the core logic engine of DocAssemble is that it tries to evaluate
some Python code and then it traps any name errors. So if you go along and you write some Python code and you refer to a variable that hasn't been defined yet in the namespace of the interview answers, then it triggers a NameError, which is just like core Python stuff. Right. If I say user.is_citizen and the user doesn't have an is_citizen attribute, obviously, you might get an error or something like that, right?
Yeah. That would be an AttributeError. But if you refer to just a name that's not defined, I trap that. If user itself is not defined, for example. Yeah. If user is not defined. Then I wrote code that takes that variable name and goes and looks for a question in that YAML file that offers to define that variable. And then once it sets that variable, which might take a question to the user, it evaluates everything again from the start.
That's funny. So it runs it and it goes, oh, we stopped, we crashed on user. I bet we have to ask the user a question. Then you have the data value set for the user, and you rerun the Python code again, you get a little farther, you're like, oh, is_eligible is an attribute error. We've got to figure out if we've got an is_eligible or whatever, right?
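The run, crash, ask, rerun loop described here can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not DocAssemble's actual engine: it uses bare names instead of attributes on a user object, keeps the questions in a dict rather than a YAML file, and run_interview, QUESTIONS, RULES and canned_ask are all hypothetical names. It also calls exec on the rule text, which is only reasonable for code you trust.

```python
import re

# Hypothetical question bank: each key is the variable the question defines.
QUESTIONS = {
    "is_citizen": "Are you a citizen?",
    "is_permanent_resident": "Are you a legal permanent resident?",
}

# The "law", written as ordinary Python over names that may not exist yet.
RULES = """
if is_citizen or is_permanent_resident:
    is_eligible = True
else:
    is_eligible = False
"""

_MISSING_NAME = re.compile(r"name '(\w+)' is not defined")


def run_interview(rules, questions, ask):
    """Re-run the rules from the top until no undefined name is hit."""
    answers = {}
    while True:
        namespace = dict(answers)
        try:
            exec(rules, namespace)  # trusted rule text only
        except NameError as err:
            match = _MISSING_NAME.search(str(err))
            if match is None or match.group(1) not in questions:
                raise  # no question offers to define this name
            name = match.group(1)
            answers[name] = ask(questions[name])
            continue  # start over with one more answer available
        namespace.pop("__builtins__", None)  # added by exec, not an answer
        return namespace


# Canned answers stand in for a real user clicking through screens.
CANNED = {
    "Are you a citizen?": False,
    "Are you a legal permanent resident?": True,
}
asked = []


def canned_ask(question):
    asked.append(question)
    return CANNED[question]


result = run_interview(RULES, QUESTIONS, canned_ask)
print(asked)
print(result["is_eligible"])
```

Each pass either finishes or discovers exactly one missing variable, so the order of the questions falls out of the order in which the rules touch them.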
Yeah. So it's like every time the screen loads, it reevaluates everything from the top, you know, which is moderately inefficient, but with computers, you know, it's very fast. It's yeah. Yeah. Yeah. And so at the end of it, like it, once, if it gets all the way through, then you're done with the interview. You have all the information you need. You've,
you've gone through the logical paths that you need to go through. And the nice thing about Python is that if the condition is user.is_citizen or user.is_eligible_alien, and the user is a citizen, it'll stop at user.is_citizen and it won't even try to evaluate the second part. So it won't trigger any name errors or any other errors on the part after the or. So therefore the interview can be parsimonious about what questions it asks of the user. So the user is only asked for information that is logically
necessary. And that's all done sort of by tapping into the way that Python is parsed and evaluated. Yeah. Interesting. Because Python short circuits the or, you might not have to ask them both, are you a citizen and are you a legal resident? And it's two separate questions. Like only if they say, no, I'm not a citizen, do you ask them about the residency? Yeah, exactly. Huh. All right. So this is pretty interesting. Like this is not how normal Python works,
but this is pretty creative and, I would say in this context, really a positive way to build like an extensible way to script out this flow. Like that's definitely better than, you know, a flowchart, draggy droppy backend or something. I still have a lot of people who would prefer that I created a flowchart GUI interface for them. But I just find with any service that offers a no code solution, what that means is the easy stuff is easy
and the moderately difficult stuff is nearly impossible. Whereas with code, the easy stuff is kind of hard and the moderately difficult stuff is a little bit harder and really, really complex stuff is doable. And so I'd rather have the latter. Yeah, of course. And that makes a lot of sense because if you just want to ask like three questions and whatever, it's like SurveyMonkey or something like that, right? Or a Google form or, you know, you name it, right?
I like the idea of attorneys just being able to concentrate on specifying the law in if-then-else statements and not having to worry about interview flow. Like they don't need to really think about like what question is asked when. They can just concentrate on what they're good at, which is envisioning the law and just writing it out and let the computer do the work of figuring out what questions to ask in the order.
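The short-circuit behavior of or that makes this possible is easy to check directly. In the sketch below, evaluate and ask are hypothetical helpers and the answers are passed in as arguments; the point is only to record which questions get asked.

```python
def evaluate(citizen, resident):
    """Return (eligible, questions_asked) for one pair of canned answers."""
    asked = []

    def ask(question, answer):
        asked.append(question)  # record that this question was shown
        return answer

    # The right-hand side of `or` only runs when the left-hand side is falsy.
    eligible = (
        ask("Are you a citizen?", citizen)
        or ask("Are you a legal permanent resident?", resident)
    )
    return eligible, asked


print(evaluate(True, False))
print(evaluate(False, True))
```

A citizen sees one question and a non-citizen sees two, without anyone writing explicit flow logic for either path.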
Yeah, it's really creative. You just rerun it over and over until it stops crashing. Yeah. You just like work your way through. I think that's pretty creative and I've not seen anything like that before. And I was giving the example of a name error where you refer to a name that doesn't exist, but then in order to do attributes and indexes, indices, I had to create a new object that would raise special exceptions because Python variables, and this is one of the limitations of Python,
like Python objects are not self-aware. Like they don't know their own name. They're just kind of like a value that has pointers to them somewhere in the core. So I had to sort of give each object an inherent identity. And so that confuses things, but it all works. Yeah. Yeah, I know. It sounds like it works pretty well. So you define these interviews in YAML files and then you define the flow by like stating the law or
your desired sort of flow of the interview in Python. And then DocAssemble just pieces it all together and makes it real. Yeah. And at the end, you just write some logic where the endpoint is presenting some final screen to the user. And DocAssemble just kind of uses dependency satisfaction to ask all the appropriate questions in order to get there. Yeah. And of course this wouldn't work with some kind of compiled language, right? Like a compiled
language would have to try to compile and then run it. And it would have to have all the elements available at compilation time, not just the ones that it's trying to crash into as it makes its way on the branches. Well, there are people smarter than me who've built things like the, I don't know how to pronounce it, the Jinja2 templating system. I think what they do is they actually like parse out all of the
variables that are used and then create these like stand-in objects for them. I think there are some ways to sort of compile everything and then do the logic on it later. But I've just used the sort of exception trapping system. Yeah. Yeah. It's interesting. So on the website, which I said, it really presents things pretty
nicely for this open source project. You have a bunch of features and I think it might be worth going through those features and then like digging into the technology behind it as a way to see more of the various libraries and technologies that are at work here. Sure. Sound good? Yeah. So let's just like start at the top one. It says you have a WYSIWYG, what you see is what you get, editor and you can compose your templates as a Word document using a Word add-in
to get started. So how does that work? Like, we talked about YAML being the definition of these things, but then what's happening here? Yeah. A lot of lawyers like to create Microsoft Word documents and very helpfully, there is a Python package called python-docx-template that somebody developed. It uses two other packages. One is python-docx,
which is kind of a utility for writing Microsoft Word files. And the other is Jinja2, which I was just talking about, and it just kind of mashes them together by using Jinja2 on XML, because Microsoft Word files are actually XML inside of a zip file. And so it created a system where you could do Jinja2
on a Microsoft Word file. And so I also figured out that Microsoft had a pretty neat tool for putting an add-in into Microsoft Word, both the online version and the desktop version, that ran in a little sidebar. So I was able to create a sidebar for Microsoft Word that had the variables in your interview and you could just click them to insert them into your Word document. So that's how I was able to do some nice
Microsoft Word templating and to make it as WYSIWYG as possible. I'm actually not a fan of WYSIWYG. I put that on the website because other people are. I would prefer that everybody used Markdown. I have another document assembly tool that doesn't use Microsoft Word files that just converts Markdown to PDF. And that's what I like better because it uses Pandoc on the backend, which in turn uses LaTeX.
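The point that a Word file is XML inside a zip archive can be illustrated with the standard library alone. The sketch below is only an illustration under stated assumptions: it uses a regular expression as a stand-in for Jinja2, and the archive it builds is a toy with a single `word/document.xml` entry, not a valid Word document. The function names and the `client_name` variable are hypothetical. Real documents often split a placeholder across several XML runs, which is one of the reasons to use a dedicated package rather than code like this.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

# Matches placeholders written as {{ name }} (a tiny subset of template syntax).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_docx_xml(docx_bytes, variables):
    """Copy the archive, filling placeholders in word/document.xml only."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                # Escape values so they cannot break the surrounding XML.
                text = PLACEHOLDER.sub(
                    lambda m: escape(str(variables[m.group(1)])), text
                )
                data = text.encode("utf-8")
            dst.writestr(item.filename, data)
    return out.getvalue()


def make_toy_archive():
    """Build a minimal zip with one XML part (not a valid Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "word/document.xml",
            "<w:document><w:t>Dear {{ client_name }},</w:t></w:document>",
        )
    return buf.getvalue()


rendered = render_docx_xml(make_toy_archive(), {"client_name": "Jane Doe"})
with zipfile.ZipFile(io.BytesIO(rendered)) as zf:
    body = zf.read("word/document.xml").decode("utf-8")
print(body)
```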
I'm a big LaTeX fan. So yeah. Okay. Yeah. Sounds, sounds very interesting. We talked already about gathering signatures. That's one of the things that it does. And it sounds like you're gathering those with an HTML5 canvas in JavaScript. And then what you convert those to images or something like that? Yeah. The canvas, I think it's transmitted as a PNG file encoded in a URL. It's like a base64 conversion and it turns, it's like a data URL. And so you just transmit that in a post request
up to the server. It's a, it's really pretty simple. Yeah. I've never tried that, but that sounds totally simple. You also have live chat. So if you're hosting an interview, like you're the person receiving all the answers, right? You can assist users in real time with even screen sharing and remote screen control. What's up with this one?
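The signature transfer described a moment ago, a canvas image sent as a base64 data URL in a POST request, could be unpacked on the server roughly as follows. This is a minimal sketch, not DocAssemble's actual code: `decode_data_url` is a hypothetical helper, and the payload is fake bytes carrying only the PNG signature so the example stays self-contained.

```python
import base64
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split 'data:<mime>;base64,<payload>' into (mime type, raw bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Stand-in for what the browser would send; not a real image.
fake_png = PNG_SIGNATURE + b"not real image data"
data_url = "data:image/png;base64," + base64.b64encode(fake_png).decode("ascii")

mime_type, image_bytes = decode_data_url(data_url)
print(mime_type, len(image_bytes))
```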
That was a lot of work. I figured out how to use WebSockets technology. There's a great package called Flask-SocketIO that enables you to use this sort of WebSocket event-driven communications protocol within sort of the Flask paradigm. And so I created this very responsive, quick chat system where you can chat back and forth with somebody who works for your company, who helps out users. And so you can get chat messages from users in real time facilitated by like Redis and this
Socket.IO system. And one of the cool things I figured out I was able to do with that is to have this sort of pseudo screen share where I would store the HTML in Redis and then pull it down and let the operator look at it. And it would be refreshed like on triggers in JavaScript. And so, so you can just sit there as the operator and you can watch your users using the system and it's not transmitting pixels. It's just transmitting HTML.
I see. So you're like, this is literally the DOM that they are looking at, right? Yeah, basically it's inside of a little iframe. Yeah. I also figured out it was fairly easy for then the, the operator to seize control of the user's browser just by sending over some events over the WebSockets. So as the operator, you can click a button and control the other user's interview and type stuff into their text boxes or click on their buttons. They can see you doing this in real time.
Yeah. It sounds pretty advanced, but it was actually pretty simple to implement. No, it sounds really cool. And I haven't heard of too many things that are like at this level, you know, I've definitely seen those little chat programs and stuff. Yeah. With the operators and whatnot, have some experience with that, but that sounds like a really cool feature. I honestly didn't expect that to be in here. Yeah. I thought it was a cool feature too, but like nobody is using it. I have no idea why.
Sometimes you create something and you think you're, you're going to get massive use and then nobody actually cares. Well, yeah, that is always the challenge. And I suspect, you know, if I had to guess, right, like this is super cool that you can help people this way, but you know, it's also challenging to have a person sitting there that can always help. And I don't know, it's, it's not a lot of fun to be a chat operator, to be honest. I've been one temporarily.
But I do think we need to get out of the paradigm of like, either it's a hundred percent, a robotic service, or it's a hundred percent human service with the human touch. Like there has to be some middle ground where a human gets involved when necessary, but otherwise they're using a web app. So I think it's, it's good to have close contact with your users, at least for part of the time of your app development, because then you, you really see what their pain points are in the real world.
Yeah, that's for sure. And you know, some of the stuff that you're doing is, it's super important what the right answer is, right? It's not like, well, I clicked this thing in this like spreadsheet app that you built and it didn't quite do what I wanted, right? This is, you know, are you eligible for bankruptcy or something, right? Yeah. And you know, you don't want your users to commit perjury when they're
talking to a court about what their property is, for example. So yeah, it's important to get the stuff right. Yeah, exactly. And so I think the stakes are pretty high here. So that's pretty awesome. This is a feature. This portion of Talk Python To Me was brought to you by Datadog. Get insights into your Python applications and infrastructures with Datadog's fully integrated platform. Debug and optimize your
code by tracing requests across web servers, databases and services in your environment. Then correlate and pivot between distributed request traces, metrics and logs to troubleshoot issues without switching tools or contexts. Get started today with a 14 day trial and Datadog will send you a free t-shirt. Visit talkpython.fm/Datadog for more details. Kind of along those lines, you also have SMS and email, which I guess that's pretty much to be expected, but maybe tell us about that real quick.
Yeah. Isn't there some adage that every piece of software bloats until it can send and receive email? Pretty much. Pretty much. So yeah, so my software does, you know, sending email is not that difficult, although I found a way to do it with Mailgun, which uses HTTP because SMTP is just so slow. And then the text messaging was also super easy. You know, you just get a Twilio API key and sending messages is very
easy. And then I have another feature using email that nobody uses where you can actually run a mail server on your server and you can sort of email into your interview. So if you have an interview session, you can, if you want, you can mail documents to it. And so I programmed the mail server to intercept those messages and then sort of make changes in the appropriate interview session. Yeah, that's cool. I've done that before as well. Like you can either go to the site and log in and
answer this or type it in or whatever, or you just reply to this email, right? And then it just folds it back into the database as if you had done that. Yeah, that's a different level than just sending an email, but it is cool and it works. Yep. Now you talked earlier about the web sockets and the live share and Flask and all that. Let's talk a little bit about the hosting. We have Redis, we have Flask, we have some kind of database,
I suspect. This sounds kind of out of the realm of standard lawyer, technical Linux capabilities. Yeah. And unfortunately it's, it's like not a pure Python package. The way I distribute it is through Docker because it has so many, it has so many non-Python dependencies and services that need to be running. And it's just, you can script the orchestration of that with Docker instead of giving people complicated instructions to run and you can get it nicely containerized. So there isn't much
you can do with it as a pure Python package. Although I tried to abstract it away so that you could sort of use the core logic engine in Python by itself. But yeah, Docker has been extremely useful because I don't think I could have gotten anybody to use it unless installation was, you know, one line of bash command or something. Yeah. That's cool. You know, Docker is really interesting in this regard. Like a lot of times it solves these
problems, but it also has kind of its own complexity. Like Docker never feels super simple to me. Like, okay, well I can start this, but then how do I make sure it's running or how do I like update a new version as a beginner? You know, there's those things always feel pretty challenging. Yeah. One of the big challenges of distributing software that gets updated is that you have to take into account all the existing users. And so like with something like the SQL
backend, it was really helpful that I used SQLAlchemy. I don't know how to pronounce it. Yeah, that's right. And it also has this sort of add-on feature called Alembic, which gives you this method for upgrading your SQL. If you wanted to add a column, for example. Right. Yeah. SQLAlchemy is awesome in that it lets you write simple Python classes that map to your database. It'll even create the tables and the indexes and the relationships. But boy, if you change anything, it hates it. Right.
Well, I think with Alembic, it kind of adjusts for that in a pretty elegant way. So I haven't had problems. Right. But if you don't have Alembic, right, your app crashes as soon as you make the changes. So you need to go and create the migrations and then automatically apply them like the next time you start the app and all that kind of stuff. Right.
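The idea of creating migrations and applying them automatically the next time the app starts can be sketched without Alembic. The example below is an assumption-laden stand-in, not how Alembic or DocAssemble works internally: it uses the standard library's `sqlite3` module, tracks the applied version with SQLite's `user_version` pragma, and the `answers` table and its columns are hypothetical.

```python
import sqlite3

# Ordered list of schema changes; a new release appends to the end.
MIGRATIONS = [
    "CREATE TABLE answers (id INTEGER PRIMARY KEY, session_key TEXT NOT NULL)",
    "ALTER TABLE answers ADD COLUMN modified_at TEXT",
]


def upgrade(conn):
    """Apply any migrations not yet recorded; return the resulting version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # Pragmas cannot take bound parameters; version is always an int here.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
first_run = upgrade(conn)    # applies both migrations
second_run = upgrade(conn)   # nothing left to apply, so nothing changes
columns = [row[1] for row in conn.execute("PRAGMA table_info(answers)")]
print(first_run, second_run, columns)
```

Calling `upgrade` at application startup makes the step safe to repeat, which matters when existing users restart onto a newer version.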
Yeah. So I actually spent the whole last week taking a vacation and working on upgrading from Debian stretch to Debian buster and Python 3.5 to 3.6 and migrating the web server from Apache to Nginx or however you pronounce it. Yeah. And all that Docker stuff does take a lot of time because you have to get it just right because you don't want some user to be like, oh, my system crashed. What do I do?
Yeah. Well, what's nice, though, about Docker is you build the base images like you figure out how to create a Debian server with Nginx set up correctly. It's good. Right. You don't have to think about it. It's now it's good. It's all set up. Right. And so you just kind of build it a layer at a time. But yeah, the real challenge I see is migrating that over time. So what do you tell folks if you say, well, we're going to give them a new version? What do you say that they
should do? There's two ways that you can upgrade. One is by just clicking a button that does a Python upgrade that just runs pip and gives you Python packages and installs in a virtual environment and restarts the services that use Python. So that is pretty painless and doesn't have a lot of errors associated with it. But sometimes you need to upgrade all of the backend stuff. And the way that you would do that is by basically stopping and removing your Docker
container entirely and then running a new one with the new Docker image. But then the problem is, well, what about all your users data? And so you have to have systems for using Docker volumes or the recommended thing is using cloud services like S3 or Azure blob storage. And so I have these complicated systems where every time you shut down the server, it backs up all the information to the cloud. And then when you start it up again, it restores from the cloud. Oh, that's nice.
And so that automatic backup, which is probably something that's also challenging for folks. Oh, yeah. Yeah. And so I've got like a cron job running on this Docker machine that does backups. So people have run into problems. They tend to do crazy things that you would never anticipate. But yeah, it does work pretty reliably to back up to the cloud that way. And the nice thing about
having everything sort of cloud-based is that it was very simple to make it scalable. Like it's not a big deal to add another web application to your cluster. And, you know, you can bombard your DocAssemble system with loads and loads of requests. And you're really only limited by the speed of your SQL server or your Redis server. Yeah. So do you have like one Docker container for Redis, one for SQL, and then like separate web front ends, you can fire up more of them or something like this?
Yeah, you can do it that way. Or you can use like a hosted solution from your cloud provider for your Redis or your SQL, which I actually recommend because they have nice backup systems built in. Yeah, that's good. Just point into that thing. And they already know how to make sure it's safe and fails over and whatnot.
Yeah. But the problem I have is that people who really have no experience with system administration are trying to teach themselves Amazon web services and like multi-server systems. They just get confused. The problem with making things easy for people is that you get all these curious people who don't have enough experience to sort out problems. So like they run into an error and I say, well, get yourself a command line. And they're like, how do I do that? Yeah.
SSH to your machine. How do I do that? I typed SSH. It doesn't work. Yeah. Then you go down a very long and windy path. I'm like, can you teach me how to do this? And I'm like, I learned how to do this over a period of 20 years. How am I going to teach you? Yeah. Well, I mean, that's part of the trick of like being a programmer or in technologies. People look at you and they're like, you know, all these things. I think that means you're
either super smart or you have some super amazing way to learn them. Like, no, you went through the same painful steps, but you just like layered on these skills one at a time. You're like, yeah, SSH used to be hard, but I figured out how to get the keys registered and like everything. That's not a problem. It's gone on to autopilot. Now I'm on to the next problem, like database migrations or whatever. Exactly. Yeah. Yeah. It's tough to communicate all that.
I don't even really know how to do that all at once. Yeah. And they expect it to work so perfectly. And I'm just like, don't you understand what benefit you're getting from Docker? It's amazing. We didn't have this 20 years ago. We used to do it the hard way. Yeah. So let's go, let's keep going down the thing. I think that's interesting here. So you have multiple language support. So like I could conduct the interview
so the person could say, I prefer Spanish or I prefer English. And then they may get their questions in different languages. Right. Yeah. I had a lot of features to help with translations and multiple languages. You can have like multiple YAML files, one for Spanish questions and one for
English questions and use them all in the same sort of logical interview. You can also write everything in English and then generate an Excel spreadsheet that has all the text in it and then send that spreadsheet to a translator and they translate the English into some other language. And then you load that spreadsheet into the system and it will substitute the English with the other language. Yeah. That's cool.
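The spreadsheet round trip just described amounts to a lookup table from the English text to the translated text. The sketch below is a simplified illustration only: it uses CSV text in place of an Excel file, the column names and helper functions are hypothetical, and it falls back to the English phrase when no translation exists.

```python
import csv
import io

# Stand-in for the spreadsheet a translator would send back.
SPREADSHEET = """english,spanish
What is your name?,¿Cómo se llama?
Do you own a car?,¿Tiene un automóvil?
"""


def load_translations(csv_text, source, target):
    """Build a {source phrase: target phrase} table from the spreadsheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[source]: row[target] for row in reader}


def translate(text, table):
    """Return the translated phrase, or the original if none is listed."""
    return table.get(text, text)


table = load_translations(SPREADSHEET, "english", "spanish")
print(translate("What is your name?", table))
print(translate("This phrase was never translated.", table))
```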
So people are using it for some, for like five language interviews. And I think using the Excel spreadsheet is a nice medium because that's what translators are comfortable with. Yeah, sure. So another one that's interesting is extensibility. Obviously this is where it gets to more of the developer side of things, right? You can use the power of Python to extend the capabilities. Maybe give us some examples so we know what's going on here. Cause you also have APIs and
integrating with third party apps. Maybe talk about those two at the same time. Cause they kind of seem the same, but not exactly. Yeah. I tout the extensibility just because everybody wants to do their own idea and I can't anticipate all the features that they're going to need. So I just give them the power of Python and then they can install a package to do whatever they want. So some people wanted to do
integrations with Google Sheets. So I looked up and found that there is, you know, a package for that on PyPI and I can show them how to do that and how to set it up. And so, so people have integrated all sorts of things just by importing the package into their system. Yeah. Really nice. And then APIs for integrating with third party stuff. There are a number of things that I have like a GitHub integration. There's this authoring system where you can write your own YAML right in the web browser.
And I have a GitHub button that then runs Git on the backend and does pushes and commits and stuff like that. Yeah. Cool. And for the login system, I'm using the built-in Flask username and password system, but a lot of people want to have social logins or Auth0. So I have some APIs that integrate with that. People also want to, they like writing the stuff in the web browser, but they also want it
saved to their desktop. And so I have an integration with Google Drive so that you can press a button and sync to your Google Drive and then run your interview files that you just synced. So there are a lot of different ways that DocAssemble talks to other applications that people like to use. Yeah. Cool. Another one kind of related to that is you say you can package your interviews and use GitHub and PyPI to share your work with the DocAssemble user community.
So can you create like extension packages or something like that? And then people can add them as a dependency of the app? Yeah. So the way that I have structured the system is you write your YAML, but you can also write modules like .py files. And the way you package and distribute things is using just the plain old Python packaging system. And it will create the Python package for you. And there's a button to upload it
to PyPI as well as one for GitHub. I'm just sort of using the exact same software distribution system that already exists. But the YAML files and other files are just under a data folder in your Python package folder system. Yeah. Okay. So it's great. I didn't have to invent my own package distribution system. I'm just like, do what everybody else does on GitHub and PyPI. Yeah. Awesome. It also has support for background tasks. Like even when people are not interacting with
the website, it could be running stuff in the background. Is that using Redis or how's that happening? That mostly refers to Celery. So Celery is a distributed task queue system in Python, which is amazing because the problem with web applications is you have to do everything quickly or else the browser is going to time out. The user is going to be sick of looking at a spinner and stop the
connection or something. But if you use Celery, you can have long running code execute in a separate process and then save its results to the place where you have your interview answer stored. And the other great thing is they queue up in there. So I have some cool Celery stuff like where if you upload a PDF file, it makes a PNG image out of every single page and then sort of in parallel, OCRs them for you using Celery and all of its queuing magic.
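The pattern of handing slow work to a background queue and checking later whether it has finished can be sketched with the standard library. This is not Celery and not DocAssemble's code: `concurrent.futures` stands in for the task queue, `ocr_page` is a hypothetical placeholder that only sleeps, and a real Celery setup would also need a broker and separate worker processes. Celery's result objects offer a comparable readiness check.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page_number):
    """Placeholder for a slow job such as OCR on one page image."""
    time.sleep(0.05)
    return f"text of page {page_number}"


with ThreadPoolExecutor(max_workers=4) as pool:
    # Queue one task per page; they run in parallel up to max_workers.
    futures = [pool.submit(ocr_page, n) for n in range(1, 6)]

    # Meanwhile the interview could keep asking other questions;
    # here we simply poll until every task reports that it is done.
    while not all(f.done() for f in futures):
        time.sleep(0.01)

    pages = [f.result() for f in futures]

print(pages)
```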
Yeah. That's cool. Just grab a PNG of the page and throw it up there and say, I mean, you get a chance OCR this and store it here or something like that, right? It's really useful because sometimes people have really long documents that they need to assemble. And if it takes 30 seconds to do that, you want to be entertaining the user while that happens. And so you put it into a background task and maybe have the user answer some other questions.
And then you just check, oh, is the task ready? And then if it's ready, then you get the document. Yeah. So yeah, that's been a real lifesaver. Nice. I guess maybe the last one to touch on here is the secure bit with server-side encryption and document redaction and things like that. Maybe you're using Let's Encrypt or something else along those lines. I want to talk about those things.
Yeah. A big concern of lawyers is that they're getting client information. They're getting personally identifiable information and they want some reassurance that whatever software they're using is not going to reveal those personal details to the world. So there are a number of features that I use to try to increase security. One of which is I have Let's Encrypt built into the deployment system.
So all you need to do is give it your email address and set up your DNS properly and it'll do Let's Encrypt for you and it'll renew your Let's Encrypt certificates. It also does server-side encryption. And the way that interview answers are stored, I'm letting the interview answers just be a Python namespace, which is just a dictionary. And I pickle that using the pickle package. And then I encrypt it
with the user's password. Every time they contact the server, they send this sort of secret password and I use that to encrypt and decrypt. So that password is never stored on the server. And so I can have encryption of the pickled, serialized data structure right there in the SQL database. And then other features too, like multi-factor authentication for login. That was very easy to
implement with the various apps and SMS messaging. And redaction, like I figured out a way to replace text with like blocks of black ink or whatever. So there are a lot of different ways that people can feel secure. I haven't figured out the whole encryption of files on the file system problem yet. So if you upload a file, that's not encrypted server side, but everything else is.
You know, the project is open source and you probably would accept a pull request that would add that feature or something like that, right? Oh, absolutely. Speaking of which, are people, are you looking for contributors to this project? Are other people already working on it with you? What's the story there?
Well, what I've found is that there's no magic to open source and crowdsourcing. Maybe there is in other areas, but there are a lot of people who are using the system, some of whom are programmers themselves. But it's pretty rare that people contribute something substantive to the code. So I've still found that even though like I never took any CS classes and I just do this on the nights and weekends, I'm still doing 99.9% of the coding. It just hasn't magically happened that other people
contribute. So if there is somebody who wants to really dig into this and contribute, I would totally welcome that. Although I think because I've had 99% control, I kind of have a strong feeling of authorship. And so I might have strong opinions about what gets brought into the system. Yeah, sure. I mean, one of the things that people do sometimes that can be frustrating is they see something, they're like, oh, I should add this feature or fix this thing. And they'll create a
pull request and do all the work and then submit it. And the maintainer will say, well, that doesn't fit with my view of this or whatever. And you're like, but I did all this work. All right. So, you know, maybe people could open like an issue in GitHub, say, hey, I'm considering this feature. Here's what I'm thinking about doing to add it for you. Would you be interested or would you hate this idea of having it in this project?
I love that. Like if people talk about it, we also have a very active Slack channel where we discuss these things. Yeah, that might be also a decent place. But, you know, GitHub is nice because Slack is super transient, right? Like there might be five people that all have this idea that would say, yeah, that's great. Or actually I see it this way or right. But if it was in Slack, it's gone if you weren't there. Yeah. And so I use both Slack and GitHub for that sort of stuff.
Yeah. But I would be grateful if there were more Python programmers out there who wanted to try their hand at adding something to the system or just using it and contributing bug reports is also really helpful. Right. For sure. So one of the things that I noticed that jumped out to me when I was going through the demo example, right? I was going through and answering the various questions and there's a
button up at the top. I suspect this is not in the real one, but in the demo one it is where you can press and say, show me the source. And it'll actually pull up and like show you the YAML file and the various other pieces and like some of the performance analysis stuff, which is pretty cool. People can go check that out if they are looking for help. But another thing that I think is nice is you have this readability score, I guess.
Not readability, the app, but like how easy is this to read? Like the Flesch Reading Ease or the Flesch-Kincaid Grade Level, things like that. That sounds pretty helpful if you're trying to have questions that people want to answer correctly. You want to keep that probably as simple as possible, right? Yeah. I think that's what a lot of people don't understand is that really the hard part about developing guided interviews is getting the language right and being able to be precise,
but also use plain English. And lawyers have a tendency to just go on for too long and use big words. But if you, so I added that tool to sort of tell you what the grade level of the language you're using on that question was in the hopes that people would go for a sixth grade reading level and keep working on their language until it got to that level. Yeah. That's cool.
That's thanks to this great package called textstat. So it was one of the things that because of the magic of the community, I was able to integrate it very quickly. Yeah. That's cool. And the main reason I wanted to ask you about it is how you're generating that in Python. Yeah. It's that textstat package.
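For a sense of what those readability scores measure, here is a rough standard-library sketch of the two published formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level. It is not the package's implementation: the syllable counter is a crude vowel-group heuristic assumed for illustration, so its numbers will differ from a dedicated library's, and the sample sentences are invented.

```python
import re


def count_syllables(word):
    """Crude estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)

    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade


plain = "Do you own a car? Tell us what it is worth."
legal = ("Enumerate all automobiles constituting personal property "
         "within your possession.")

plain_ease, plain_grade = readability(plain)
legal_ease, legal_grade = readability(legal)
print(round(plain_ease, 1), round(plain_grade, 1))
print(round(legal_ease, 1), round(legal_grade, 1))
```

Higher ease and lower grade both mean easier text, so the plain-English question should score as easier than the legalistic one.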
Okay. That's really cool. I can see lots of uses. Maybe you don't show that to the user, but in your app, you're like, actually, you know, the CMS or whatever you're building, maybe it wants to have that kind of analysis in it. That's cool. Yeah. Awesome. Well,
Jonathan, I think that pretty much covers it. I think this is a nice project. And if you're out there listening and you need to conduct surveys or interviews that are slightly better and more controllable than like the standard SaaS products, this seems like a pretty good option. Well, thanks for having me. Yeah. You bet. Now, before we get out of here, though, I've got the two questions at the end of the show that I always ask you. So let me ask those to you now.
Oh, good. If you work on DocAssemble and write some Python code, what editor do you use? Emacs. I've always used Emacs. Yeah. And I don't think I'm ever going to do anything else. I basically live in Emacs. I also use Org mode to manage my life and track my time. Okay. So yeah. So you run the Emacs operating system, basically. Yeah, somewhat. Yeah. Cool. And then you've talked a lot about cool and interesting and unique
Python packages here. But maybe if you want to give a shout out to any additional packages that you think are great for people to know about. Well, one that I really encourage people to use is software called Lettuce, which is a Python version of another package called Cucumber. So the idea of Lettuce is it's a testing platform that uses behavior driven design, I think it's called, where you can express your tests in plain
English. And then it uses the Selenium package to do web browser automation, or some other type of automation to then carry out those tests. So when I write a guided interview, I also in tandem write a test script, which is human readable using this Lettuce package. And the Selenium package for web browser automation is the best thing ever. I've done web browser automation with lots of other tools, but Selenium is amazing. Yeah. Oh, those are both really interesting. I love them.
Yeah. That's all the packages I can think of. Yeah. Yeah. Great. So final call to action. People want to get involved with conducting these interviews using DocAssemble. What do you tell them? I think they can check out the website and join our Slack channel and get involved in the DocAssemble community. We also have annual conferences now called Docacon, which take place in the summer every year.
Yeah. And the other thing I just think people, what I would like Python developers to do is get jobs at law firms and then sort of infiltrate and then find ways to automate what they're doing. Because there aren't enough people with programming skills in the legal field as a whole. And so I think as a result, we're kind of behind the times. That's good advice. And it sounds like DocAssemble is coming along strong. It's pretty wild that you
already have a conference about it. So very cool. Well, congrats on the project and thanks for being on the show. Thank you. You bet. Bye. This has been another episode of Talk Python To Me. Our guest on this episode was Jonathan Pyle, and it's been brought to you by Linode and Datadog. Linode is your go-to hosting for whatever you're building with Python. Get four months free at talkpython.fm/Linode. That's L-I-N-O-D-E.
Datadog gives you visibility into the whole system running your code. Visit talkpython.fm/datadog and see what you've been missing. They'll throw in a free t-shirt. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course,
if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.
Now get out there and write some Python code. I'll see you next time.
