16 Aug

Compression in Python Programming Tutorial


Hey guys, what’s going on? Welcome to another Python tutorial. This one covers compression in Python. Typically the purpose of compression is as simple as wanting to decrease the size of data, maybe backup data on a hard drive or anything else in storage, or to share large files: if you’re going to send them over Skype or something like that, you’d compress them first to save a lot of time and keep the file size low. But it can also be used for more complex tasks, and you can apply that same reasoning within your Python programs. People usually use compression for themselves but don’t really consider using it within their own programs.

A simple-ish example would be any sort of automated backing up of your data. You back up your data, compress it, and save that. Then weekly, or daily, or however often you run your backups, you have a process that opens the compressed file, decompresses it, appends the new data, recompresses it, and that’s it.

Along that same line of thinking, you can use this as a way to borrow space or memory that you don’t actually have. Say you have a two-gigabyte memory limit and you’ve got a bunch of big lists that you’re reading in, but you’re not constantly using those lists, or there’s never a time where you need both lists at the exact same moment, yet you don’t want to spend too much time reading them into memory again. You can just compress them and decompress them when you need them again. It really is going to depend on what factors and variables come into play for you, but that can also save you a lot of time.

Another example of why this would be necessary is if you’re doing any sort of networking or supercomputing. Even if it’s local, as in the case of most supercomputers (I don’t even know, it might not be that most supercomputers are local, depending on how many botnets are out there), you’re going to be consuming bandwidth and dealing with latency. Compression becomes especially useful if you’re networking between servers that are remote from each other. In many cases it’s going to take less time to compress the file locally, send the compressed file over, and decompress it when you get it. A lot of people aren’t really doing that, but you can save a ton of resources, and if you’re sending large enough files and the business you’re managing is large enough, it can save a lot of money. Again, just like the other example, there are a lot of factors and variables that come into play, but usually it’s as simple as a mathematical operation: compare the time to compress, send, and decompress against the time to send the raw file.

I’ll also show you guys some benchmarking at the end of the video, just to see whether it’s actually worth your time, because you can also vary how compressed, quote unquote, you want the data to be. So let’s go ahead and get into it. I’ll show you both how to compress and how to decompress in memory.
That’s just the stuff you’re dealing with mid-program. But I can also show you how to do it with something like a text file, because it is slightly different: with files, it’s usually best to use Base64 and encode and decode around the data that you’re actually going to compress. You could fix the problem afterwards if you didn’t do it, but it’s always best to do it the right way going in, so that’s how I’ll be showing it.

Let’s go ahead and get started. We’re going to need to import four things. The first one is zlib, which is the library that allows us to compress and decompress stuff. The next is sys, which we’ll use to grab the size of our data for benchmarking later on. Then we import time so we can measure the processing time, again for benchmarking. And finally we want base64, to encode and decode our data if we want to save it to an actual file.

For the next part, you can make your own text file and populate it with a bunch of data: just type something, copy and paste it a bunch of times in your file, and make it a decent size. For me, I’ve used this file. It’s about two megabytes of GBP/USD Forex ratio data, so that’s what I’m going to use as an example. But again, use anything, hopefully over a megabyte; if you want to match me, make it about two megabytes. For me the file is GBPUSD1d.txt, one-day. There is a tutorial where we use this file, so if you have it from that, you can just use yours. We’ll open it and read it into memory now.
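For readers who don’t have the Forex file, here is a minimal Python 3 sketch of this setup. The file name sample.txt and the repeated line are made up for illustration and are not from the video. The file is read in binary mode because Python 3’s zlib works on bytes rather than str.

```python
import base64    # encode/decode compressed data before saving it as text
import os
import sys       # sys.getsizeof, used for size reporting
import tempfile
import time      # timing for the benchmark
import zlib      # compress / decompress

# Hypothetical sample file; the video uses GBPUSD1d.txt instead.
sample_path = os.path.join(tempfile.gettempdir(), "sample.txt")

# One made-up line, repeated until the file is roughly two megabytes.
line = "2013-08-16,1.5632,1.5645,1.5610,1.5628\n"
with open(sample_path, "w") as f:
    f.write(line * (2_000_000 // len(line)))

# Read as bytes: in Python 3, zlib.compress needs a bytes-like object.
with open(sample_path, "rb") as f:
    text = f.read()

print("File size on disk:", os.path.getsize(sample_path))
print("Bytes read:", len(text))
```

Any reasonably large, repetitive text file will do in place of the generated one.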
Now we’re going to print how big this file is, so we’ll say something like “Raw size:” and then sys.getsizeof of text, which tells us how much memory text is using.

Now let’s do a quick example of compression. We’ll say compressed equals zlib.compress, and you can pass two parameters. The first one is required: it’s what we’re compressing, which is text. The second is optional, and it’s a number from 0 through 9. I’m not sure why 0 is even applicable, because 0 means no compression. 1 is the least amount of compression possible and 9 is the most, so let’s use 9 to start. Then we’ll print out the compressed size: print “9 compressed size:”, with the 9 just referring to max compression, then a comma and sys.getsizeof of compressed. That’s it for now; we’ll add a few more subtleties later when we benchmark it. Let’s save that and run it.

The raw size was 1,909 kilobytes, so 1,909,957 bytes, and the compressed size is only about 213,000 bytes. Obviously a significant compression.

Next, let’s say we want to save this data. compressed is that data, so let’s see what happens when we save it: savecomp equals open, and we’ll make up a file name, compdata.txt, with the intention to append. Then savecomp.write, writing compressed, and we’ll close it, even though we don’t really need to. Let’s run that, and it’s saved. If I bring compdata up, here it is, and it’s obviously quite ugly. This is why we have to encode it if we want to save it into a file.

If we leave it in memory, though, the next thing we’d do is decompress it, so decompressed equals zlib.decompress of compressed. In fact, I’m going to make another variable here, text equals “this is a test”. You’ve seen how much compression can help, so we don’t need that massive file just yet, and I want to show you the full conversion. We print the original text, compress it and print compressed, then decompress it and print decompressed, so you can see it visually happening within Python. It’s fine to compress into this data and then decompress it, no problem.

But what happens when we decompress a file instead? We’ll leave the save commented out, since that file still exists. We’ll say compfile equals open of compdata.txt with the intention to read, and read that file, which is the gibberish we were just looking at. Then, instead of decompressing compressed, we decompress compfile and print decompressed. Let’s run that and see what happens. It’s getting angry at me.
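The in-memory round trip described here can be sketched as follows for Python 3. The sample data is made up; the variable names follow the video. One caveat worth hedging: sys.getsizeof reports the size of the Python object, which includes a small per-object overhead, so len is shown alongside it as the actual number of data bytes.

```python
import sys
import zlib

# bytes, not str: Python 3's zlib.compress rejects plain strings.
text = b"this is a test " * 1000

compressed = zlib.compress(text, 9)         # 9 = maximum compression level
decompressed = zlib.decompress(compressed)  # back to the original bytes

print("Raw size (len):", len(text))
print("Raw size (sys.getsizeof):", sys.getsizeof(text))
print("9 compressed size (len):", len(compressed))
print("Round trip OK:", decompressed == text)
```

If the data starts out as a str, calling .encode("utf-8") on it before compressing, and .decode("utf-8") after decompressing, is one way to bridge the two types.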
We haven’t defined text. Let me just let it open this file, so as not to waste y’all’s time anymore, and run it. There we go. As you can see, this was the decompression it attempted: it opened the file, read it, and then where we say decompressed equals zlib.decompress of compfile, we get an error: incomplete or truncated stream.

So how do we deal with that with files? If you were paying attention at the beginning, I was telling you that we have to change how we’re compressing the data. In the interest of keeping it simple, instead of using the GBPUSD text, let’s clear out everything, because this is getting messy. This time we’ll say text equals “compression example with encoding”. Then compressed equals base64.b64encode. What do we want to encode? The zlib compression of text, and we’ll give it a full 9 compression just for example. Compression is done, so we can print compressed, and then save it: output equals open of encodecomp.txt, again with the intention to append, then output.write of compressed, and output.close.

Then we’re going to decompress this. How do we do that? It’s pretty simple: you just flip the order around. decompressed equals zlib.decompress of base64.b64decode. We could pass compressed, but since we want to open that file, let’s say readfile equals open of encodecomp.txt with the intention to read, and read it into memory, and then we decode readfile. That’s the same thing we did before, only this time we’ve encoded it and then we want to decode it.
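A minimal Python 3 sketch of this encode-then-save, read-then-decode flow. The video’s code targets an older Python; the explicit .encode and .decode calls, the temporary directory, and the use of write mode instead of append are adjustments made here, not part of the original.

```python
import base64
import os
import tempfile
import zlib

text = "compression example with encoding"

# str -> bytes -> zlib -> Base64, which yields ASCII-only bytes.
compressed = base64.b64encode(zlib.compress(text.encode("utf-8"), 9))

# Temporary location for the example file (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "encodecomp.txt")

# "w" rather than the video's "a", so reruns don't stack copies in the file.
with open(path, "w") as output:
    output.write(compressed.decode("ascii"))

with open(path, "r") as f:
    readfile = f.read()

# Flip the order: Base64-decode first, then decompress, then bytes -> str.
decompressed = zlib.decompress(base64.b64decode(readfile)).decode("utf-8")
print(decompressed)
```

Base64 makes the saved file plain ASCII at the cost of roughly a third more bytes than the raw compressed data.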
We encoded it up here as we saved it, and now we decode it for decompression. Let’s run that and see if I made any errors. Well, the process worked, but we didn’t print it out; we printed compressed. So make that print decompressed, save, and run it again. Now you can see that it worked. And if you can’t tell the difference from that earlier gobbledygook, this output is basically using only ASCII characters. What was the file? encodecomp. We can open this file, and here’s our encoded compression, and as you can see, it looks much better and is obviously easier to read.

There are ways around this: you could read the compressed file from before, the ugly one. If you ever came across something like that, it would require first encoding it into UTF-8 and then decoding it, but we’re not going to get into that.

The last thing I want to show you is the benchmarking. We’re not going to code it up here; I already programmed it so we don’t spend any time on it. Let me pull it over. This is the code, and if you want to mimic it you can; I’ll copy and paste it, along with the text file code, onto a page on my website. It pulls in the data, and then start time equals the current time, the Unix timestamp. Then it runs the compression and measures how long that compression took.
The next thing it does is decompress, and then it asks how long the entire compression and decompression took. Keep in mind this is slightly flawed because of the calculation in between for the compression time: the full time includes those few extra nanoseconds. If you wanted the real full compression and decompression time, you could run just that measurement on its own, but that’s not really necessary since we’re measuring all of these on the same scale. After that it repeats: this is a 9, then a 7, a 5, a 3, and then the lowest compression possible, 1. For each one we print the compressed size, how long the compression took, the ratio (how big the compressed data is in relation to the original), and finally the full compression and decompression time.

With that, let’s run it. I tried to drag the window over fast enough, but couldn’t. There’s the raw size, and now you can see the times. The full level 9 compression takes almost a second, pretty much 0.9 seconds, and the compressed size is about 213,000 bytes. Compare that to level 7, where the compression is just slightly... in fact, wow, that’s quite interesting: the 7 is smaller than the 9 and took less time. So don’t use 9 at all, apparently.

Then, as you can see, from 7 to 5 the size does indeed get larger, but the time taken is significantly less. If you compare this number to this number, it did get slightly bigger, but in terms of the ratio to the original we’re talking pocket change, and for the time saved, it’s pretty much four times quicker. From 5 to 3 we notice very little time saved, but from 3 to 1 we again notice a very large time saving, and we’re still only at 0.17, so about 17 percent of the original size. That’s still a massive reduction, and level 1 is very fast.

Let’s run it one more time to see if that first number was a fluke, because that’s kind of weird. But we’re seeing it again. I’m pretty surprised; I’m not sure how that could be, but it is. Again, the numbers are fairly close to the first pass. Actually, at least some of these are identical. It looks like they’re all identical, which is pretty crazy; usually the processing time will vary slightly.

Anyway, this video is probably getting too long, so that’s it. I’ll put this on a page on the website, with the link in the description, so you guys can play around with it, and I’ll also copy and paste the whole encode and decode example for saving the files. Hopefully you guys learned something new and enjoyed it. As always, thanks for watching, thanks for all the support and the subscriptions, and until next time.
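The benchmark code itself is not in the transcript, so the following is only a guess at its shape based on the description: time the compression, then time compression plus decompression, for levels 9, 7, 5, 3, and 1. The function name benchmark, the result keys, and the sample data are all hypothetical, and time.perf_counter is swapped in for the Unix timestamp mentioned in the video because it is better suited to short intervals.

```python
import time
import zlib

def benchmark(data, levels=(9, 7, 5, 3, 1)):
    """Return one result dict per compression level (hypothetical helper)."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        comp_time = time.perf_counter() - start   # compression only
        restored = zlib.decompress(compressed)
        full_time = time.perf_counter() - start   # compression + decompression
        if restored != data:
            raise ValueError("round trip failed at level %d" % level)
        results.append({
            "level": level,
            "size": len(compressed),
            "ratio": len(compressed) / len(data),  # compressed / original
            "comp_time": comp_time,
            "full_time": full_time,
        })
    return results

# Made-up repetitive sample data, roughly one megabyte.
data = b"2013-08-16,1.5632,1.5645\n" * 40_000
print("Raw size:", len(data))
for r in benchmark(data):
    print("level %(level)d: size=%(size)d ratio=%(ratio).4f "
          "comp=%(comp_time).4fs full=%(full_time).4fs" % r)
```

Timings and ratios depend heavily on the input and the machine, so the numbers from a run like this will not match the ones read out in the video.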


17 Comments

  • mark Year says:

    i have a question. Do you know how to create a compressed file. For example in my class we have to get a file that has a text and get the frequency of each word and so and then compress the file. I was wondering if you could help me. i  am just so lost.

  • burgie shr3d says:

    The downside to using base64 output for the compression is that you don't get quite as good a compression ratio.  To avoid using it – the file object used for writing the compressed data needs to be set to 'wb'  (binary write), and 'rb' (binary read) when decompressing.   In my testing:  my decompressed file size was ~130 KB,  base64 encoded compressed file size was ~80 KB, whereas in binary format, the compressed file size was ~60 KB.  Examples:

    WRITING DATA

    plainData = open("decompressed.txt","r").read()
    compFile = open("compressed.txt","wb")
    compFile.write(zlib.compress(plainData))
    compFile.close()

    READING DATA

    compressedData = open("compressed.txt","rb").read()
    plainData = zlib.decompress(compressedData)

    Hope that helps  🙂

  • Destrica UK says:

    Please could you help me, I get the following error:

    Compression: 

    Raw Size:  104270
    Traceback (most recent call last):
      File "C:/Users/Emma/Desktop/Compression Algorithm.py", line 25, in <module>
        end()
      File "C:/Users/Emma/Desktop/Compression Algorithm.py", line 23, in end
        main()
      File "C:/Users/Emma/Desktop/Compression Algorithm.py", line 8, in main
        compressed = zlib.compress(text, 9)
    TypeError: 'str' does not support the buffer interface
    >>> 

    For this code:

    import time, base64, sys, random, winsound, doctest, urllib, math, zlib; from tkinter import *
    #start

    def main():
        text = open("sample.txt", "r").read()
        print("Raw Size: ", sys.getsizeof(text))

        compressed = zlib.compress(text, 9)
        print("9 Compressed size: ", sys.getsizeof(compressed))

        savecomp = open("compdata.txt", "a")
        savecomp.write(compressed)
        savecomp.close()
        print(compressed)

        decompressed = zlib.decompress(compressed)
        print(decompressed)

    def end():
        print("Compression: ")
        print()
        time.sleep(1)
        main()

    end()
    #end

  • GlennBen says:

    I would like to learn how to write my own algorithm for compressing files, but I don't know what technologies to learn. In simpler terms what do I have to learn to write my own compressing algorithms like zlib, if that makes sense? Like how do I create another version of zlib?

  • Self Betterment says:

    Could you please help me with this error? When I try to run your code it does this.

    compression:
    import zlib,sys,time,base64
    text = open('file.txt','w').read()
    print 'raw size:' ,sys.getsizeof(text)
    compressed = zlib.compress(text,9)
    print '9 compressed siz:',sys.getsizeof(compressed)
    savecomp = open('compdata.txt','a')
    savecomp.write(compressed)
    savecomp.close()

    That is the code, but it says syntax error on the 2nd apostrophe on the line print 'raw size:' ,sys.getsizeof(text)

  • solitariobrz says:

    I'm here because of Pied Piper 😉

  • August says:

    what version of python are you using?

  • Harambe says:

    Who's here because your shitty school is making you watch this?

  • Yuva Raja says:

    could you also do videos for data structures,
    algorithms –basics,intermediate in python ?

  • G M Prashanth says:

    Please could you help me I get the following error:
    Traceback (most recent call last):
    File "C:\Users\Prashanth GM\AppData\Local\Programs\Python\Python36\comp.py", line 7, in <module>
    compressed = base64.b64encode(zlib.compress(text))
    TypeError: a bytes-like object is required, not 'str'

    I am Using Python 3.6.0 Shell

    My Code is

    import zlib
    import sys
    import time
    import base64

    text= "compression example with encoding"
    compressed = base64.b64encode(zlib.compress(text))
    print (compressed)

    output = open("encodecomp.txt","a")
    output.write(compressed)
    output.close()

    readFile = open("encodecomp.txt","r").read()

    decompressed = zlib.decompress(base64.b64decode(readFile))

  • harry howes says:

    What version of python is this?

  • Kostas Nikolouts says:

    Can we compress a file then compress the compressed and so on…?
    Also, when I tried to compress a string I got an error that said I can compress only bytes-like objects. What is going on?

  • Shritej Thorve says:

    I am getting sample code from ur website

  • Stefan says:

    If anybody is getting the error message: TypeError: a bytes-like object is required, not 'str'

    You need to convert the text file into bytes:

    text_string = open('compress_me.txt','r').read()

    text_bytes = text_string.encode("utf-8")

    compressed = zlib.compress(text_bytes,9)

    print("9 compressed size:",sys.getsizeof(compressed))

  • Newton Tavengwa says:

    inspired by pied piper to search about compression

  • Biru Singh says:

    Can i use this for compressing images ?

  • Shivam Bhirud says:

    can we use multithreading to make the compression faster somehow?
