tag:blogger.com,1999:blog-33068481555971190882024-03-04T20:56:36.178-08:00SHMsoft blogAI, ML, Big Data - and some eDiscoveryMark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.comBlogger459125tag:blogger.com,1999:blog-3306848155597119088.post-16389841740059983952024-01-24T22:15:00.000-08:002024-01-24T22:21:16.110-08:00How to clear cache in a browser<p> 1. Click on three dots at the top left, then click on Settings</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaxcU9r6aAkLEeoVjZhrQnD6T29CM4GCsuss_dsfGoiJyFWItQzvEZP8SIYPTIbazMUfptnFoeR6t9gy1yKdsceSt2aEk9BDXNRBbZ5BcS7YqfkbaVLKbeqPre_Z71ncijiZ52q_-zQwovdUZwrPgAH03pGWBdQEfGnF2JHVhdj4DRJEDI7GiCCc5Iu5g/s1920/Screenshot%20from%202024-01-25%2000-12-06.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaxcU9r6aAkLEeoVjZhrQnD6T29CM4GCsuss_dsfGoiJyFWItQzvEZP8SIYPTIbazMUfptnFoeR6t9gy1yKdsceSt2aEk9BDXNRBbZ5BcS7YqfkbaVLKbeqPre_Z71ncijiZ52q_-zQwovdUZwrPgAH03pGWBdQEfGnF2JHVhdj4DRJEDI7GiCCc5Iu5g/s320/Screenshot%20from%202024-01-25%2000-12-06.png" width="320" /></a></div><br /><p></p><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div>2. 
Search for "clear," then click "Clear browsing data."</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><span> </span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhALwO9RfQmsmE8Bva7RmMF26s0XfxixmoVN86kfLsGKOXtr2F4cvrT6Z8vnOHcSQatYhYYqUFTmDl_3Tvs3HM0kPevdEsm2JH4hNPPp75vbV1WNiYktMSkbPqyeESrrVIfGEf_e0xALCILlh0RI1suBHLZbh9WLZsCewq9MYR_1tKkvIL-WOFFowWEm80/s1313/Screenshot%20from%202024-01-25%2000-13-05.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="689" data-original-width="1313" height="168" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhALwO9RfQmsmE8Bva7RmMF26s0XfxixmoVN86kfLsGKOXtr2F4cvrT6Z8vnOHcSQatYhYYqUFTmDl_3Tvs3HM0kPevdEsm2JH4hNPPp75vbV1WNiYktMSkbPqyeESrrVIfGEf_e0xALCILlh0RI1suBHLZbh9WLZsCewq9MYR_1tKkvIL-WOFFowWEm80/s320/Screenshot%20from%202024-01-25%2000-13-05.png" width="320" /></a></div><br /><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div>3. 
Choose which data to clear, then click "Clear data."</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4qr6_fCmZxzWSy5HEqkcNdAaFZKmwGwbpi-KzfRgYB6a-txJb55_yiWTzMK7Ar4VTw-tanl7fjhK7WyfFhuO0MVIVcUMiCaFtRvU55HH3ubTIFczWla_-AgNlLrcIZ9_gh7kwAW3iLheTJkZw9xkU3NtGSPgEYjyvU8m2rr4d76Gshs9stF36jrL0Jbs/s620/Screenshot%20from%202024-01-25%2000-13-31.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="620" data-original-width="517" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4qr6_fCmZxzWSy5HEqkcNdAaFZKmwGwbpi-KzfRgYB6a-txJb55_yiWTzMK7Ar4VTw-tanl7fjhK7WyfFhuO0MVIVcUMiCaFtRvU55HH3ubTIFczWla_-AgNlLrcIZ9_gh7kwAW3iLheTJkZw9xkU3NtGSPgEYjyvU8m2rr4d76Gshs9stF36jrL0Jbs/s320/Screenshot%20from%202024-01-25%2000-13-31.png" width="267" /></a></div><br /><div><br /></div>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-3604658977140610032023-04-23T19:23:00.011-07:002023-04-23T20:49:25.676-07:00Who wrote it, human or AI?<p>I asked AI Content Detector to check if the content was generated by a human or by AI. The Detector failed. Here is what I did.</p><p>First, I asked GPT4, "What would it take to avoid detection as being an AI generated content?" It gave me a list of items. 
I asked my question <a href="https://chat.openai.com/?model=gpt-4" target="_blank">here</a>.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjQxq55kkrsOWBL9O1-erk427yZ_fJ3WJEK1VxuFP3Q0UrhoSFVQs7ahjMHcuWVXcWvp1xZ_sl01B9raA0ACwS7sD0QkBuYYwlKOVaobJ8waqhvA8bbmFEof2ICN6iM9qI1qQ6txD4cgxa5AXVGJxi-qF3x01vcY2CMMJNE-t5FhZqIUZ26MUHlroS8" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img alt="" data-original-height="804" data-original-width="747" height="640" src="https://blogger.googleusercontent.com/img/a/AVvXsEjQxq55kkrsOWBL9O1-erk427yZ_fJ3WJEK1VxuFP3Q0UrhoSFVQs7ahjMHcuWVXcWvp1xZ_sl01B9raA0ACwS7sD0QkBuYYwlKOVaobJ8waqhvA8bbmFEof2ICN6iM9qI1qQ6txD4cgxa5AXVGJxi-qF3x01vcY2CMMJNE-t5FhZqIUZ26MUHlroS8=w595-h640" width="595" /></a></div><br /><br /><p></p><p></p><div class="separator" style="clear: both; text-align: center;"><br /></div><p>Then I asked it to generate some content using its own advice.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhcl7U7jQQl_tW6zItxWrCDoSSYT0bTDvjm9qIy08k_lmDfpqHnm9SLnKdzPVqFOcaFgt2-FnQs7cobmedpyVt76_TmNDlEPqCN5Uo-ysYuTMGSPpcO4GOJ0pH_JyZNfI_WlDaudK4G8dqjlvz0l7qLPWg59svlpifOtEp5fVCsS77iVuttvV27M_In" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img alt="" data-original-height="337" data-original-width="751" height="286" src="https://blogger.googleusercontent.com/img/a/AVvXsEhcl7U7jQQl_tW6zItxWrCDoSSYT0bTDvjm9qIy08k_lmDfpqHnm9SLnKdzPVqFOcaFgt2-FnQs7cobmedpyVt76_TmNDlEPqCN5Uo-ysYuTMGSPpcO4GOJ0pH_JyZNfI_WlDaudK4G8dqjlvz0l7qLPWg59svlpifOtEp5fVCsS77iVuttvV27M_In=w640-h286" width="640" /></a></div><br /><br /><p></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p>Now, I copied and pasted this text into the <a href="https://copyleaks.com/ai-content-detector" target="_blank">AI 
Content Detector</a>. </p><p><br />The AI passed with flying colors, and the AI Content Detector confidently declared, "This is human text."</p><p><br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEj6_Coi9mYjHSRtyJHxwVQ92ktxgpKbtuX-eYsSWJAn9mOWciWsPJgoLF-PubNxvLTuDOxOI3m_Pxgq9ZM6Wj8zlZ_loXZkHDHfLDRYq5VyOcmb6NaPWh9ynWnMWK2iTMKXaAbMguJg8aA4uTPFGm-oeLg7y_34PgjL-imUGIim8BBTuDvCnYxTKNz4" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img alt="" data-original-height="519" data-original-width="1056" height="314" src="https://blogger.googleusercontent.com/img/a/AVvXsEj6_Coi9mYjHSRtyJHxwVQ92ktxgpKbtuX-eYsSWJAn9mOWciWsPJgoLF-PubNxvLTuDOxOI3m_Pxgq9ZM6Wj8zlZ_loXZkHDHfLDRYq5VyOcmb6NaPWh9ynWnMWK2iTMKXaAbMguJg8aA4uTPFGm-oeLg7y_34PgjL-imUGIim8BBTuDvCnYxTKNz4=w640-h314" width="640" /></a></div><br /><br /><p></p><p>What are teachers to do? I mean, how do they tell whether the student wrote the work or the AI did? As a joke, I can suggest that the AI would do it better. But then it can imitate a bad student as well.</p><p>Better advice comes from the CEO of OpenAI, creator of ChatGPT. According to Sam Altman, the teaching process will have to change. The teacher will spend more time interacting with the students and teaching them. One practical suggestion: ask the students to outline their ideas in class, and let them expand on them at home. Keep in mind that the expansion may come from ChatGPT.</p>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-28690733038000599882023-04-13T19:54:00.005-07:002023-10-02T17:29:28.361-07:00My favorite books on quantum physics<p> Because of my interest in quantum computing, here are my favorite books on quantum physics. 
Of course, I am very much <a href="https://www.youtube.com/watch?v=rbcNQB7VMtI&t=6s" target="_blank">indebted to Olivia Lanes</a>, but I also added some of my own.</p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGHUY65kQrlX9qIAT11V58SCifiQUIpWKLsSjCKO5DXLAZFZ8DQPwT-_hsXPgBoLvaC-A-4HNhOstPHO5CUzrgrnvXty0TjEPeMsWtsTFiKsWt7_5udzSTEnwtwsB24AEfl2uX-cYntZHSbM7B1LA32jF9MLrFTm0wmh25ARiz1BvDzmgz6jlnN65i/s499/qw.jpg" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="499" data-original-width="357" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGHUY65kQrlX9qIAT11V58SCifiQUIpWKLsSjCKO5DXLAZFZ8DQPwT-_hsXPgBoLvaC-A-4HNhOstPHO5CUzrgrnvXty0TjEPeMsWtsTFiKsWt7_5udzSTEnwtwsB24AEfl2uX-cYntZHSbM7B1LA32jF9MLrFTm0wmh25ARiz1BvDzmgz6jlnN65i/w143-h200/qw.jpg" width="143" /></a></div><br /><p><a href="https://www.amazon.com/Quantum-Entanglement-Press-Essential-Knowledge/dp/026253844X/ref=sr_1_1" target="_blank">Quantum Entanglement by Jed Brody</a>.</p><p>The most modern of the list, a PDF is easily found on the web, and the book is free on Audible! 
It is published in the MIT Press Essential Knowledge series, which is always very good.</p><p><br /></p><p><br /></p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqP55JmMwFO84v47Ov0o-sxS7v5XTdldvQPdQJutQlucI_ZEKROiHo67p0yFvanXLlidqs6FgkZxaOM-MAPR6NvgHxJoNvgnZtGpp2vccj82XgbAr66MPM518cfSxTuL3HadGfoD7k2-9L_5KFYTeE3dsa5ph4a5GT2T54PeZmUtUcWbOGAyqWvtoC/s500/ama.jpg" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="500" data-original-width="331" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqP55JmMwFO84v47Ov0o-sxS7v5XTdldvQPdQJutQlucI_ZEKROiHo67p0yFvanXLlidqs6FgkZxaOM-MAPR6NvgHxJoNvgnZtGpp2vccj82XgbAr66MPM518cfSxTuL3HadGfoD7k2-9L_5KFYTeE3dsa5ph4a5GT2T54PeZmUtUcWbOGAyqWvtoC/s320/ama.jpg" width="212" /></a></div><br /><p><a href="https://www.amazon.com/Amazing-Story-Quantum-Mechanics-Exploration/dp/1592406726" target="_blank">Fun read. 
Superheroes take on quantum physics.</a></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikSTTeowu3lvkiVyur8KosDvopgqiYZisjbZwAlYPUSvf0Y_xZZ399WWZPkaPFGSMNZbMal6Lg8giU8ofuxrlX8wTG1oLvAcgv-6mvX1SUKt69XOwYOS0pdfV_673RE6eJQkZWUvr2U8r32rOGJ35u_zK8pPemvS-XUEfsInyRsWgL2wf1QNhSqjxU/s293/tot.webp" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="293" data-original-width="206" height="293" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikSTTeowu3lvkiVyur8KosDvopgqiYZisjbZwAlYPUSvf0Y_xZZ399WWZPkaPFGSMNZbMal6Lg8giU8ofuxrlX8wTG1oLvAcgv-6mvX1SUKt69XOwYOS0pdfV_673RE6eJQkZWUvr2U8r32rOGJ35u_zK8pPemvS-XUEfsInyRsWgL2wf1QNhSqjxU/s1600/tot.webp" width="206" /></a></div><br /><p><a href="https://www.amazon.com/Totally-Random-Understands-Mechanics-Entanglement/dp/0691176957/ref=sr_1_1" target="_blank">The characters include an Einstein-like figure and Schrödinger's cat.</a></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGJmGjAsGo0eoHxphpoYw667z4MKGTROR5Arwv8MlFPhs3oDAA8gUfrJKHJW61hpXAXTuX_rh3q7J1CtJ8CgqmWGcUEwfBaQY8vzpx_ZQlshwJytiM6AG4WsHN3BClrCDPTbi_68H9kdBnmy5PvS_nJ1kcjnBhEqnOTWTsmMxL8Kp3BFAR60sF-yKB/s500/wgat.jpg" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="500" data-original-width="333" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGJmGjAsGo0eoHxphpoYw667z4MKGTROR5Arwv8MlFPhs3oDAA8gUfrJKHJW61hpXAXTuX_rh3q7J1CtJ8CgqmWGcUEwfBaQY8vzpx_ZQlshwJytiM6AG4WsHN3BClrCDPTbi_68H9kdBnmy5PvS_nJ1kcjnBhEqnOTWTsmMxL8Kp3BFAR60sF-yKB/s320/wgat.jpg" width="213" 
/></a></div><br /><p><a href="https://www.amazon.com/Quantum-Physics-Everyone-Needs-Know%C2%AE/dp/0190250712/ref=sr_1_1?crid=1V3F64UK48Q3E" target="_blank">Quantum Physics</a> - what <b>everyone</b> needs to know.</p><p>Serious and very clear. Written in 2015, it has a chapter on quantum computing and on what quantum computers would do if they ever existed. Now they do!</p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7vC2GJBPFyS8eUPVXoWr6Z7adTfYrJOBuWGQ9UYssaV4SFVTuD6ue82Zqk0vDaKHZpIJ5tB5ZD6MTkLv-g-rLQfte8exmjbrkJyRPP0g_bOirF5aqvDnPwl4VA8VUaufl4uAK3tx8nJ17lQr0pMefXc8mmFvrHHuMWx5pAo7MKtMTK8to9U0xH_yH/s293/str.webp" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="293" data-original-width="205" height="293" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7vC2GJBPFyS8eUPVXoWr6Z7adTfYrJOBuWGQ9UYssaV4SFVTuD6ue82Zqk0vDaKHZpIJ5tB5ZD6MTkLv-g-rLQfte8exmjbrkJyRPP0g_bOirF5aqvDnPwl4VA8VUaufl4uAK3tx8nJ17lQr0pMefXc8mmFvrHHuMWx5pAo7MKtMTK8to9U0xH_yH/s1600/str.webp" width="205" /></a></div><br /><p><br /></p><p><br /></p><p><br /></p><p>A popular explanation from Cambridge - technical enough. 
But if you are interested in quantum computing, light reading does not suffice.</p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxmOsAUXdceWzjSpJ3vn0mRtIgtTHlR8f3jC9X6KneAXlW6a7oDunDTn9VAaQOTN3v9wWqb_eNmu989tRTXBTCuiFIwZp8pHn0QtAu9ROvOE60zwdPTEFmAjNZaEsVOAX9mNTiL_YD0myueU6l35lQ_OcDnYkbTGEAyNO-K1Y5tu3f3uDQYekFQIp5/s499/6.jpg" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="499" data-original-width="387" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxmOsAUXdceWzjSpJ3vn0mRtIgtTHlR8f3jC9X6KneAXlW6a7oDunDTn9VAaQOTN3v9wWqb_eNmu989tRTXBTCuiFIwZp8pHn0QtAu9ROvOE60zwdPTEFmAjNZaEsVOAX9mNTiL_YD0myueU6l35lQ_OcDnYkbTGEAyNO-K1Y5tu3f3uDQYekFQIp5/s320/6.jpg" width="248" /></a></div><br /><p>Finally, a textbook for students, with formulas and exercises.</p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p>The best (others are also good)</p><p><a href="https://www.amazon.com/Quantum-Computation-Information-10th-Anniversary/dp/1107002176">https://www.amazon.com/Quantum-Computation-Information-10th-Anniversary/dp/1107002176</a></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwMpzBC9GfCz6DYmTq00fJEnhroaP8kuOGVHyQFhny6e0RR_Pt2zYonRIkYlKt20lT3TrpxRQtx9PNleZL96mI8xq_uF6Vuxx8JMa5ukrN9zgEOfSsccy1phyphenhyphen0kMv3Ry67Lcn0EePhvF0ifHlaIXrQ9Q6tCMx3FI04pXeYO3JKRw6N9YOC606o_gYtXww/s820/Screenshot%202023-10-02%20at%206.47.14%20PM.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="820" data-original-width="588" height="320" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwMpzBC9GfCz6DYmTq00fJEnhroaP8kuOGVHyQFhny6e0RR_Pt2zYonRIkYlKt20lT3TrpxRQtx9PNleZL96mI8xq_uF6Vuxx8JMa5ukrN9zgEOfSsccy1phyphenhyphen0kMv3Ry67Lcn0EePhvF0ifHlaIXrQ9Q6tCMx3FI04pXeYO3JKRw6N9YOC606o_gYtXww/s320/Screenshot%202023-10-02%20at%206.47.14%20PM.png" width="229" /></a></div><br /><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p><div style="background-color: white; color: #080808; font-family: 'JetBrains Mono',monospace; font-size: 9.8pt; white-space: pre;">How to talk about quantum computing to your teenager<br /><br /><span style="color: #00627a; font-style: italic;"><a href="https://www.quora.com/What-is-an-intuitive-explanation-of-quantum-computing">https://www.quora.com/What-is-an-intuitive-explanation-of-quantum-computing</a></span><br /></div>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com2tag:blogger.com,1999:blog-3306848155597119088.post-59157018005886847282023-01-12T09:45:00.009-08:002023-02-07T19:05:51.124-08:00Pointers for Google Certification Exam<p>I am a Google trainer, and I failed the re-certification exam after four years of teaching. Now I am done with this embarrassing confession. I asked my colleagues, also trainers. Here I am collecting their advice, with my comments.</p><p></p><ol style="text-align: left;"><li>The exam got harder. Be humble and prepare for the exam.</li><li>Study Google <a href="https://cloud.google.com/certification/cloud-architect">documentation</a>, in particular, the <a href="https://cloud.google.com/certification/guides/professional-cloud-architect/">description of the companies</a> on which the questions are based.</li><li><a href="https://acloudguru.com/">Cloud Guru course</a>. It is two years old and has a two-hour practice test at the end. 
Add your own knowledge and the experience that you can get from <a href="https://www.cloudskillsboost.google/">QwikLabs</a>.</li><li>Question dumps - you can find them on the web. They may be outdated and (very importantly) may give you wrong answers. At least the Apigee test dump was that way.</li><li>Good luck!</li><li><a href="https://www.credential.net/ff477348-f30c-4d46-9a7f-482a617a0619?key=8472f768c40e385e264f3a95a7962dc4668cceb886c1474cfbd38a6afa12e6ec" target="_blank">I passed</a>, so it must have worked.</li></ol><p></p>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-18580260102638175592022-08-26T11:26:00.004-07:002022-08-26T11:26:47.828-07:00Demo for FreeEed<p> Recently, I taught a class about Search and Elastic. As part of this class, I gave the students a lab showcasing FreeEed as an example of a real-world application. Here is the lab, which you might find helpful as well:</p><p><a href="https://github.com/elephantscale/elastic-labs/blob/master/integrations/1-FreeEed.md">https://github.com/elephantscale/elastic-labs/blob/master/integrations/1-FreeEed.md</a></p><p>Enjoy!</p>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-51683800522287130742022-06-28T09:34:00.003-07:002022-06-28T09:34:35.027-07:00I am excited by GitHub Copilot<p> Advertised as "Your AI Pair Programmer," <a href="https://github.com/features/copilot" target="_blank">GitHub Copilot</a> indeed works very well, and I am impressed. The promise is that "you write the comments, and it writes the implementation code." I did not read the documentation; I just started writing code. It worked like magic.</p><p>I got a value from a hash table. It offered to check that the value existed and was not empty. 
The suggestion is shown in the pale font.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTaXoV-k9vDbfWWcJaj5yHZY1VXet6Zvj-bnO2PiBhLE74E9sP-miu49W2FxU72CyEhNDHgn9kGGRNS7GvFas_pftSyz9BzG3cqZPOGnlZ6uitu-EVKosxIVuCwNHA_ZTD4Dshdez-auSzxE7SFeS2SnDGVEjDE1-loEQjor5ebeqEHtSXBtTuGVKa/s619/google-assisstant.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="118" data-original-width="619" height="122" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTaXoV-k9vDbfWWcJaj5yHZY1VXet6Zvj-bnO2PiBhLE74E9sP-miu49W2FxU72CyEhNDHgn9kGGRNS7GvFas_pftSyz9BzG3cqZPOGnlZ6uitu-EVKosxIVuCwNHA_ZTD4Dshdez-auSzxE7SFeS2SnDGVEjDE1-loEQjor5ebeqEHtSXBtTuGVKa/w640-h122/google-assisstant.png" width="640" /></a></div><br /><p><br /></p><p></p><div class="separator" style="clear: both; text-align: left;">I then hit the tab to accept. The suggestion was bolded.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZlpktF_1NeSYmliqmK1KZ1K6xiMTRFxbL6Z56-ahtw_P0Ym9JYJovgVM7dZ4mjqKKVH_HX6554eDWMxzwMmdTZmvl5LEJEojkLvtOa3w4me5_IXFPSxjVwSD03LXIzIY-QI1lc4keIhn7tDNYFQkzQ_GkK4grpk7Yb7oUFieV6RZ-yx2e0ahEYXjc/s625/g1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="114" data-original-width="625" height="116" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZlpktF_1NeSYmliqmK1KZ1K6xiMTRFxbL6Z56-ahtw_P0Ym9JYJovgVM7dZ4mjqKKVH_HX6554eDWMxzwMmdTZmvl5LEJEojkLvtOa3w4me5_IXFPSxjVwSD03LXIzIY-QI1lc4keIhn7tDNYFQkzQ_GkK4grpk7Yb7oUFieV6RZ-yx2e0ahEYXjc/w640-h116/g1.png" width="640" /></a></div><br /><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" 
style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">I saved a few seconds. I also saved some brain CPU cycles. I had a comfortable feeling that the code was in good style. I wasted half an hour sharing my excitement with the world.</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">There is more! It writes my comments. And often, it gets it right. If not, then often enough, I can still accept and change a word or two.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Thumbs up, GitHub.</div><p></p>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-62906893280610755362021-03-04T21:09:00.001-08:002021-03-04T21:14:57.478-08:00Security News Roundup - March 4<p>An alternate take on why the SolarWinds hack happened (Note: I read and enjoyed the article by Matt Stoller that is linked in the piece): <a href="https://www.nytimes.com/2021/02/23/opinion/solarwinds-hack.html?referringSource=articleShare">https://www.nytimes.com/2021/02/23/opinion/solarwinds-hack.html?referringSource=articleShare</a></p><p>Top 10 Web Hacking Techniques of 2020 (Must read for anyone in the web application security field): <a href="https://portswigger.net/research/top-10-web-hacking-techniques-of-2020">https://portswigger.net/research/top-10-web-hacking-techniques-of-2020</a></p><p>Interesting development in the cyber insurance field, led by Google: <a href="https://cloud.google.com/blog/products/identity-security/google-cloud-risk-protection-program-now-in-preview">https://cloud.google.com/blog/products/identity-security/google-cloud-risk-protection-program-now-in-preview</a></p><p>Short post on bots plaguing the online limited-edition sneaker industry: <a 
href="https://threatpost.com/yeezy-sneaker-bots-boost-sun/164312/">https://threatpost.com/yeezy-sneaker-bots-boost-sun/164312/</a></p><p>Ransomware threat landscape in 2020 and 2021: <a href="https://securityaffairs.co/wordpress/115268/cyber-crime/ransomware-landscape-2020.html">https://securityaffairs.co/wordpress/115268/cyber-crime/ransomware-landscape-2020.html</a></p><p>Post from Troy Hunt about a password breach (while it is about a political site, it contains the usual details and in-depth analysis that characterize his posts): <a href="https://www.troyhunt.com/gab-has-been-breached/">https://www.troyhunt.com/gab-has-been-breached/</a></p><p>Exchange Zero Days patched by Microsoft: <a href="https://krebsonsecurity.com/2021/03/microsoft-chinese-cyberspies-used-4-exchange-server-flaws-to-plunder-emails/">https://krebsonsecurity.com/2021/03/microsoft-chinese-cyberspies-used-4-exchange-server-flaws-to-plunder-emails/</a></p>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-15970150935190399352020-08-14T16:04:00.029-07:002022-03-03T20:31:37.658-08:00How to do Early Case Assessment with FreeEed<p>Sometimes, you have a lot of data to process for eDiscovery. So, you go to your favorite eDiscovery provider and ask them to process your data and then host it for your review. But there's the rub: processing costs X number of dollars per gigabyte, and usually, you don't want to host all the data. </p><p>Here is how you can solve this problem with FreeEED and save oodles of money in the process. First, I will explain the harder way, using the review. 
Then I will show how to go straight to the results, once you have come to trust them.</p><h2 style="text-align: left;">Way 1 - with the review</h2><p> <a href="http://freeeed.org/index.php/download" target="_blank">Download</a> and start FreeEED</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQIVqnKTDkkSX12-AwMdkb0xOdfy67CoRqp1aVY-425YqhmIDeshnsXOxIA0bRfDcBIOhtLnWdWoh9JXH5f88nk3klnHcpyo_wPO1dXwqcZDRA_T7Ilo0HE_FI5OTomTEAdNn-OpaDTnw/s490/01.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="76" data-original-width="490" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQIVqnKTDkkSX12-AwMdkb0xOdfy67CoRqp1aVY-425YqhmIDeshnsXOxIA0bRfDcBIOhtLnWdWoh9JXH5f88nk3klnHcpyo_wPO1dXwqcZDRA_T7Ilo0HE_FI5OTomTEAdNn-OpaDTnw/s0/01.png" /></a><div class="separator" style="clear: both; text-align: center;"><div style="text-align: left;">Select your project and add files to it</div><div style="text-align: left;"><br /></div><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXdjA20nbedoUfLj7mmzcPnbHNEfctiEaY3QpqcKaHoF87VT0C55m0H03wWbX5MnbxVIT4vpNBf_8nIRxcdYQJ3W9TkbJdRpDfMZmd63pwcMrhFdtF3cIJq5NlJ__J5UJic73d4ZwwbFg/s692/02.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="403" data-original-width="692" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXdjA20nbedoUfLj7mmzcPnbHNEfctiEaY3QpqcKaHoF87VT0C55m0H03wWbX5MnbxVIT4vpNBf_8nIRxcdYQJ3W9TkbJdRpDfMZmd63pwcMrhFdtF3cIJq5NlJ__J5UJic73d4ZwwbFg/s640/02.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><div style="text-align: left;">Stage, Process, and Go to Review</div><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDQe7ybEgh6HEqVahh3k4FO8eslaiB2hhE2RT2FvZV_VTbPxHx3P97sxRbsLqHPuylC5PidB_airR97FndCmaQfSBGdrO2ys3l6F-4hmsKbqVJuY-fJGRXBLQkIAV3bKDuPmepELkxePw/s640/03.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: left;"><br /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDQe7ybEgh6HEqVahh3k4FO8eslaiB2hhE2RT2FvZV_VTbPxHx3P97sxRbsLqHPuylC5PidB_airR97FndCmaQfSBGdrO2ys3l6F-4hmsKbqVJuY-fJGRXBLQkIAV3bKDuPmepELkxePw/s640/03.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: left;"><img border="0" data-original-height="400" data-original-width="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDQe7ybEgh6HEqVahh3k4FO8eslaiB2hhE2RT2FvZV_VTbPxHx3P97sxRbsLqHPuylC5PidB_airR97FndCmaQfSBGdrO2ys3l6F-4hmsKbqVJuY-fJGRXBLQkIAV3bKDuPmepELkxePw/s0/03.png" /></a></div><div class="separator" style="clear: both; text-align: left;">In the review, find all responsive documents. 
</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha0rBRAY9BRVrnsEDdLSPAQTi5-7suQeREYszGFe70iSkdm8nNRa7zDaCGA5rW3WxTC7AuZr2YrWkMPVnCFeZ-pJFMT-MKtQ3-f1Z8FRvktGGDlpFo5jaCJF_CVoDmtTiKZLTJTrsV-vM/s1280/04.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="888" data-original-width="1280" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha0rBRAY9BRVrnsEDdLSPAQTi5-7suQeREYszGFe70iSkdm8nNRa7zDaCGA5rW3WxTC7AuZr2YrWkMPVnCFeZ-pJFMT-MKtQ3-f1Z8FRvktGGDlpFo5jaCJF_CVoDmtTiKZLTJTrsV-vM/s640/04.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">Now, simply click on "Export as Natives"</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwBnytmMmTtoLoRsGHWH1ogQ8iDL4_F7vEcAJ8doNI0_xp0zsvFyIUwqrhnTr_i92JdxsVWEks6YfGNVJMDbpRmoUNWg-Ro4nIKLLGWlIKC2oootTfS58nCHK0blFNhayKrqXuzhN-B5k/s902/06.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="902" data-original-width="847" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwBnytmMmTtoLoRsGHWH1ogQ8iDL4_F7vEcAJ8doNI0_xp0zsvFyIUwqrhnTr_i92JdxsVWEks6YfGNVJMDbpRmoUNWg-Ro4nIKLLGWlIKC2oootTfS58nCHK0blFNhayKrqXuzhN-B5k/s640/06.png" /></a></div><div class="separator" style="clear: both; text-align: left;">Here, you got what you wanted! You now know which documents you will deal with. Read them, analyze them. 
</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiobDT6aHD2hgjzC_2TIXSCefEWO7SDTBhmjI5K1Npi8FcuGqyj3yR-9uqvRxdQUH-mLNxp7W01Q58vzuTKGJhEtNgoSfH5kcaew2VSJSbUoO_zOQCe7SpafpYOxns1pB-PL0nqPxCorIk/s644/07.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="644" data-original-width="639" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiobDT6aHD2hgjzC_2TIXSCefEWO7SDTBhmjI5K1Npi8FcuGqyj3yR-9uqvRxdQUH-mLNxp7W01Q58vzuTKGJhEtNgoSfH5kcaew2VSJSbUoO_zOQCe7SpafpYOxns1pB-PL0nqPxCorIk/s640/07.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: left;">Put them into your favorite review platform, like Relativity. From there, you will be able to do production and share the documents with others who need them. <a href="https://scaia.ai/contact/">And by the way, we can set you up and help with Relativity as well. </a></div><div class="separator" style="clear: both; text-align: left;"><br /></div><h2 style="clear: both; text-align: left;">Way 2 - go straight to the results</h2><div class="separator" style="clear: both; text-align: left;">Start as above, by downloading FreeEED. 
But, instead of going all the way with the review, simply use culling</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtIlEwwQyRoLMMKVV5ddQBFm3PIXTlq4LBR331cMgsmtg0N51j6O68VWj35ZXSyBBCdY8U-33Mc_H5A90pMFoDRVwMZ1CgcVpoF0CHi9BVYPVlVb-AIzlUAGOptBhVaSEBjcXNMtFwkrw/s679/09.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="132" data-original-width="679" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtIlEwwQyRoLMMKVV5ddQBFm3PIXTlq4LBR331cMgsmtg0N51j6O68VWj35ZXSyBBCdY8U-33Mc_H5A90pMFoDRVwMZ1CgcVpoF0CHi9BVYPVlVb-AIzlUAGOptBhVaSEBjcXNMtFwkrw/s640/09.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Enter your search string (I entered 'matt' but it accepts complete Lucene syntax, with metadata names and ranges), and click on process. When done, send the production results. 
Or, be more formal and go to Relativity, as above.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Cheers!</div></div><div class="separator" style="clear: both; text-align: center;"><br /></div><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQIVqnKTDkkSX12-AwMdkb0xOdfy67CoRqp1aVY-425YqhmIDeshnsXOxIA0bRfDcBIOhtLnWdWoh9JXH5f88nk3klnHcpyo_wPO1dXwqcZDRA_T7Ilo0HE_FI5OTomTEAdNn-OpaDTnw/s490/01.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><br /></a></div><p></p>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-49342142162483866852019-04-24T09:03:00.000-07:002019-04-24T09:03:00.037-07:00Smart apps meetup - follow-upHi, all who came to this meetup, <a href="https://www.meetup.com/Houston-Hadoop-Spark-ML/events/259968943/">https://www.meetup.com/Houston-Hadoop-Spark-ML/events/259968943/</a><br />
<br />
Here are the links that were mentioned<br />
<br />
<a href="https://www.deeplearning.ai/machine-learning-yearning/">https://www.deeplearning.ai/machine-learning-yearning/</a><br />
<br />
<a href="https://landing.ai/ai-transformation-playbook/">https://landing.ai/ai-transformation-playbook/</a><br />
<br />
<a href="http://playground.tensorflow.org/">http://playground.tensorflow.org/</a><br />
<br />
<a href="https://colab.research.google.com/">https://colab.research.google.com/</a><br />
<br />
Enjoy!Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-53477382979196600472018-06-18T10:41:00.002-07:002018-06-18T10:41:48.801-07:00Security Analytics At the Speed of Thought With ML and Elastic<span style="background-color: #f6f7f8; color: #2e3e48; font-family: "Graphik Meetup", -apple-system, system-ui, Roboto, Helvetica, Arial, sans-serif; font-size: 16px;">Abstract: This talk continued the discussion started in February. It gave an overview of how machine learning in Elastic X-Pack can be used to analyze data from a data lake to help the SOC (Security Operations Center) and Threat Hunting teams find malicious actors in their environment. It demonstrated how easy it is to pivot through data and expand the information we have around the compromise.</span><br />
<span style="background-color: #f6f7f8; color: #2e3e48; font-family: "Graphik Meetup", -apple-system, system-ui, Roboto, Helvetica, Arial, sans-serif; font-size: 16px;"><br /></span>
<span style="background-color: #f6f7f8; color: #2e3e48; font-family: "Graphik Meetup", -apple-system, system-ui, Roboto, Helvetica, Arial, sans-serif; font-size: 16px;">Geoff presented a demo similar to this one, </span><span style="color: #2e3e48; font-family: Graphik Meetup, -apple-system, system-ui, Roboto, Helvetica, Arial, sans-serif;">https://www.elastic.co/blog/using-kibana-and-beats-for-security-analytics</span><br />
<span style="color: #2e3e48; font-family: Graphik Meetup, -apple-system, system-ui, Roboto, Helvetica, Arial, sans-serif;"><br /></span>
<span style="color: #2e3e48; font-family: Graphik Meetup, -apple-system, system-ui, Roboto, Helvetica, Arial, sans-serif;">May 23, 2018, was a great day! Thank you, all.</span>Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-5144896259639532972018-05-16T20:41:00.007-07:002022-12-27T19:48:39.993-08:00Searching Blockchain with FreeEed<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiAy3QBuiwcZnZfLbXoK9IUpdJsbiBl7EqsRY6bNBivex6S0TdJJ0teCkPXTrw-8mizCYDxV0jJgpABm5RMSi1pq7vaksVgoF4_6iHA9SWi3-iu4mxwry69jNwtUaPYeCbkBm64TEExQ8/s1600/Screen+Shot+2018-05-17+at+12.19.10+AM.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="348" data-original-width="322" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiAy3QBuiwcZnZfLbXoK9IUpdJsbiBl7EqsRY6bNBivex6S0TdJJ0teCkPXTrw-8mizCYDxV0jJgpABm5RMSi1pq7vaksVgoF4_6iHA9SWi3-iu4mxwry69jNwtUaPYeCbkBm64TEExQ8/s200/Screen+Shot+2018-05-17+at+12.19.10+AM.png" width="185" /></a></div>
The Blockchain is composed of multiple blocks that can contain any information. However, it is not a database in a traditional sense: it is not fast, and it does not answer queries.<br />
<br />
For example, the writing speed is about one block every 10 minutes for Bitcoin and about one block every 12 to 15 seconds for Ethereum. Queries, as such, do not exist at all: neither an SQL nor a NoSQL-type query language is provided.<br />
<br />
Meanwhile, the information stored in Blockchain often needs to be searched. Here is a design pattern from <a href="https://research.csiro.au/blockchainpatterns/general-patterns/interacting-with-the-external-world/legal-and-smart-contract-pair/" target="_blank">CSIRO</a>.<br />
<br />As of today, such a tool exists. FreeEed has been used by lawyers for eDiscovery, by legal reviewers, and by researchers for all kinds of investigations. It accepts any data as input (<a href="http://freeeed.org/">see here</a>) and indexes that data for searching. The data can be open Office files, PST mailboxes, a "load file" produced to lawyers in response to an eDiscovery request, or Blockchain data.<br />
<br />
We are actively working on FreeEed all the time, adding input formats, processing capabilities, and machine learning. The tool is open source and welcomes new additions. The review part is called "FreeEed Review" and works through the browser.<br />
<br />
The back end used to implement text search is Elasticsearch. This means that you can also look at the processed data through the mighty ELK (Elasticsearch, Logstash, Kibana), which is also open source.<br />
<br />
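Because the index lives in Elasticsearch, you can also query the processed data outside of FreeEed Review. Here is a minimal sketch in Python that only builds the request body for Elasticsearch's <code>query_string</code> query. The host, the index name <code>freeeed-case</code>, and the search terms are my assumptions for illustration, so check your own installation for the real names.<br />

```python
import json

# Hypothetical endpoint: host and index name are assumptions, not FreeEed defaults.
SEARCH_URL = "http://localhost:9200/freeeed-case/_search"


def build_search_body(query, size=10):
    """Build an Elasticsearch search body that uses the query_string query."""
    return {
        "size": size,
        "query": {"query_string": {"query": query}},
    }


body = build_search_body("blockchain AND contract")

# Print what would be sent; POST it with any HTTP client or paste it into Kibana.
print("POST", SEARCH_URL)
print(json.dumps(body, indent=2))
```

The same body can be pasted into the Kibana console, which is a convenient way to try a query before wiring it into a script.<br />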
Happy searching!Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-73285859075352222812018-04-30T21:43:00.003-07:002018-04-30T21:43:40.652-07:00FreeEed with Elasticsearch (7.7.2 release)<div class="ArticleHeadlineText" data-gcf-font-size="14pt" style="font-family: Georgia, "Times New Roman", Times, serif; font-size: 14pt; font-weight: bold;">
<span _mce_style="font-family: Times; font-weight: 400;" style="font-family: Times; font-weight: 400;"><br class="Apple-interchange-newline" />Improvements in this version (7.7.2):</span></div>
<div class="ArticleHeadlineText" data-gcf-font-size="14pt" style="font-family: Georgia, "Times New Roman", Times, serif; font-size: 14pt; font-weight: bold;">
<ul>
<li><span _mce_style="font-weight: 400; font-family: Times;" style="font-family: Times; font-weight: 400;">Elasticsearch integration. Now the users get more open source tools to work with FreeEed: Elasticsearch, Logstash, and Kibana. </span></li>
<li><span _mce_style="font-weight: 400; font-family: Times;" style="font-family: Times; font-weight: 400;">Bug fixes, code refactoring.</span></li>
<li><span _mce_style="font-weight: 400; font-family: Times;" style="font-family: Times; font-weight: 400;">Go here http://freeeed.org/</span></li>
</ul>
</div>
Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-23933584841436170012018-03-29T21:56:00.001-07:002018-03-29T21:56:24.479-07:00FreeEed 7.7.1 releaseHere is what is new in the FreeEed 7.7.1 release<br />
<br />
<ul style="font-family: Georgia, "Times New Roman", Times, serif; font-size: 18.6667px; font-weight: 700;">
<li><span _mce_style="font-weight: 400; font-family: Times;" style="font-family: Times; font-weight: 400;">Restored deduplication</span></li>
<li><span _mce_style="font-weight: 400; font-family: Times;" style="font-family: Times; font-weight: 400;">Better email handling</span></li>
<li><span style="font-family: Times;"><span style="font-weight: 400;">Separated processing engine code into its own project</span></span></li>
<li><span style="font-family: Times;"><span style="font-weight: 400;">All UI forms done in IntelliJ, out of NetBeans and away from commercial editors</span></span></li>
</ul>
<div>
<span style="font-size: 18.6667px;">Enjoy!</span></div>
Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-66238035145379624192017-09-12T18:18:00.001-07:002017-09-12T18:18:19.581-07:00Does FreeEed search for numbers? - Yes, it does!One of our users asked whether he can find numbers in the text that FreeEed indexes. I got curious myself and checked.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmz96Fn14QhYpSXN29hyDfUYtgnoUkCV60dfqawNgBtRMLoOvQR2QH47PkJXUflooZ_gMsPV0Z3omitzuzRXtWTdWf42bw7ThLPDfvm4vXfzfmlp5w_P-t8OZaWD-2MChrTUDro0JX_q4/s1600/Screen+Shot+2017-09-12+at+8.07.11+PM.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="311" data-original-width="469" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmz96Fn14QhYpSXN29hyDfUYtgnoUkCV60dfqawNgBtRMLoOvQR2QH47PkJXUflooZ_gMsPV0Z3omitzuzRXtWTdWf42bw7ThLPDfvm4vXfzfmlp5w_P-t8OZaWD-2MChrTUDro0JX_q4/s320/Screen+Shot+2017-09-12+at+8.07.11+PM.png" width="320" /></a></div>
<br />
This is an important question because I remember Craig Ball mentioning it as one of the requirements for good eDiscovery software. So OK, I ran a few searches and found out that, out of the box, FreeEed does index all numbers. That felt good, and I am attaching the screenshots of the experiment.<br />
<br />
Of course, that is not a special property of FreeEed but of Tika, Lucene, and SOLR. It's these components that are responsible for what FreeEed indexes.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJfO4ANG55kJcfKbfcg-1naJCTrb1NAqLd5-TeOLBhytLH6cTYF6UCq9AuXpHp6uDCjzfR7LLM2usX70RbC95D7KyoyCzIeHCht0D1uPeOvNuL2SVXt7KZ0j4lY7GaseEQBfRXjrBkUbk/s1600/Screen+Shot+2017-09-12+at+8.07.28+PM.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="120" data-original-width="523" height="73" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJfO4ANG55kJcfKbfcg-1naJCTrb1NAqLd5-TeOLBhytLH6cTYF6UCq9AuXpHp6uDCjzfR7LLM2usX70RbC95D7KyoyCzIeHCht0D1uPeOvNuL2SVXt7KZ0j4lY7GaseEQBfRXjrBkUbk/s320/Screen+Shot+2017-09-12+at+8.07.28+PM.png" width="320" /></a></div>
<br />
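To show what a number search looks like at the level of the search engine, here is a small Python sketch that builds a SOLR select URL for a numeric term and for a quoted phrase with digits. The host, port, and core name <code>freeeed</code> are my assumptions for illustration; the code only prints the URLs and does not contact any server.<br />

```python
from urllib.parse import urlencode

# Hypothetical SOLR location: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/freeeed/select"


def solr_url(query, rows=10):
    """Return a SOLR select URL for a Lucene-syntax query string."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


# A bare number is searched like any other term.
print(solr_url("2017"))

# A phrase containing digits is quoted, as usual in Lucene syntax.
print(solr_url('"555 1234"'))
```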
Had this not been the case, I would have tweaked the use of the components, but luckily this is the way FreeEed already uses them. The advantage of passing through to these libraries is that the users can rely on the well-known Lucene syntax to do their searches.Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-17401101837828205092017-07-31T13:33:00.000-07:002018-05-06T19:14:49.894-07:00Couchbase at Houston Hadoop & Spark MeetupJustin Tuggle presented well-justified reasons why, today, only NoSQL databases are up to the task of providing customer engagement and the means for business survival, and why, among them, Couchbase is to be preferred.<br />
<br />
Here is the link to the <a href="https://www.slideshare.net/elephantscale/engagement-database-hint-couchbase">materials</a>.<br />
<br />
<br />Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-74967600665633220272017-07-20T21:56:00.001-07:002017-07-21T06:03:44.073-07:00An easy way to run FreeEed on AmazonRunning FreeEed on Amazon is very easy and offers some substantial benefits.<br />
<br />
<br />
<ol>
<li>You can get a fully provisioned server in a minute</li>
<li>You can get any size of hard drive and a large number of CPUs</li>
<li>It is as easy as using your desktop.</li>
</ol>
<div>
To start the server, find this AMI in the Oregon region on EC2: ami-e6acbf9f.</div>
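<div>
If you prefer the command line to the EC2 console, the launch can be sketched with the AWS CLI. The Python snippet below only assembles and prints the command. The AMI ID and the Oregon region (us-west-2) come from this post, while the instance type and the key pair name are placeholders that I made up, so replace them with your own.</div>

```python
import shlex

AMI_ID = "ami-e6acbf9f"   # from this post
REGION = "us-west-2"      # the Oregon region

# Placeholders, not recommendations: choose your own size and key pair.
INSTANCE_TYPE = "t2.large"
KEY_NAME = "my-key-pair"

cmd = [
    "aws", "ec2", "run-instances",
    "--region", REGION,
    "--image-id", AMI_ID,
    "--instance-type", INSTANCE_TYPE,
    "--key-name", KEY_NAME,
    "--count", "1",
]

# Print the command so it can be reviewed before being run in a shell.
print(shlex.join(cmd))
```

<div>
The security group of the instance has to allow your browser to reach the server, which this sketch does not set up.</div>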
<div>
<br /></div>
<div>
After you start the server, open the assigned IP in any browser. You will see a screen like the one below</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfPVqRiH52lwYE7CF-BJKVGyFRn4RMwbQGmOtdt3JebbqT9g1HJ5358eEK_mhgtUMORSYBT9NXHNZKP0JoBh9ww_mJoNaZuNcztFBIYqlBPjt2__bagGB9OoHjrxH9yC6_2lH8ld-ecuc/s1600/listing.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="447" data-original-width="470" height="304" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfPVqRiH52lwYE7CF-BJKVGyFRn4RMwbQGmOtdt3JebbqT9g1HJ5358eEK_mhgtUMORSYBT9NXHNZKP0JoBh9ww_mJoNaZuNcztFBIYqlBPjt2__bagGB9OoHjrxH9yC6_2lH8ld-ecuc/s320/listing.png" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Click on the 'vnc.html'. You will see the login screen</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTOsP02QXGi1rIAV2XYW0uz1hHkVSPSHs96bXc-sKZuTvbBgSEpabrgWm1aw7aPsSdd3cdJ4nIRADbqKFP75Yg-wI56InYWV_vZ_yYoKQhZAEXBsuOw0C3_qe-hpXUUOHHzMdGZ5YLH0s/s1600/vnc.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="302" data-original-width="1075" height="89" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTOsP02QXGi1rIAV2XYW0uz1hHkVSPSHs96bXc-sKZuTvbBgSEpabrgWm1aw7aPsSdd3cdJ4nIRADbqKFP75Yg-wI56InYWV_vZ_yYoKQhZAEXBsuOw0C3_qe-hpXUUOHHzMdGZ5YLH0s/s320/vnc.png" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
After you log in, you will see a full Ubuntu desktop, where you can do any work. FreeEed is already installed.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN8fMkIhvi03v0dIBwUusa9qpxc2Evc5L4_zIOld5fPR9_ihJt4aR46Dired8NKPpA4gdT9ukbe-6z9pPbIKf91usCURMsOZPRA6eoP_1qKBCwamvHX7rZ44tOFGV7aCvngpxKxxCS4sg/s1600/desk.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="881" data-original-width="1444" height="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN8fMkIhvi03v0dIBwUusa9qpxc2Evc5L4_zIOld5fPR9_ihJt4aR46Dired8NKPpA4gdT9ukbe-6z9pPbIKf91usCURMsOZPRA6eoP_1qKBCwamvHX7rZ44tOFGV7aCvngpxKxxCS4sg/s320/desk.png" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Enjoy!</div>
Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com1tag:blogger.com,1999:blog-3306848155597119088.post-15807596385308402862017-07-09T12:34:00.002-07:002017-07-09T12:34:38.556-07:00eDisco and Open Source Software<div style="text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCD0FA4wk7Vwr9hC-2sJlhwRarQjS-ggI5eipk3x2F2tT9hpTICKn0yQMi3vY6s3GM1teLIrEWrHxRJu-Sw2Xu4EEl3ijTXWXkS-IpVKhDA4aaeCr893_KNgalNF3mYlz0B0SN3hlsFbI/s1600/docs.jpeg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1067" data-original-width="1600" height="133" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCD0FA4wk7Vwr9hC-2sJlhwRarQjS-ggI5eipk3x2F2tT9hpTICKn0yQMi3vY6s3GM1teLIrEWrHxRJu-Sw2Xu4EEl3ijTXWXkS-IpVKhDA4aaeCr893_KNgalNF3mYlz0B0SN3hlsFbI/s200/docs.jpeg" width="200" /></a>Today I am starting a series of blog posts on how to do eDiscovery with open source software. I will base it initially on a wonderful book "<a href="https://www.amazon.com/Project-Management-Electronic-Discovery-Introduction/dp/0997073705">Project Management in Electronic Discovery</a>". The advice that I will give will not be limited to <a href="http://freeeed.org/">FreeEed</a>, but it will draw on the complete range of Open Source, Data Science, etc.</div>
<br />
Every eDiscovery person has her or his own set of tools, and I hope that these articles will add to your library. Let's organize those docs!<br />
<br />
(Image source: Pexels.com)Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-6850218736820242762017-07-08T23:02:00.002-07:002017-07-08T23:02:59.046-07:00New use cases for FreeEedToday we release an early preview of FreeEed with the following use cases<div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRSE4LA_gY94n0if6g181JX5KjdE0_pS7i85CwA1BiP0dPIVEojCNnBVJh6kPrZzlVhwG4cwF1rH8A4_n0XC4QpyjKO7w_yNzy5ibOuB3lG6egQzms3PiWgcezpLffEHmCjunZbasZHMk/s1600/pocorn.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="407" data-original-width="438" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRSE4LA_gY94n0if6g181JX5KjdE0_pS7i85CwA1BiP0dPIVEojCNnBVJh6kPrZzlVhwG4cwF1rH8A4_n0XC4QpyjKO7w_yNzy5ibOuB3lG6egQzms3PiWgcezpLffEHmCjunZbasZHMk/s200/pocorn.png" width="200" /></a></div>
<div>
<b>For the plaintiff.</b></div>
<div>
<br /></div>
<div>
If you ask for the eDiscovery documents, you might eventually get them. Now, what do you do with them? </div>
<div>
<br /></div>
<div>
The answer that FreeEed gives you is "Use the load file as the data source." That is, FreeEed allows you to load the documents you were sent and start reviewing them. </div>
<div>
<b><br /></b></div>
<div>
<b>For the researcher</b></div>
<div>
<br /></div>
<div>
Perhaps not directly related to eDiscovery, but people do use FreeEed for various research purposes. For example, at DARPA they loaded the court documents obtained from the NY Court of Appeals website and added some annotations (tags). Now, to do data analytics on the set, they need to export the documents back, with the new tags. This is provided in the option "Export the load file," which will export either the full set, with the annotations, or the current search results.</div>
<div>
<br /></div>
<div>
<b>For the techie</b></div>
<div>
<br /></div>
<div>
Sometimes your eDiscovery or other data is in the form of a JSON file. The JSON format is popular because it is flexible and allows you to define your own fields. In fact, you can change the fields from record to record.</div>
<div>
<br /></div>
<div>
This is now supported by selecting "JSON" as the input format, with the option "Use the load file as the data source." </div>
<div>
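<div>
To illustrate what "fields that change from record to record" means, here is a small Python sketch. It assumes one JSON object per line, which is my assumption for the example and not a statement about the exact layout FreeEed expects. The field names are made up as well.</div>

```python
import json

# Two made-up records with different sets of fields, one JSON object per line.
sample = """\
{"id": 1, "title": "Contract draft", "author": "Alice"}
{"id": 2, "title": "Meeting notes", "date": "2017-07-01", "tags": ["internal"]}
"""


def read_records(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def all_fields(records):
    """Collect the union of field names seen across all records."""
    fields = set()
    for record in records:
        fields.update(record)
    return sorted(fields)


records = read_records(sample)
print(all_fields(records))
```

<div>
The union of field names is what an indexer has to be ready for, since no single record is guaranteed to carry all of them.</div>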
<br /></div>
<div>
Likewise, you can import any CSV file.</div>
<div>
<b><br /></b></div>
<div>
<b>Other improvements include</b></div>
<div>
<br /></div>
<div>
<div>
* Implemented extensive continuous testing with Jenkins (http://freeeed.from-tx.com:8000/)</div>
<div>
* Review - quick preview now working</div>
</div>
Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-43627480830989156952017-05-26T15:01:00.003-07:002017-05-27T22:55:16.052-07:00Sub-second SQL queries with LLAP from Hortonworks <div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh381d7RIZjbq3GxrxHPZkkWIaSzfPuwWqwIX_SeV9EDPNueB5i1cqjci728NDXEm8CifDSxgn4D1VdlUi5tSJYtarV6pz5WEyDpcY34GV-0N0QFlg01E98GVLisdzSJdQtOliuA7J5FRM/s1600/ravi.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="414" data-original-width="414" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh381d7RIZjbq3GxrxHPZkkWIaSzfPuwWqwIX_SeV9EDPNueB5i1cqjci728NDXEm8CifDSxgn4D1VdlUi5tSJYtarV6pz5WEyDpcY34GV-0N0QFlg01E98GVLisdzSJdQtOliuA7J5FRM/s200/ravi.jpg" width="200" /></a></div>
Houston Hadoop & Spark Meetup in April was graced by a presentation from Ravi Mutyala of Hortonworks. Here are the slides, <a href="https://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive-63959757">https://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive-63959757</a>. Please refer to Ravi for further questions.Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-37355331426004142902017-03-22T10:21:00.001-07:002017-03-22T10:21:37.316-07:00How to create an IntelliJ shortcut on Ubuntu<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJYr1QmYE6RuDnAVLClmp-F8giexw1zDuTn8EdPTcwFMPeUiyWBZOPQqj9QrHYwHu8jzm0xjrAY2mJ2ZxDNqnzkKsusRosEqmxpP4A9tuGlSUZ_hxW2sTmSbiAESsmaSnW2YJ9bp7f60o/s1600/ub.png" imageanchor="1" style="clear: left; display: inline !important; float: left; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJYr1QmYE6RuDnAVLClmp-F8giexw1zDuTn8EdPTcwFMPeUiyWBZOPQqj9QrHYwHu8jzm0xjrAY2mJ2ZxDNqnzkKsusRosEqmxpP4A9tuGlSUZ_hxW2sTmSbiAESsmaSnW2YJ9bp7f60o/s320/ub.png" width="46" /></a>I always dread an IntelliJ upgrade, because I don't remember how to update my Ubuntu shortcut. So here it is, for me and other souls.<br />
<br />
1. Unzip and run from the command line, just as you do with all versions. No problem here.<br />
2. Choose menu Tools, then Create Desktop Entry.<br />
3. This will create an entry in ~/.local/share/applications. Copy it to your desktop:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">cp ~/.local/share/applications/jetbrains-idea.desktop ~/Desktop/</span><br />
<div>
<br /></div>
<div>
Optional: drag it to the toolbar</div>
<div>
<br /></div>
<div>
Happy traveling! <span style="background-color: white; color: #777777; font-family: Verdana, Arial, Helvetica, sans-serif, times, "Heiti TC", PMingLiU, PMingLiu-ExtB, SimSun, SimSun-ExtB, HanaMinA, HanaMinB; font-size: 18.6667px; white-space: nowrap;">逍遙遊</span></div>
Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-40918989382574850662017-03-12T16:17:00.000-07:002017-03-12T16:17:16.648-07:00What I saw in Bentonville, ARRecently I was in Bentonville, taught Big Data, and visited the Crystal Bridges museum there. I share this amazing experience in my blog post <a href="http://elephantscale.com/2017/03/teaching-big-data/">here</a>.<br />
<br />
<br />Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-11924415771817640372017-02-09T12:38:00.000-08:002020-04-26T22:18:19.964-07:00FreeEed for eDiscovery response and for general research<b>Update:</b><br />
<br />
We have re-visited loading the eDiscovery production results for review, and added loading the DAT file. This is available in version 8.1, due to be released soon. We will write another blog post and add the new instruction.<br />
<br />
Thank you<br />
<br />
<a href="http://freeeed.org/">FreeEed</a> is a popular open source eDiscovery tool. It boasts over 1,000 users, has active projects in major consulting companies and is popular with researchers. However, it often needs to be used upside down. Here is what I mean.<br />
<br />
In regular eDiscovery, you input directories, and FreeEed processes them, giving you these outputs<br />
<br />
<ol>
<li>"Load file," or a CSV file with the metadata, one line per document or email.</li>
<li>"Output file," a zip file containing native documents, extracted text, PDF images of all files, and exceptions, each in its folder.</li>
<li>Case for review, loaded into FreeEedUI review tool. It is put into SOLR as a back end, but for review, one uses the FreeEedUI.</li>
</ol>
<div>
However, there are two use cases that would require the opposite: reviewing the eDiscovery response, and using FreeEed for research.</div>
<div>
<b><br /></b></div>
<div>
<b>Reviewing the eDiscovery response</b></div>
<div>
<br /></div>
<div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoT0CP10leTdYv_V0ij-1VpKwDyIZ9hTI6bLA5DlEE01mtiGupqgVA-YpA-kSMNIZS8SrX3DhUorB4s1sxABfuPKN29X447VNybJFbeOrSKa8JeIH49jVOzODdHedVlbW9jJnhlpfXhpU/s1600/edisco-source.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoT0CP10leTdYv_V0ij-1VpKwDyIZ9hTI6bLA5DlEE01mtiGupqgVA-YpA-kSMNIZS8SrX3DhUorB4s1sxABfuPKN29X447VNybJFbeOrSKa8JeIH49jVOzODdHedVlbW9jJnhlpfXhpU/s320/edisco-source.png" width="320" /></a>If you send an eDiscovery request, you may get back the load file and the documents. In essence, you are getting the data in the same format that FreeEed outputs it. What you would like then is to reverse the process, to make the load file the input, and to index the documents for search. This is now implemented in FreeEed.</div>
<div>
<br /></div>
<div>
When you select the input, you see a "Data Source" panel. If you choose eDiscovery, FreeEed will work as before, that is, accepting your custodians' files as input.</div>
<div>
<br /></div>
<div>
If you choose the "Load file" radio button as a data source, the program will do the following</div>
<div>
<ul>
<li>Read each line of the load file</li>
<li>For each line, use the given fields as metadata</li>
<li>Make the metadata and the extracted file text searchable and create a case in FreeEed for review</li>
<li>Available in FreeEed V 7.3</li>
</ul>
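<div>
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the FreeEed code (which is written in Java). The column names in the sample, including <code>text_link</code>, are hypothetical, and the real load file has many more fields.</div>

```python
import csv
import io

# A tiny made-up load file: one header line, then one line per document.
sample_load_file = """\
UPI,File Name,From,Subject,text_link
00001,memo.doc,,Budget memo,text/00001.txt
00002,mail.eml,alice@example.com,Lunch?,text/00002.txt
"""


def read_load_file(stream):
    """Read each line of the load file and return its fields as metadata."""
    return [dict(row) for row in csv.DictReader(stream)]


docs = read_load_file(io.StringIO(sample_load_file))

for doc in docs:
    # In a real run, the extracted text behind text_link would be indexed
    # together with the metadata fields of the same line.
    print(doc["File Name"], "->", doc["text_link"])
```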
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3HV0JbffZrTTmAiW4ccRNYfaOmRZb8xjJeNBXHzWfLJXzEWIc7qUuaGbnppqihk9EHp-vgtZoPU2urTEXRTTUDXlEzJLUjrbXN8dX2DT9GplSBXmjJMThBBFzeXOAAZPktG-P8ubcZQE/s1600/load-file-input.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3HV0JbffZrTTmAiW4ccRNYfaOmRZb8xjJeNBXHzWfLJXzEWIc7qUuaGbnppqihk9EHp-vgtZoPU2urTEXRTTUDXlEzJLUjrbXN8dX2DT9GplSBXmjJMThBBFzeXOAAZPktG-P8ubcZQE/s320/load-file-input.png" width="320" /></a></div>
This use case lends itself very nicely to parallelization, and can, therefore, be processed on a Hadoop cluster, to accommodate large volumes.</div>
<div>
<br /></div>
<div>
<b>Using FreeEed as a research tool</b></div>
<div>
<br /></div>
<div>
Often, researchers already have the metadata extracted. For example, in our <a href="http://shmsoft.blogspot.com/2016/12/using-freeeed-in-memex-program-for.html">Memex court document investigation,</a> we already have elaborate parsing code that extracts metadata from the court documents. In this case, we want to be able to load the metadata and the file text into FreeEedUI for research. We should be able to answer questions like</div>
<div>
<ul>
<li>How many times was a given crime mentioned?</li>
<li>Repeat the question above for a particular judge and in a specific time range (this question searches the metadata in a structured way, as well as the text).</li>
</ul>
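<div>
Questions like these translate into Lucene query strings that combine a text clause with metadata clauses. Here is a hedged sketch in Python. The field names <code>text</code>, <code>judge</code>, and <code>decision_date</code> are hypothetical, because the actual schema depends on the metadata you load.</div>

```python
def phrase(value):
    """Quote a value as a Lucene phrase, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(text_term, judge=None, date_from=None, date_to=None):
    """Combine a text clause with optional judge and date-range clauses."""
    clauses = [f"text:{phrase(text_term)}"]
    if judge:
        clauses.append(f"judge:{phrase(judge)}")
    if date_from or date_to:
        # An open end of the range is written as * in Lucene syntax.
        clauses.append(f"decision_date:[{date_from or '*'} TO {date_to or '*'}]")
    return " AND ".join(clauses)


# How many times was a given crime mentioned?
print(build_query("grand larceny"))

# The same question for one judge and a specific time range.
print(build_query("grand larceny", judge="Smith",
                  date_from="2010-01-01T00:00:00Z",
                  date_to="2015-12-31T23:59:59Z"))
```

<div>
The count itself is the number of hits that the search engine reports for the query (SOLR returns it as numFound), so no documents need to be downloaded to answer the question.</div>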
<div>
Clearly, this is the same use case as above. The only difference is that we need a different set of metadata fields than the one used in FreeEed by default. Technically, this amounts to programmatically changing the schema in SOLR, and this will be done in the next update, V 7.4.</div>
</div>
<div>
<br /></div>
</div>
<div>
</div>
Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-39464163868126134092017-01-23T09:07:00.000-08:002017-01-23T09:09:11.814-08:00Healthcare and Machine Learning<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdKqehHDdGabe_LwqYehQXZc1SdmulZV2TrKGvq0gCgu0r5PjWASIfAkS4DIhKYkFxQ3TziPu4iXOnnWyE5uhGEArB3QDT-NPwonzd251JBVF85P2BdKPTO9wCn6QWcO3gNNuLiz_nkJI/s1600/Hadoop-in-healthcare.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdKqehHDdGabe_LwqYehQXZc1SdmulZV2TrKGvq0gCgu0r5PjWASIfAkS4DIhKYkFxQ3TziPu4iXOnnWyE5uhGEArB3QDT-NPwonzd251JBVF85P2BdKPTO9wCn6QWcO3gNNuLiz_nkJI/s200/Hadoop-in-healthcare.png" width="200" /></a></div>
I have written about the current state of Machine Learning in Healthcare and about the practical steps that the healthcare professionals can take today.<br />
<br />
The major points are<br />
<ul>
<li>Quick Overview of Machine Learning</li>
<li>What can Machine Learning do for healthcare - overview of current use cases</li>
<li>What steps one can take today while waiting for big developments to come through</li>
</ul>
<div>
<br /></div>
<div>
The blog post is on the Elephant Scale blog, http://elephantscale.com/2017/01/healthcare-machine-learning-practical-approach/ so you can continue there.</div>
Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0tag:blogger.com,1999:blog-3306848155597119088.post-26710516602883875712016-12-29T20:42:00.001-08:002016-12-29T20:47:19.527-08:00Using FreeEed in the Memex program for investigations<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_ui2uJM85nypK80A3WRBhqESvc0d18SCuNr6klBU5x17dtdLaSiIQKh5x5t6zRXkKxYSQF8TVXYcQCk4uxfuyLjB0VfSLDXEnGvQYVgQeC6bXm1a_xtN7LZV4VlX9NaFP-gDV26G2cpY/s1600/1.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="217" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_ui2uJM85nypK80A3WRBhqESvc0d18SCuNr6klBU5x17dtdLaSiIQKh5x5t6zRXkKxYSQF8TVXYcQCk4uxfuyLjB0VfSLDXEnGvQYVgQeC6bXm1a_xtN7LZV4VlX9NaFP-gDV26G2cpY/s320/1.png" width="320" /></a>A common problem in investigations is that the authors of the research software, which is being produced in the course of the Memex program, are themselves not authorized to see the data that the investigating agencies deal with.<br />
<br />
To address this problem, we added hash search to FreeEed. First, we have added the metadata screen display (which was not previously available), and users can see the metadata.<br />
<br />
This screenshot presents the view of the metadata table. Metadata, of course, is "data about data." It shows all the fields collected from the documents being searched, together with their "a.k.a." names, or synonyms. For example, in this screenshot, you can see that field 22 can be called "From," but it can also be called "Author" or "Message-From." You can also see that there is a new field, called "Hash."<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbZQt9DOmxqMEa8lAVslWvNIch5lIrl9I-w4RjKqgmeg-5FkbeO3DyL65xwHiOCXw3BPx48m4iU0hJnyWeRIId-9okfp6VOD2HlvW_8E8zPzYTAJt0ea9vO4Q8rZeyAj110Vx-33RcgcI/s1600/hash.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="182" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbZQt9DOmxqMEa8lAVslWvNIch5lIrl9I-w4RjKqgmeg-5FkbeO3DyL65xwHiOCXw3BPx48m4iU0hJnyWeRIId-9okfp6VOD2HlvW_8E8zPzYTAJt0ea9vO4Q8rZeyAj110Vx-33RcgcI/s320/hash.png" width="320" /></a><br />
Next, the file hash is added to the metadata fields settings. Users have requested this feature before, and now it is available. For emails, the hash is defined using the popular email fields. In FreeEed, this is configurable through the database.<br />
<br />
This hash is shown in the screenshot on the left, which represents the 'load file' output by FreeEed. There it appears alongside other popular metadata fields, such as Message-ID, which were recently added by request.<br />
<br />
The investigating agency can simply compute the hashes of the objects they are looking for, such as texts, phones, images, or anything else, and search for these hashes without revealing to the authors of the software or to the processors what they are searching for. Entities other than investigating agencies may find this feature useful as well.<br />
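<br />
As a rough illustration of the agency's side of this workflow, here is a minimal Python sketch that computes MD5 and SHA-1 digests of an object's bytes. It is only a sketch under assumptions: the function names are made up for this example, and for the search to succeed the digest must be computed the same way FreeEed computes it (for emails, as noted above, the hash is built from email fields rather than from the raw file).<br />

```python
import hashlib


def digest_bytes(data: bytes) -> dict:
    """Return the MD5 and SHA-1 hex digests of the given bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def digest_file(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Only the resulting hex strings would need to be shared for searching.
    print(digest_bytes(b"hello"))
```

Only the hex strings leave the agency; the objects themselves are never shown to the software authors or the processors.<br />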
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikVjkQCLtHb7w_BIeyefxwEDas9dPnV2cNNul9PDjZTQ1mcTQsZmVGOiLWHgVdcpgKcCA1SI9ed-PHspejZqAlmjzBrpmnVzz18VHB9iCXThRXHp4xoAuamBIusXRDc9Tfe0FngqEKsAk/s1600/hashthere.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="122" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikVjkQCLtHb7w_BIeyefxwEDas9dPnV2cNNul9PDjZTQ1mcTQsZmVGOiLWHgVdcpgKcCA1SI9ed-PHspejZqAlmjzBrpmnVzz18VHB9iCXThRXHp4xoAuamBIusXRDc9Tfe0FngqEKsAk/s320/hashthere.png" width="320" /></a><br />
Now, this shows in the processing results, but is it searchable? For that, Hash has been added to the schema in the FreeEedUI search engine (which is Solr). Now Hash shows up as one of the fields for each document, as the screenshot shows.<br />
<br />
The last question: can one search with just the hash value? The answer is yes, you can search on the hash alone. To verify this, pick one of the hashes that you saw in the documents and search for this value. You will find this one document - as is to be expected, since hashes such as MD5 and SHA-1 are designed so that two different documents are extremely unlikely to share the same value. The last screenshot illustrates this.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5pnfxBGyAlSqauG9kUW_0jqBcsUeFcuEM1FSLibsU0eKBDb6urBH1GmdIhwW_t7_XEshkcLP8G2GuTR8xiRClyM2vhi10xHEjcRiwM6s-bAFvTGySRV5E7APoTZuP_gZWeRmBoBUeUOg/s1600/onehash.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="193" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5pnfxBGyAlSqauG9kUW_0jqBcsUeFcuEM1FSLibsU0eKBDb6urBH1GmdIhwW_t7_XEshkcLP8G2GuTR8xiRClyM2vhi10xHEjcRiwM6s-bAFvTGySRV5E7APoTZuP_gZWeRmBoBUeUOg/s320/onehash.png" width="320" /></a></div>
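<br />
For readers who query Solr directly rather than through the FreeEedUI screen, a hash-only search might look like the sketch below. It only builds the request URL for Solr's standard select handler. The host, port, and core name in the usage line are placeholders, and the field name "Hash" is assumed to match the schema entry described above.<br />

```python
from urllib.parse import urlencode


def hash_query_url(core_url: str, hash_value: str, field: str = "Hash") -> str:
    """Build a Solr select URL that searches a single field for one hash value.

    core_url is the base URL of a Solr core (hypothetical in this example).
    """
    params = {
        # Quote the value so it is matched as one exact phrase.
        "q": f'{field}:"{hash_value}"',
        "wt": "json",
    }
    return f"{core_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder core URL; substitute the one used by your installation.
    print(hash_query_url("http://localhost:8983/solr/mycore",
                         "5d41402abc4b2a76b9719d911017c592"))
```

Fetching that URL should return the matching document, if any, as JSON.<br />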
<br />
Additionally, FreeEed can provide the results sorted by user-defined "document significance," using user-provided functions. Such functions are supplied by the Memex groups.<br />
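<br />
The idea of sorting by a user-provided significance function can be sketched in a few lines of Python. This is not FreeEed's actual plug-in interface, which is not described here; the document layout (a dict with a "text" key) and both function names are hypothetical.<br />

```python
from typing import Callable, Iterable, List


def sort_by_significance(docs: Iterable[dict],
                         score: Callable[[dict], float]) -> List[dict]:
    """Return the documents ordered from most to least significant."""
    return sorted(docs, key=score, reverse=True)


def term_count_score(term: str) -> Callable[[dict], float]:
    """Example scoring function: count occurrences of a term in the text."""
    needle = term.lower()

    def score(doc: dict) -> float:
        return float(doc.get("text", "").lower().count(needle))

    return score


if __name__ == "__main__":
    docs = [
        {"id": 1, "text": "hello"},
        {"id": 2, "text": "invoice invoice"},
        {"id": 3, "text": "Invoice"},
    ]
    ranked = sort_by_significance(docs, term_count_score("invoice"))
    print([d["id"] for d in ranked])
```

Any function that maps a document to a number can be swapped in for the term counter.<br />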
<br />
<br />Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com1tag:blogger.com,1999:blog-3306848155597119088.post-12892831469971780512016-12-25T21:25:00.002-08:002016-12-25T21:27:57.253-08:00Word clouds in FreeEedWord clouds have been added to FreeEed as an early release. To try them, download the jar from <a href="https://s3.amazonaws.com/shmsoft/releases/freeeed-processing-1.0-SNAPSHOT-jar-with-dependencies.jar" style="background-color: white; color: #004b91; font-family: 'helvetica neue', sans-serif; font-size: 12px; text-decoration: none;" title="https://s3.amazonaws.com/shmsoft/releases/freeeed-processing-1.0-SNAPSHOT-jar-with-dependencies.jar">https://s3.amazonaws.com/shmsoft/releases/freeeed-processing-1.0-SNAPSHOT-jar-with-dependencies.jar</a> and replace the jar of the same name in your installation. Then run freeeed_player.sh (.bat) as usual.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifVxdUs8rcQNeY2IoYGh_ha28fB0hOOOBtWvxDrGl5iaARjf8E2AKnCuI3ISFXjv3FcSH5uEPa2O1Ehg1Z3FWhjZ8FYmO0KK7kRNUMEITigRUodUfGzScWsybQM6MLAy7MwNrBcRMnYp4/s1600/wordcloud.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifVxdUs8rcQNeY2IoYGh_ha28fB0hOOOBtWvxDrGl5iaARjf8E2AKnCuI3ISFXjv3FcSH5uEPa2O1Ehg1Z3FWhjZ8FYmO0KK7kRNUMEITigRUodUfGzScWsybQM6MLAy7MwNrBcRMnYp4/s320/wordcloud.png" width="320" /></a></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKr6YElhwm86sB7G-Fpgz-REVoUoByga-hSMhbBgQ0BoTm8NbuCjjTZ5SJgrOmkFHQJruM0dZzZLH8k5bjve2M0HusgWTgH7u_BSvjfVUZKrdXRADrhYk6Dsex7m0IKBOn6zBNQ0rMbH8/s1600/wordmap.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="173" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKr6YElhwm86sB7G-Fpgz-REVoUoByga-hSMhbBgQ0BoTm8NbuCjjTZ5SJgrOmkFHQJruM0dZzZLH8k5bjve2M0HusgWTgH7u_BSvjfVUZKrdXRADrhYk6Dsex7m0IKBOn6zBNQ0rMbH8/s320/wordmap.png" width="320" /></a>Here is an example of a word cloud and a screenshot of the Analytics menu, which features word clouds.<br />
<br />
The word cloud is from a project included with FreeEed, which is just a collection of unconnected documents, so the cloud is not very meaningful. With your own data, you should get something more relevant to your use cases and more useful.<br />
<br />
Your feedback will be very much appreciated.Mark Kerznerhttp://www.blogger.com/profile/13141058882531144922noreply@blogger.com0