I accidentally started a movement – Policing the Police by scraping court data
650 points by kristintynski on Sept 19, 2022 | 183 comments
Almost 3 years ago, I posted a story to r/privacy and hackernews about using county-level police data to "police the police". https://old.reddit.com/r/privacy/comments/gr11aw/i_think_i_a...

The idea quickly evolved into a real goal, to make good on the promise of free and open policing data. By freeing policing data from antiquated and difficult-to-access county data systems, and compiling that data in a rigorous way, we could create a valuable new tool to level the playing field and help provide community oversight of police behavior and activity.

In the almost 3 years since the first post, something amazing has happened.

The idea turned into something real. Something called The Police Data Accessibility Project. (https://www.pdap.io)

More than 2,000 people joined the initial community, and while those numbers dwindled after the initial excitement, a core group of highly committed and passionate folks remained. In these 3 years, this team has worked incredibly hard to lay the groundwork necessary to enable us to realistically accomplish the monumental data collection task ahead of us.

Let me tell you a bit about what the team has accomplished in these 3 years.

-Established the community and identified volunteer leaders who were willing and able to assume consistent responsibility.

-Gained a pro-bono law firm to assist us in navigating the legal waters. Arnold + Porter is our pro-bono law firm.

-Arnold + Porter helped us to establish as a legal entity and apply for 501c3 status

-501c3 status granted

-We've carefully defined our goals and set a clear roadmap for the future

-Hired first full-time staff.

-PDAP was awarded a $250,000 grant by The Heinz Endowments

So now, I'm asking for help, because scraping, cleaning, and validating 18,000 police departments is no easy task.

There are two ways to help. The first is to join us and help the team. Perhaps you joined initially, realized we weren't organized yet, and left? Now is the time to come back. Or maybe you are just hearing of it now. Either way, the more people we have working on this, the faster we can get this done. Those with scraping experience are especially needed. The second is to donate or help us spread the message. The more donations, the more data we can gather. I want to thank the r/privacy community especially. It was here that things really began.

TL;DR: I accidentally started a movement from a blog post I wrote about policing the police with data. The movement turned into something real because of r/privacy and hackernews: (Police Data Accessibility Project). 3 years later, the groundwork has been laid, non-profit established, full-time staff hired, and $250,000 in grant money and donations so far!

Scrapers so far on GitHub: https://github.com/Police-Data-Accessibility-Project/Scraper... Discord, if you would like to join the efforts: https://discord.com/invite/wMqex8nKZJ

*This is US centric



Really love the idea, and the passion behind it. Def could have legs.

Here are the pitfalls I see you falling into:

(1) seriously, what data are you collecting? “Everything” isn’t a great answer (who’s supposed to use ‘everything’, anyway? “Anyone”?). “Apples-to-apples police misconduct statistics” is a good one.

(2) it’s important to clarify 1 because you need to know who you’re serving, and why. Different activists need different data. “Have all data” sounds good until you need to decide how to allocate your resources.

(3) more deeply, data is the land of edge cases. Even just with police misconduct, you need to get DEEP to rigorously compare seemingly-simple stats like “# of unjustified police killings”. If you don’t start narrow, you’ll never show value. If you don’t show value, nobody will ever care you exist.

When I look at the data you’ve collected, it ranges from annual reports, to municipal contact info, to crime stats. What’s important to collect at scale? To whom? What do they need it for?

Again - great, ambitious idea! But $250k goes fast. Show value before it runs out!


Thanks for the thoughtful response! This is really helpful.

1. Agreed. Our strategy for this isn’t clear on the website, I guess, but we do have one. It’s to focus on depth in geographic areas. This is because context is critical, and because most of the users we talk to are operating locally with municipal or county level data. So it’s more important to have every data source we can possibly find relevant to Pittsburgh than it is to have every arrest record in every municipality. Or at least, it’s more immediately useful to people.

That said, most people seem to contribute data sources from where they live. I think little microcosms will spring up where people take stewardship of maintaining information about their chosen geo or subject areas. Not too far down the roadmap, Milestone 2 for the PDAP heads.

2. I will take it as a next step to make this strategy clear and say why. We want to basically allow the community to make its own to do list: what kind of question are you trying to answer? That creates a “bounty” for data which can be fulfilled by an altruistic volunteer, another member of your team, etc.

3. Yes. We’re not trying to do apples to apples comparisons of departments yet, partly because it’s so absurdly difficult and you don’t know where to start. Why would you undertake a 12 hour research project to compare St. Louis and Minneapolis incident reports if you don’t have a use case? Instead we’re focusing on what we DO know we need: complete local data, town by town / county by county.

The data we collected reflects the nature of our early experiments, which were scattered. This airtable prototype is maybe 2 weeks old, next up is helping people understand where to focus.

The idea for demonstrating value is also local. I’m working with groups in Pittsburgh (where we are based, and where our funding came from) to make ourselves indispensable to them. I’m hoping to turn the $250k into a handful of killer local case studies in this year, rather than marking 0.1% progress toward a national vision.

Thanks again for giving me the practice explaining this stuff. I hope I’m making any kind of sense, and of course happy to hear where I’m still wrong.


For number 1, I would look for scenarios where the officer was found to have committed misconduct or found to be unreliable. Then watch whether they're involved in subsequent cases/departments when they should probably never work as an officer again. Just my thoughts on one thing that could be done.


I'm not in the U.S., so I might be ignorant, but wouldn't it be fair to collect positive data too, and not just misconduct?


Yep, almost all the data is useful for one reason or another. The point is to paint a picture of police activity in general.


What would that look like?


Uhh like how many murder cases got solved, how many drunk drivers had their driving licenses revoked, how many speeding tickets went through, how many stolen cars and goods found their original owners, how many pickpocket, shoplifting, and burglary cases got investigated, you know ... Stuff police exist for?


Those stats are typically at the department level. They don't necessarily match the court records either (eg the police may mark a murder case as solved even if the prosecutor thinks the case is too weak and never goes to court).


You might try defining what the "ideal" department's data would look like: what categories of data, what columns each record has, what the values are for each, etc. Ideally you'd stamp it with a year and give it a spiffy name so it could be the National Police Data Reporting Standard 2022 (NPDRS.2022) or something.

Departments that are trying to be transparent (or who just don't want to deal with figuring it all out from scratch) may be happy to adopt something considered a "standard" for tracking and reporting data. In some cases it means it is a checkbox they can check without having to deal with annoying people and their annoying questions... but that hardly matters so long as the data is made available. It would also give companies developing software for police departments a target to aim for.
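To make the "ideal department's data" idea a bit more concrete, here is a minimal sketch of what one record type in such a standard might look like. Everything in it is an assumption for illustration: the category list, the column names (`agency_id`, `incident_date`, `category`, `description`), and the `parse_record` helper are hypothetical and not part of any existing standard.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class IncidentCategory(Enum):
    # Hypothetical categories; a real standard would define its own list.
    TRAFFIC_STOP = "traffic_stop"
    USE_OF_FORCE = "use_of_force"
    ARREST = "arrest"
    COMPLAINT = "complaint"


@dataclass(frozen=True)
class IncidentRecord:
    """One row in a hypothetical standardized incident table."""
    agency_id: str
    incident_date: date
    category: IncidentCategory
    description: str = ""


def parse_record(row: dict) -> IncidentRecord:
    """Validate a raw dict against the schema; raise ValueError on bad input."""
    required = ("agency_id", "incident_date", "category")
    missing = [name for name in required if name not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return IncidentRecord(
        agency_id=str(row["agency_id"]),
        # ISO 8601 dates only (YYYY-MM-DD); anything else raises ValueError.
        incident_date=date.fromisoformat(row["incident_date"]),
        # Values outside the enumeration raise ValueError.
        category=IncidentCategory(row["category"]),
        description=str(row.get("description", "")),
    )
```

The point of the sketch is that a fixed set of columns and allowed values lets a department (or a software vendor) check its own export mechanically before publishing it.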


Working forwards like this is definitely the right solution if it's achievable. I have worked with some government and police datasets, and they reflect that the records-keeping approach is very much designed with the old-world use case of manually reviewing individual records. For example, a record of a traffic collision would be perfectly fine if you wanted to go back and see what happened in a specific collision. However, if you wanted to run an analysis over a set of collision records, you would run into problems like vehicle types being specified as 'free text' (anything can be entered), with no standard set of vehicle classifications (like an enumeration).
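As a rough illustration of the free-text problem described above, here is a small sketch that maps free-text vehicle types onto an enumeration. The `VehicleType` values and the alias table are made up for the example; a real dataset would need a much longer, locally maintained mapping.

```python
from enum import Enum
from typing import Optional


class VehicleType(Enum):
    # Hypothetical classification; not taken from any real dataset.
    CAR = "car"
    TRUCK = "truck"
    MOTORCYCLE = "motorcycle"
    BICYCLE = "bicycle"


# Hypothetical free-text spellings mapped to the enumeration.
ALIASES = {
    "car": VehicleType.CAR,
    "sedan": VehicleType.CAR,
    "suv": VehicleType.CAR,
    "truck": VehicleType.TRUCK,
    "pickup": VehicleType.TRUCK,
    "motorcycle": VehicleType.MOTORCYCLE,
    "motorbike": VehicleType.MOTORCYCLE,
    "bicycle": VehicleType.BICYCLE,
    "bike": VehicleType.BICYCLE,
}


def normalize_vehicle_type(raw: str) -> Optional[VehicleType]:
    """Return the matching VehicleType, or None if the text is unrecognized."""
    # Lowercase and collapse whitespace so "  Pickup " and "pickup" match.
    key = " ".join(raw.lower().split())
    return ALIASES.get(key)
```

Returning `None` for unrecognized text, rather than guessing, keeps the unknown values visible so someone can review them and extend the mapping.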


“working forwards” is a good phrase. Thanks for this whole comment!


Yes, I've linked to it other places on the page but Measures for Justice is doing that. https://measuresforjustice.org/infrastructure


this is a great idea. activists could then use the ideal standard & pressure departments into achieving "NPDRS Transparency" or something


This is important. Locally, we had a sheriff who was being heavily, heavily criticized due to several deaths at the county facility. This was at the height of the protests a few years ago.

It was a lot of work to find data on policing nationwide, because the question really was "Is the sheriff doing a bad job, or do bad things happen sometimes?"

After some hard work trying to identify other cities with similar socioeconomic circumstances and populations, it became clear that our local sheriff was actually better than average, and that much of the outrage was fabricated.

That's also when I learned that many people don't want to listen to statistics unless they agree with their own preconceptions.


> That's also when I learned that many people don't want to listen to statistics unless they agree with their own preconceptions.

This has been my experience with bodycam footage. I've found that there have been quite a few heavily protested police-involved shootings that, when you look over the footage and the facts of the situation, were by the book and completely justified. Yet no matter how many times you say to someone "you do know there's footage of the entire event, uncut and unfiltered", it doesn't seem to matter.

EDIT: I just remembered what my throwaway username is.


The PoliceActivity youtube channel is interesting for this.

https://www.youtube.com/c/PoliceActivity/videos (For most of the videos, which show people dying, you must be logged into a youtube account to pass the "violent content" prompt)

Maybe 9 out of 10 videos are "good kills". Officer says something like "hey, stop committing a crime", the suspect says something along the lines of "fuck you pig!", pulls a gun, and is swiftly shot to death by the 50 SWAT guys surrounding him. These videos get around 100K views, and the comments are full of hooting MAGA types.

1 out of 10 shows an officer doing something completely inexcusable (shouting "I love being racist!" and then hitting an infant with a baseball bat), and the video has 100M views.

The tens of thousands of comments then act out the exact same conversation every single time: the red team says something like "but it happens very rarely" and the blue team is outraged that it happens at all. They don't want the rate to be low, they want it to be zero. This is how you end up with blue-aligned media earnestly and in good faith calling for the police and all prisons to be abolished: https://www.nbcnews.com/think/opinion/abolishing-police-pris... https://www.newyorker.com/news/our-columnists/the-emerging-m...

I say "conversation" up there, which is of course incorrect. They're talking right past each other, since they come from two separate and completely divorced epistemological universes.


The problem with those ostensibly rare "bad cops" is that the supposed "good cops" tend to look the other way at best, and tacitly support them at worst. Cops who actually try to call their colleagues out tend to be heckled or worse. Here's a documented case of an entire police department ganging up against one whistleblower to the point of involuntarily committing him to a mental institution, with full knowledge, approval, and cooperation of higher-ups all the way to police commissioner.

https://en.wikipedia.org/wiki/Adrian_Schoolcraft


> They don't want the rate to be low, they want it to be zero

I'm not sure what the problem is with that. Of course mistakes happen but it should be the goal.

The red team also seems to have less of those "oopsie I just killed you" moments, so perhaps that might color things there.

And then you go on to the far end that calls for abolishing police/prisons and make that the general team blue zeitgeist. There's low hanging fruit of reform that needs more focus, e.g., not using police for mental health emergencies.

Anyway, your framing of the situation as a non-conversation is kind of meta -- it feels like you talked right past the problems.


Because when you have your target as zero then when you don't hit zero you push for ever more costly and suboptimal policy to achieve your goal. This is a problem both in that the incremental policy changes are highly unlikely to provide a larger benefit than their incremental costs and because it draws resources away from tackling problems that probably have a larger societal impact and smaller costs to achieve those goals.


That makes no sense and I'd be interested in examples supporting it that are not some person on the fringe shouting opinions.

What should the goal of the police department be regarding killing innocent people? Should they aim for a dozen per quarter? Per-capita weighted?

Incremental changes have been hard because it gets down to the police policing themselves, and that has been shown to be a failure.

Again, one of the "simplest" reforms would be to have other professionals deal with mental health crises.

A bigger reform with much more value would be ending the War on Drugs but that's way higher up the food chain. It is germane though, in that the whole point of said war was to give police more opportunities to oppress "others" (i.e., "minorities" and "hippies").


That you think it makes no sense means you do not understand statistics or cost/benefit analysis. There are tons of examples: people killed by drunk drivers, pedestrian road deaths, illegal drug usage in society, etc. "Innocent people killed by cops" is a bad measure because, by definition, pretty much all the people cops kill are innocent: almost none of them have been found guilty by a jury of their peers of whatever the cops were trying to apprehend them for. The target for cops proven to have violated relevant policy related directly to the killing of a person should be zero, but there is a nonzero number of people killed by police each year that society should be fine with. Even the target I said should be zero should only be zero if the regulations pertaining to it are clear and concise enough to be easily understood and adhered to by police. Otherwise, situations where lethal force is justified will get tripped up on marginally relevant grounds, because the probability that the average police officer fails to follow some procedure increases with the number of procedures required.

All of these things have a pattern in common: the majority of the improvement can be had reasonably cheaply, while capturing the last small improvement has huge costs that usually exceed the benefit of eliminating the last bit, and the population of potential events is huge. Road deaths are an illustrative example. Adding speed limits, traffic lights, crosswalks, moderate enforcement, etc. all dropped road deaths precipitously in the USA. If the USA changes its model from cars first to people first, there are still some moderately costly infrastructure changes it could make to get its numbers down to Netherlands levels, which are still not zero. Beyond that you are left with hugely costly heavy enforcement and steadily more dystopian and invasive methods that will likely improve the number but not get it all the way to zero. It is really unlikely that these last improvements cost less than the benefit they bring, counting both the direct cost of implementation and the negative effects on the total population of driving events.

On top of all this all those large costs you are incurring for very small gains come at the expense of being able to spend those limited resources on other things that likely have better cost to improvement ratios. You are foregoing a larger decrease in something else bad for society for small decreases in your target. In the road deaths example above that final small drop in road deaths might be coming at the cost of large drops in crime if you deployed those police there instead of on road deaths, as an example.


> That you think it makes no sense means you do not understand statistics or cost/benefit analysis

Maybe your writing was not clear enough, m'kay?

And now you play word games with my intent. I did not advocate for extreme measures -- simply to have clear targets and rules of engagement; to identify workable solutions.


No, I am telling you what the problem is with having the rate set at zero, which is what I have been doing the entire time. Using your terminology, you don't get extreme measures initially, you walk yourself to extreme measures because everything you change does not hit your goal rather than just calling it a day when you hit the inflection point where the costs of your policy are roughly equal to the benefits of your policy and there are no obviously better uses of your limited resources that you should be doing instead.


I don't think you're willfully misconstruing me, but you are nonetheless.

You're manufacturing slippery slope reductions of a concept and therefore dismissing the concept.

What should the goal of police departments be for pet dogs killed by cops? Should they aim for 10,000? A million? Just a handful? A logical answer is "we want to avoid killing any pet dogs in the course of service" (also known as zero).

Now they could take absurd actions to avoid that (leaving the scene whenever a dog is present), or they could add that to standardized training so that they're better prepared to deal with that situation.

So without providing real-world examples of your concerns you are just offering up florid conjecture.


Guy shouting fuck you pig while surrounded by 50 SWAT guys sounds like a mental health emergency. Can't do much about that. Ain't called first responders for nothing.


I mean, you can, but everyone involved has already ruled out doing any kind of root cause analysis because they don't like the obvious solutions to the causes, so both "gun" and "mental health emergency" and "shooting by police" are treated as natural events like weather that we cannot hope to understand or control.


>sounds like a mental health emergency

Sure, you can dose him up with Haldol and solve that particular emergency. What happens when he does it again next week? And the week after that?

Persistent severe mental illness is interesting in that it's a new problem. For all of human history, severe mental illness was a death sentence. Either you'd get beaten to death by your largest neighbor for acting weird, or your village would exile you and you'd starve to death in the forest.

The current compromise we've arrived at is that being crazy is not actually a crime, so you cannot be permanently imprisoned for it. So low-functioning schizophrenics cycle through mental hospitals-- they climb a lamp post and urinate on passers-by, get committed and then are put on antipsychotics, sober up, survey their surroundings and rationally conclude that inpatient mental health facilities are really awful places to be, (basically no rights or privacy, surrounded by crazy people, can't wear or handle any object that's on a very short list of approved suicide-incompatible things) and then check themselves out, which they are legally allowed to do. Then they go off their meds, (also legally allowed) and it's back to the lamp post...

This continues until they accidentally walk into traffic, or are shot by a cop. It's a miserable compromise that persists because all the alternatives are even less palatable.


Follow up response -- the meth epidemic seems to be manufacturing craziness at a rapid clip. Our current "solution" is failing us and we'd be well-served to reexamine how to deal with it.

The correct answer is to legalize, tax, and regulate all drugs and to create "mental health courts" that can administer the involuntary institutionalization of those that need it.


We need to take another try at involuntary institutionalization of the mentally ill.

It should be cheaper than prisons and more humane as well. The homeless crisis begs for this.

Just got to work out trivial details like how to not have that power be abused...


Second comment because I was responding without addressing your point properly: somebody in a crisis like that could possibly be helped if there weren't 50 SWAT guys eager to take target practice.

There's obviously violent scenarios that can only be addressed with violence, but bringing that on should be a last resort.


Putting someone already in crisis into a dangerous situation (surrounding them with SWAT) is going to make the crisis much worse. The film we see of people having a mental health crisis always comes from after the cops have shown up; they may have been much less agitated beforehand.


Yep. There are "simple" things we can do that will make things better but that can't happen until we agree to do it.

And for that to happen we need to understand it.

And for that to happen we need to talk about it.


Yeah, suicide by cop is a thing.

My point was that the comment about divisiveness between teams red and blue in this regard was divisive as well in how it was framed.

Police are granted incredible powers and have historically not had the best oversight, let alone relations with "minorities".

There are real problems and they can be addressed but that will never happen as long as we can't even agree that the "what" exists, let alone if the "how" is correct.


The problem is the "good" kills should also be regarded as failures of the system in a lot of cases.

> They don't want the rate to be low, they want it to be zero

Well, yes. "Thou shalt not kill" does not have an "unless you're a cop" footnote. There is no acceptable number of your children you would tolerate the shooting of.


> They don't want the rate to be low, they want it to be zero

As someone that cannot even vote in the US (and is therefore neither red nor blue) - that seems _eminently fucking reasonable_. This kind of thing just does not happen in the UK.

I'd wager there have been fewer police killings (let alone just shootings) in the past two decades in the UK than just this weekend in the US.

For anyone that actually believes in small, limited government, the idea that government agents can wander around shooting effectively at will is so ridiculous that anyone claiming to be "red team" should hang their heads in shame.


>This kind of thing just does not happen in the UK.

https://www.independent.co.uk/news/uk/home-news/chris-kaba-p...

>Chris Kaba: Protests held across UK after unarmed black man shot dead by police. Crowd of more than 1,000 demonstrators brandished ‘fight racism’ signs following the death of unarmed 24-year-old


Fewer in the last two decades than this weekend in the US.


What happened to "just does not happen"?


That's interesting -- but it certainly conflicts with macro-view analysis of the data[0][1][2] (I could link more, but there are more research links in the first linked thread and I do not wish to waste your most valuable time).

It seems to me that the only way to explain this is the lack of publishing of damaging videos by the youtube account -- i.e. the usual thing you need to pay attention to in the social sciences -- a) who chooses to release the data, b) the conditions under which they choose to release the data.

On the side of the YouTube channel -- based on the patreon channel, it seems to me that they are most likely not part of the police, or the judicial system. Thus they must obtain body camera footage from freedom of information requests.

This means that that information request is subject to filtering on the side of law enforcement, who can almost arbitrarily choose what videos they want to release -- while there are guidelines present, they really only apply if you can prove them being broken, which... due to the nature of FoI requests... you can't...

So, just as a baseline, legal advice holds that data currently subject to legal proceedings should be held back from release to a general audience, and court cases can take many years to be processed. So it seems to me that this is one main reason that videos that show misconduct by police officers would fail to be released, or would otherwise be redacted.

On top of that, you have whether officers themselves working in the police administration care to release the footage. It seems reasonable that an officer may be subject to workplace-based social pressure and not wish to release footage of wrongdoing by one of his coworkers. It also seems reasonable that in some cases they might feel departmental pressure not to release footage that displays such wrongdoing, so that the department as a whole does not come under flak. You have cases in the UK where officers themselves deleted videos that would prove wrongdoing on their part[3]. Either way, this is inherently impossible to prove.

And then you have whether or not the police officer was recording at all, regardless of what regulations state. There have been a few cases recently where police officers thought that their body camera was off, and used that time to break the law[4][5]. Indeed, in some states, it's entirely up to the officers whether to turn them on in the first place, based on what they consider as "an incident".

And then finally, as a youtube channel accepting donations, they are heavily incentivized to draw "engagement" and game the algorithm. So what they release is not just going to be based on the political opinions of those within the organization, but will also heavily cater to whatever established audience they have, to ensure that each video is liked and that they gain subscribers, so they can drive donations to keep on doing what they are doing.

So to me it seems that this isn't nearly as cut-and-dried as you seem to think it is. At the very least, a random youtube channel that releases police video cannot be thought of as a proper or correct sample from which to draw correctly proportioned information. As we can see, there are many reasons why it would misrepresent the number of cases of each kind. While research in this area is perhaps uncomfortable for people to accept, it broadly shows that police -- at least how they behave at the moment -- are universally flawed. I myself would prefer to trust the data.

[0]: https://twitter.com/equalityAlec/status/1571898316295643136

[1]: https://resistancelab.network/our-work/taser-report/index.ht... (Disclaimer: I know some of the people who worked on this. Interesting Note: 118 cases of taser use against children, 8 against children aged under 11)

[2]: https://www.theguardian.com/us-news/2022/jul/28/hunted-one-i...

[3]: https://www.policeconduct.gov.uk/sites/default/files/Documen...

[4]: https://www.thedailybeast.com/baltimore-cops-turned-off-body... (I admit this is rather a famous case!)

[5]: https://www.miamiherald.com/news/local/community/broward/art...


It's the same sort of thing with body cameras. If anything they capture a lot more context about situations. Generally I think the police will start to want to have the safety of the record keeping rather than not.


I sort of agree, but one thing the last few years showed me was how little people (especially in the media and parts of the academy) care about the truth. overwhelming, easily accessible, incontrovertible evidence might save you from the law, but the media, activists, and other partisans are still happy to make your life hell.


Police are already notorious for turning off their body cameras when it's convenient for them. Some police shootings result in no bodycam footage at all even though most or all officers were wearing one.


Tougher to answer, but maybe more useful, would be “What harm reduction strategies are being tried in other cities? Are they working?” this is at the intersection of policy and outcome and takes a lot of context.


Thank you for your work with this! One question I have:

You say in your FAQ "We aren't a watchdog—our activism is data collection and accessibility, not analysis or research."

Can you note any instances of other people using your data for analysis or research?


We're still developing those relationships, and we haven't generated any novel data that is deeper than web URLs. I'm based in Pittsburgh so we're still working with local journalists, activists, etc. to understand how they use the data and how we can help.


Just a heads up that your comments are getting marked as dead (shadow banned). You seem to have fallen afoul of some HN spam trap for new accounts.


Posts like this are interesting because, as an idea, you would think that HN would be the best target. Even if no one here provides a single character of code, they can provide insight into pitfalls and experiences they’ve run into when doing this sort of thing. I hope the comment section is fruitful in advice.


Yep! It's really helpful to see where people find problems and want to jump in.


Hello! I'm the executive director. I have a design background, have done product management in the past, and aside from keeping the lights on at PDAP and making sure we're tax-compliant I am in a product role. I talk to people using police data, and figure out where we can add value to make the data more accessible.

TL;DR: If you want to write scrapers: go for it! Run your scraper, share the results in Discord and with your friends, and talk about the process. We'll be listening, and it will help us build tools to support this important work.

A few things to clarify:

a. The source of truth for "what are we doing right now" and "how can I contribute" is https://docs.pdap.io/.

b. Empowering people who write scrapers is a part of our broad mission of "police data accessibility", but we have some foundational work to do first! Right now our primary project is creating a database of police agencies and data sources. This will help people understand what kinds of data are available, at which agencies, with which steps to access it. It will also help us create archives of the primary sources, so that if they get taken offline we can still go back and scrape them.

c. What we have realized in the past few years: there are already a ton of people writing and using web scrapers for their day to day work. They are as decentralized as our police system. Our scrapers repo will reflect that. We shouldn't all rely on one library, or even one language. The people who need the data are most motivated to maintain scrapers, and we expect that maintenance will be ad-hoc and as-needed for the immediate future. In most cases, data already published on the internet is useful to local users as-is.

d. If you have a question you'd like to answer about the police, here's the investigation process:

1. Determine whether public data exists to answer your question. Use google to find the appropriate agency, and see what they're publishing.

2. Determine how it can be accessed; do you need to make a FOIA request? Is there a URL?

3. If there's a URL, determine whether you need to write a scraper to access the records. Often, the records can simply be downloaded.

4. Write and run a scraper, if you need one!

5. If there's not a URL, make a records request for the public information. This is a long and complicated process.

6. Share the data with your friends.

This means that scrapers are helpful and necessary some of the time; but not always, and not as the first step. We're trying to help with steps 1, 2, 3, 5, and 6. The theory is that writing scrapers is something people can easily slot in and help with; and that, depending on what question you're trying to answer, two scrapers for the same data source might look wildly different.

Scrapers are an important part of the ecosystem, but they're one piece of the puzzle.
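
As a rough illustration of the decision flow in steps 2 to 5 above, here is a minimal Python sketch. The `DataSource` record and its field names are hypothetical, invented for this example; they are not PDAP's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSource:
    """Hypothetical record describing where an agency publishes data."""
    agency: str
    url: Optional[str] = None       # None means nothing is published online
    direct_download: bool = False   # True if the records are a plain file (CSV, PDF, ...)


def next_step(source: DataSource) -> str:
    """Map a data source onto the investigation steps described above."""
    if source.url is None:
        return "make a records request"    # step 5
    if source.direct_download:
        return "download the records"      # step 3: no scraper needed
    return "write and run a scraper"       # step 4
```

The point of the sketch is only that a scraper is the last branch, not the first: a source with no URL or with a direct download never reaches it.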


Josh, many of your comments here are displaying as "dead." The HN FAQ[0] says:

> What does [dead] mean?

> The post was killed by software, user flags, or moderators. Dead posts aren't displayed by default.

I suspect that it is a false positive. Maybe email the mods for help? hn@ycombinator.com

[0] https://news.ycombinator.com/newsfaq.html


Thanks! I contacted them and I've been reinstated.


Not sure why this got killed (dead in HN terms) but I vouched for it.


Over half this person's comments (the Josh-pdap person I mean) have been flagged.

Not sure if this is some false positive on anti-spam (new account, lots of comments in a short time on a single story) or if it's someone with a vendetta. @dang this is a little weird...

(I know, this site doesn't have magic @, but I also think the site moderator probably has scripts searching for mentions/summons anyway. I would anyway...)


The comments weren't flagged. HN's software has other kinds of filters too, and comments by brand new accounts are treated a little more skeptically. Fortunately users had vouched* for all but two of josh-pdap's comments by the time I saw them. I restored the other two and marked the account legit.

* https://news.ycombinator.com/newsfaq.html


Yeah, all of the new comments from the account are dead. I think creating a new account, posting an article, and posting a handful of comments looks a lot like spam to HN. I've vouched for the comments, but would definitely recommend emailing the admins.

Josh, I can't reply to you directly, but yeah... you can see the comments, but most people can't, unless they have enabled "showdead" in their preferences.


Thanks! It wasn't me who posted the article, but I agree with the theory that it's likely some well-meaning "new account spam prevention" thing.


Half of josh's comments show up as dead, so maybe you and others who can, look around and vouch for them as well. There's something nefarious going on...


What do you mean? I can still see it, but you can't?


Thank you.


A while back I created www.bartcrimes.com to publish police reports which were intentionally hidden behind a mailing list you must get approved to be a member of. Turns out, the public loves this kind of thing.


That's cool but why are the most recent entries from Sept 2021? Did BART do something even more effective to stop these updates from getting out?


I love that you wrote it on BART. I spent my year of BART time solving chess puzzles.

"Making public information public" is a good tagline too.

Do you know what kinds of work people did with the data? It seems to me one of the best ways to address BART crime would be to support the impoverished and desperate people who don't have any recovery or mental health support, but that work is slow...


Of all news outlets you'd never expect, USA Today did a good amount of FOIA requests and made them searchable at https://www.usatoday.com/in-depth/news/investigations/2019/0...

There are other sources regarding Brady lists like https://giglio-bradylist.com/ and http://bradycops.org/, but they are obviously not 100% complete.


For folks who do this kind of disparate data-source scraping at scale, what do best practices look like? What kind of tools are used in industry?

Maintaining scrapers for 18k county websites and PDs is no small task and looking through the docs for PDAP, it seems like this is still a very open question.


Our World In Data is the largest open source data collection & analysis that I'm aware of. https://github.com/owid

The 80000 Hours podcast has an interview with the (non-technical) creator of OWID. I seem to recall some interesting stories about them getting emailed PDFs with COVID data and such.

I had the same question as you, and I was hoping to find ideas in the comments. It seems like the kind of thing that's both inherently messy and scrappy yet if you don't get at least somewhat organized it can't scale.

Update: link to the podcast episode page with quotes, transcripts, etc. https://80000hours.org/podcast/episodes/max-roser-our-world-...


Thanks for sharing!

It's interesting that even one of the largest still uses manual execution for almost all of their pipelines (at least in the covid data project[1]). This [2] seems like the bulk of their data importers (scrapers) but most are still operating as manual jobs. I guess with open-source data work, hours and minutes don't matter as much, and being a few days behind the latest data is acceptable.

1 - https://docs.owid.io/projects/covid/en/latest/data-pipeline....

2 - https://github.com/owid/importers


I like writing web scrapers and this is an interesting project idea. If I understand right you are looking for volunteers to write scrapers that would take a police department, scrape the PD website, and download any PDFs or documents that gather data about the police department. Is that right? If so, I feel that's not super clearly communicated - I had to look at a couple example scrapers before arriving at this guess.

I do have a few questions too:

1. Will this scale? One problem with scrapers is that they break when people update their website. I'm imagining this problem multiplied by 18,000 and compounded by each scraper potentially being written by a different volunteer.

2. Where are the scrapers getting run?

3. How do the documents that the scrapers collect get transformed into usable data?

4. It seems to me like a scalable solution would be a standard to report data, a law to compel police departments to follow that standard, and then a system to collect that data and make it available. Do you work with police departments at all on data reporting?


1. I replied to the parent comment here; our answer to the scale problem is to recognize that people doing web scraping are as decentralized as the police. Our goal is to empower people who have questions about the police to answer them.

2. You can run them locally. We're not running the scrapers anywhere, or storing extractions anywhere.

3. This is a big, big question. Right now, the answer is dependent on the use case. Rather than trying to make the world's biggest database, we're going to respond to community needs and build this kind of thing as it comes up.

4. https://measuresforjustice.org/ is doing something like this! We're interested in creating incentives for police departments to make their data more accessible and transparent.


Not to be too rude/negative/mean - but it seems like a big concern to me that a "police data" project that's existed for three years, has a couple employees, has some funding, etc - doesn't really seem to have any police data. If I wanted to write a scraper to gather documents off of police websites and run it by myself and store my results locally - I could. What does your project add?

What I would expect to see is something like:

1. Here's what data we want from each police department each day. Here's what value you should use to indicate that data is not available.

2. Here's a list of police departments. Write a scraper. If it passes tests to show it's generating valid data, and code review, we will run the scraper on a daily basis, storing the output in this database.

3. Here's how you can query our database.
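
To make point 1 and the "passes tests" part of point 2 concrete, here is a minimal sketch of what such a validity check could look like. The field names and the `N/A` sentinel are assumptions made up for this example, not anything PDAP has defined.

```python
from datetime import date

NOT_AVAILABLE = "N/A"  # assumed sentinel for "this agency does not publish this field"

# Assumed daily schema; the field names are illustrative only.
REQUIRED_FIELDS = ("agency", "report_date", "arrests", "use_of_force_incidents")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in record]
    if problems:
        return problems
    try:
        date.fromisoformat(record["report_date"])
    except (TypeError, ValueError):
        problems.append("report_date is not an ISO date (YYYY-MM-DD)")
    for field in ("arrests", "use_of_force_incidents"):
        value = record[field]
        if value == NOT_AVAILABLE:
            continue  # explicitly marked as unavailable, which is allowed
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{field} must be a non-negative integer or {NOT_AVAILABLE!r}")
    return problems
```

A scraper's output would be accepted only if every record it emits comes back with an empty problem list.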


It's ok, this feedback is how we understand what our work looks like to other people, and how we improve!

We've only had paid staff for about 6 weeks. We've run several experiments and started from scratch a few times over the years. We've been slowly inching the project forward in our spare time; I was the only volunteer for much of that time, and I can't even code!

1. Sounds great, we're building something like this.

2. Even making a list of police departments is a big challenge. We've made a good start but have work to do.

3. Yep, soon we'll have something to query.


I was an early helper when I saw that on reddit and joined your slack before you had a discord. I was also one of the ones you mentioned that fizzled out after the initial excitement died down. But I didn't stop helping because the excitement died down. I stopped helping because I felt like we weren't "doing" anything. Other than raising money and getting paperwork in order. Have you guys actually "done" anything in the three years since? Other than, you know, collecting data and sitting around talking about "stuff"


This boils down to why most NFPs fizzle out. They're usually used by founders and participants as a launchpad for careers or companies.

Glad I saw this.


I can only give my perspective on the project: I showed up when PDAP had 2,500 members in Slack, right after Kristin made her original case study and Reddit post. There was a flurry of conversation. I empathize with the people trying to keep everyone focused in those days. It was like trying to have a 2500 person web scraping flashmob with nothing planned in advance. However, all that conversation was important. We still benefit from the combined relevant experience of those 2500 passionate people.

I took a step back from the project for a few months, not having time to volunteer. My understanding is that the board was basically formed out of all the people remaining after some enthusiasm died down.

When I came back, the board had incorporated and applied for 501c3 status. There were four board members, and a few volunteers who mostly just helped talk through the massive problem and plan. Eventually Kristin (OP) stepped down from the board, but was still at some meetings. A rotating cast of 2-3 other people would be hanging around the meetings at any given time.

I became Director of Operations on a volunteer basis for a bit over a year. This mostly just means paying bills, knowing passwords, and updating the website.

We had weekly meetings, where we'd talk for a few minutes or hours about the project, our ideas, and what we could do to move things forward. [0]

We ran a data bounty during this time [1]. One volunteer, Eric, made a bunch of prototypes around metadata for data sources.

Then we got 501c3 status after waiting for almost a year. I quit my day job and started writing grants and set up online donations. I hired two contractors for a bit of grant writing help, but otherwise did not have "coworkers" or "co-volunteers".

We got the grant money [2] about 8 months later. I went looking for a full-time software engineer. I started getting a salary and working full-time on the project as Executive Director, doing all the non-technical design, planning, and product work.

Throughout, I spent a lot of time interviewing and doing design research: investigating the work being done by journalists, transparency activists, and local data users in Pittsburgh (and elsewhere). I've also been collecting feedback and experience from everyone in the Discord. Most of our current ideas about what's important and where to start come from that work, and the recent addition of an engineer with excellent journalism and software experience (about 6 weeks ago) has allowed us to start prototyping and developing something together in earnest.

Now: We're excited about our strategies, and it's probably a little early for broad consumption. We didn't coordinate this post; everything you can see is a work in progress. There's lively discussion in Discord about our goals, and I've been typing for about 24 hours straight with a break to nap and a break to eat something.

[0]: https://docs.pdap.io/updates/working-sessions

[1]: https://docs.pdap.io/updates/blog/7-14-21-bounty-retro

[2]: https://docs.pdap.io/updates/blog/5-17-22-first-grant-awarde...


Clarifying: when I said I 'otherwise did not have "coworkers" or "co-volunteers"', I meant only for a period of a few months. There have almost always been people in the Discord to respond if something came up, but there were many working sessions in a row where I was the only one to show up.


I'd be interested in helping scrape, but I have no experience. I'd presume every county is different, so there's no simple training you can put folks through? Are there other tasks, like monitoring for things breaking?


Thanks for wanting to help! If you go to https://docs.pdap.io you should be able to find out how to contribute Data Sources. No coding required. Holler in Discord or email me (josh.chamberlain@pdap.io) if you have trouble!


Apologies for my ignorance, but how is this going to police the police? I read the original blog post; there were lots of inferences, coulds, and might-bes made, but little in the way of proof of anything. What's to stop the police saying it was just circumstance that produced your results?

I'm not here defending the police, or denigrating the project, just playing devil's advocate. What happens if the police just ignore you?


Aren't forums like this for devil's advocating, like, almost exclusively? Working as expected!

We've come a long way since that post in terms of strategy and focus. Most of that time was spent with between 1 and 3 volunteers, working a couple hours a week.

Transparency is a good goal in itself, I think. People are already using this public data, we're just trying to make it more accessible.

"Policing the police" was the original phrase used on reddit, but if you look at our website (https://pdap.io), that's not a phrase we use.


The goal of projects like this (I have no contact with this one) is usually to convince politicians and/or the public of their results, and those groups are the ones to actually push change.


In our case it’s about working with local activists and journalists. Police are decentralized; each department acts independently, for the most part. Most people using the data seem to be using it locally; when policies are changed, they are changed locally. If the changes make a positive impact, sometimes other departments or governing bodies take notice.

I’d like to aim higher than a quotable statistic for a politician, in any case :)


Assuming the data is accurate, it can be used to show disparities between groups for a variety of situations: traffic stops, arrests, jail vs. diversion programs, charge stacking, etc.


How is that policing the police? Are disparities supposed to be evidence of something nefarious going on? Given that there are fundamental distinctions between members in different groups (otherwise they would be in the same group) and almost certainly many other non-fundamental distinctions that correlate with the group-defining distinction, is it not entirely plausible that there should be disparities in police statistics even when police act appropriately 100% of the time?


I will say that we are not wading into this. We focus on accessibility; if you point to a big pile of ugly data, the first thing that will happen is that a bunch of very smart people will analyze it. We’re trying to make the big pile, which is currently in like half a million small piles.


>is it not entirely plausible that there should be disparities in police statistics even when police act appropriately 100% of the time?

The National Crime Victimization Survey says yes. Also any article you see trying to debunk FBI crime stats but doesn't mention the NCVS (and how the NCVS largely corroborates the FBI stats) is either ignorant or willfully deceiving you.


Depends on what the data shows. For instance, nobody ever wants to talk about why 95% of those killed in police shootings are males. It's far more disproportionate than any of the race-based numbers that make headlines daily, and yet... nothing.

This seems to indicate the data will always and only be used to tell a preferred narrative.


Downvotes aside, I think I'm more worried about this. The preferred-narrative approach: that's the risk that undermines things like this.

Despite the downvotes, your example is good. I think it's safe to say the 95% male-to-female ratio is likely down to males being more likely than females to be involved in violent incidents. No one really has a problem with this until skin colour comes into it. As a society, though, tackling the cause of why males get into violent confrontations seems like a no-brainer.


...because there's a plausible and uncontested rationale, unlike skin color? Men generally have more testosterone, which leads to aggression and worse impulse control. Nobody's talking about the reason men are 95% of police shootings because it's pretty obvious.


> ...because there's a plausible and uncontested rationale, unlike skin color? Men generally have more testosterone, which leads to aggression and worse impulse control.

Why isn't it plausible that different groups are different in some ways? For instance, when it comes to testosterone, this seems to be the science.

https://www.ncbi.nlm.nih.gov/books/NBK20759/

> In fact, African American men have higher exposure to testosterone, the main biologically potent circulating androgen, than their Caucasian and Asian counterparts, beginning in the in utero period. African American women have testosterone levels that exceed those of Caucasian women by 50% or more in early pregnancy, an exposure that has been hypothesized to permanently alter the “gonadostat,” the hypothalamic-pituitary-testicular axis, in African American male offspring relative to Caucasians. African American men during young adulthood also have substantially higher circulating testosterone levels than their Caucasian counterparts (approximately 13 to 15% difference at age 20 years). Although this difference appears to dissipate with age, African American men still have slightly higher testosterone levels than Caucasians (≈3% higher) at age 40 years.


Because when it comes to race, unlike sex, it's contested.

For example, here's a larger study showing NO difference in testosterone between black and white men: https://pubmed.ncbi.nlm.nih.gov/17456570/

Conversely, I've never heard of a single study anywhere claiming that women have more testosterone than men. QED


You're right that men are far more likely to be involved in violent incidents which are likely to lead to potentially violent confrontations with police. You're also right that this difference largely explains the disparity in the percentage of people shot by police by gender.

I guess the question is why people don't think that correlation holds true by race (or culture) as well. The percentages match up. For instance, white males (and black males and hispanic males) are actually over-represented as demographic cohorts who are victims of police shootings, whereas asian males and women of all races are under-represented. This tracks exactly with violent crime rates.

Perhaps we should address this as a gender problem more than a racial problem. At least, that's what the data tells us.


Is it possible to see the data the PDAP has scraped? I visited the website but I don't see any actual data.


We don't have any scraped data yet. I replied to the parent post addressing some of this, but mostly if people need the data they run a scraper locally and use the data that way. At the moment, our energy is going into building an app to help people submit and manage our database of data sources: https://docs.pdap.io/activities/data-sources/what-is-a-data-...


Ah very good! This sounds like it would be a perfect application of https://datasette.io/


We’ve looked at this and it’s pretty exciting! Have you used it? Know of any particularly relevant case studies we should look at?



It still isn't very clear from that page what focus and direction they want to go in.

What would be an example of a core data set they are trying to compile? Police-involved shootings? Police budgets? Everything?


Seems pretty obvious from the page:

> Our mission is to make data from every U.S. police agency accessible via a single public resource.

More precisely:

> There are over 18,000 police organizations, and each has a unique way to publish information.

i.e. Police are publishing data? Let's organize it and make it easily accessible.


I am curious and want to know more.

"Police data" is incredibly vague. What types of data are of most interest, and available from most or every agency?

If the only answer to what kind of data is "police data" then I'm not sure if I should care, support, or contribute.

Is this data on how many toilet paper rolls departments purchase, or police-involved shootings? Neither?

Certainly you can see where the question is coming from, right?


Your question makes sense. I’ve addressed this elsewhere but I’m currently just answering on my phone so I’ll summarize by saying that our focus is on complete, solid local data one municipality at a time (full context is really helpful when actually using the data) as opposed to any particular type of data.

That said, we still have work to do pointing people in this direction and helping them understand why. This whole thread is going to really affect our website and docs :)


Thanks for responding! What is complete data? Arrests by time and geolocation? Quarterly stats by call or response?

I have gone through some of the public info for my department so I am curious.


Again, it seems to me that if police are publishing data, they want to make it accessible.

Are police publishing arrests by time and geolocation? Quarterly stats by call or response? Then yes.

Hell, I'm sure if they are publishing coffee consumption numbers, that would go up as well.


>Are police publishing arrests by time and geolocation? Quarterly stats by call or response? Then yes.

I obviously don't know, and neither do you.

I don't know why you feel the need to weigh in on my question with unhelpful non-answers.


I wonder if it is legal / possible to record police radio traffic and associate it with the records?


As a long-time scanner enthusiast: if you actually spend any time listening to PD radios (which is legal and easy), you will be disappointed with how little information actually goes out over the unencrypted air. It's just enough to get units rolling; after that, very little, for obvious reasons.


Because cops are lazy and want to put it all in a report later? I would.


A ton of stuff is done on mobile terminals. Computer aided dispatch including status and location of a unit, report writing, identity checks, etc. Scanners were useful because a cop would ask the dispatcher for a license check on Joe Blow, age 34, of Smithville - or registration check on a certain license plate number. These days by the time an officer flips on his lights to pull someone over, they probably already know the registration/insurance status, license status of the registered owner, their photo, etc.

Then there's email and other messaging for lower priority things, and phone calls for stuff that is sensitive.


This is a problem when the computer is wrong. I've been pulled over for having "unregistered" plates that were perfectly valid.


Depending on the country, it can be varying levels of "illegal". In Australia, police use encrypted radio devices: P{01-99}, i.e. some number tagged with P; I can't remember exactly. Whilst police use encrypted comms, other forms of emergency response (the primary one being the ambulance service) use POCSAG, a paging protocol that is entirely unencrypted; it's also used in hospitals.

Listening is not illegal, but recording or redistributing "is illegal", however, it's not clear whether it's actually illegal or it just depends how and when you use it and what you do with it. There was a kid here that brought it up with the police and was harassed over it, I believe he made a website to broadcast it over a web page and was given a stern telling off. Which tbh is fairly valid as it has horrific amounts of PII in it.


We do want to be careful with PII. We don’t want to make a big table of PII, although tables of PII are regularly published to government websites as html tables. Yet another reason to keep data decentralized while pointing people to it from a central db of sources!


A lot of comms have moved to being encrypted. Before that, yes, it was considered public.


You can still get the 'metadata'. ;)


Public airwaves FTW.


I'm curious if there are opportunities to be a force multiplier here. I see that the Readme says "there's no automated scraper farm" yet. Getting that set up seems crucial. Will jump on the Discord :)


The community has been writing scrapers since the beginning, but it's a huge task. We've also written scraper templates, since many counties use the same systems. https://github.com/Police-Data-Accessibility-Project/PDAP-Sc...
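
To illustrate why a template helps when many counties run the same system, here is a minimal sketch using only Python's standard library. It is not taken from the PDAP scrapers repo; the function name `parse_roster` and the idea that the column list is the only per-county setting are assumptions for this example. It also assumes the first table row is a header.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_roster(html: str, columns: list[str]) -> list[dict]:
    """Shared template logic: turn table rows into dicts keyed by `columns`."""
    parser = TableParser()
    parser.feed(html)
    # Skip the header row, then pair each cell with its column name.
    return [dict(zip(columns, row)) for row in parser.rows[1:]]
```

Under these assumptions, supporting another county on the same software means supplying a different page and, at most, a different column list, rather than writing a new parser.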


Yep, I see that there are a lot of existing scrapers. Getting those up and running automatically on a backend host will make things a lot smoother.

Also, I recommend using Playwright instead of Selenium. I'm in the Discord now and will be hanging out, so looking forward to chatting more about this and contributing.


Thanks for taking a look! DM me on Discord if you'd like to chat about this.


Are you also working on pushing standards for data sources, such as a state-level standard? Ideally federal standards?

Maintaining thousands of scrapers for different formats seems like a nightmare, and it won't take long for departments to learn they can slightly tweak the format of their reporting to cause extra work for you.

On the plus side, working with all this data probably makes you all very qualified to advise on developing standards.


We definitely want scrapers to be maintained as-needed, as opposed to trying to spin all the plates. It’d be thousands of thousands.

Measures for Justice is working on developing standards: https://measuresforjustice.org/


Thanks for sharing about your project!

Do you mind giving us a brief on what kind of data you are collecting and highlighting any interesting findings so far?


On the back end, are you using a graph? Having done some public sector accountability stuff where the org structures themselves were obfuscated, graphs and a clear data model were the decisive tech.


Can you say more about this? Feel free to reach out to my email (josh.chamberlain@pdap.io) if you'd like to share more. It sounds like you have some expertise that would be incredibly useful to us.


Thanks! The work I did was with Neo4j, which a reasonably technical person can learn in a few days using their movie-db tutorials. Once you have a clear idea of what the categories are in your data, and what is a top-level "thing" vs. what is an attribute of a thing, you can get correlations. (Persons, orgs, parties, IP addrs, and maybe events are the things, whereas addresses, dates, operating systems, and contact info are attributes. Relationships are things like "comprises", "owns", "is-located", "employs", "pays/funds", "is-member-of", etc. You need to start thinking in relationships.) What annoys me about other graph DBs is that they want to load your head with a complete graph-theoretic framework before you can be useful, whereas cypher/neo works more like a graph-based spreadsheet you can use to think in.

I have used it in different levels of govt to map managers to financial line items, to applications, corporate entities, projects, contract counterparties, platforms, techs, machines, IP addrs, vulnerabilities, etc. Developing a clear and addressable ontology of huge organizations with tens of thousands of people and devices is probably one of my more useful skills. The main use case for graphs, to me, is patchy data, where you have a pile of incomplete metadata in disparate spreadsheets and you need to find coherent paths through all of it.
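
For readers who haven't worked with graphs, the "things, attributes, relationships" idea and "finding paths through patchy data" can be sketched in plain Python. This is an in-memory toy, not Neo4j or Cypher; the `Graph` class and the node labels in the usage note are hypothetical.

```python
from collections import defaultdict, deque


class Graph:
    """Tiny in-memory property graph: nodes are 'things', edges are relationships."""

    def __init__(self):
        self.attrs = {}                  # node id -> attribute dict
        self.edges = defaultdict(list)   # node id -> [(relationship, node id)]

    def add_node(self, node_id, **attrs):
        self.attrs[node_id] = attrs

    def relate(self, src, relationship, dst):
        # Store both directions so paths can be followed either way.
        self.edges[src].append((relationship, dst))
        self.edges[dst].append((f"inverse:{relationship}", src))

    def path(self, start, goal):
        """Breadth-first search; returns the shortest chain of hops, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, hops = queue.popleft()
            if node == goal:
                return hops
            for relationship, nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + [(node, relationship, nxt)]))
        return None
```

For example, after `g.relate("org:dept", "employs", "person:manager")` and `g.relate("org:dept", "pays/funds", "project:p1")`, calling `g.path("person:manager", "project:p1")` links the manager to the project through the department, even though no single spreadsheet row connects them directly.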

I won't be in touch because I know what those people are capable of, but if graphs haven't accelerated your work already, you have some really epic times ahead!


I appreciate you taking the time to explain this to me. As a non-engineer I think I’m following you.

When you say “I know what those people are capable of” do you mean something as ominous as that sounds?


Is there anything like this for regulatory capture in federal and state governments?

I could imagine a revolving door between people working in the regulatory bodies and the industry they regulate.


I’m not aware of one. Tracking people is hard because names aren’t unique; it’s a challenge tracking officers as they move between departments, too.


Very interesting. I have written scrapers for the jail inmate data in the couple of counties nearest me - does that come under the scope of what you're doing, or not quite?


I also have inmate scrapers running (and republishing the data on a better website). Mine apply to any county using a specific software.

They're not cleaned up and ready for release, but I could do that if it's useful to the project.

At least a few more counties in my state use the same software.


Interesting, what are you using them for?

Most of our scrapers are written by inexperienced volunteers so this is probably still quite valuable as is. If it works, we’ll take it.

You could also raise a hand as a human data source; perhaps one day someone will be looking for jail data from these counties and you could offer up some way to contact you. I gave my contact info to the person you replied to. Use it if you want!


The software that my county uses is very poor for just seeing who got arrested. The only view is a "live roster" at the county jail, where they process all arrests. Some people are released very quickly.

I have it republishing on a very basic site with an easier way to view the arrests.

The software I'm scraping is called "Justice Solutions" (findtheinmate.com) and it looks like it was built without any knowledge of design OR usability.


Yes! Police, jails, and courts. Those are all under the umbrella of criminal justice data, and all contextualize each other.

Could you share what you’ve made somehow? josh.chamberlain@pdap.io or discord are likely the best way.

Would also be curious to hear what made you take this on.


Sounds like one of the data bounties from DoltHub.com. Just thought I should drop this link. I am not affiliated with them.


I would like to do research on the data; is it available as a source? (Email in profile)


[flagged]


Yikes. Other users have pointed out a bunch of things already but I need to add that this sort of name-calling and personal attack* is against the rules and spirit of this site, and is the sort of thing we ban accounts for. Regardless of how right you are or feel you are, please don't do it again.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting to HN, we'd appreciate it.

* especially against a new user—greeting newcomers with a torrent of abuse is really bad


PDAP is not crypto based, and there are no plans to make it so. The message you pasted here was about a meeting taken last year with someone who was proposing this. It isn't something the community or leadership was interested in. PDAP is a non-profit, not a crypto startup.


The post you quoted lists potential avenues for using web3 tools. You may notice we have not gone down any of those avenues. We aren't doing crypto stuff.

We did indeed spend some time looking into web3 and peer-to-peer, and ultimately decided it's not for us. Most of the energy there is spent trying to make money / scam people. Peer-to-peer is cool too but has its own risks.

Thanks for checking out the discord, though!


Yeah, I'm glad you went this route.

It's one thing if you want to have a crypto wallet to accept donations -- that's not that controversial. It's another to drink the web3 kool-aid and base your organization's future on that.


I don't understand the claim you're making in this response. Can you condense it to 1-2 sentences (and cite source w/ link if available)?


Could you elaborate on exactly where the filth is here? I think I get it, but I'm not well versed enough to understand what the motive would be.


It certainly set off my alarm bells for people who try to do pointless blockchain stuff for personal profit (or I guess fun and street-cred/CV-lines).

That doesn't mean it has to be the case here. But at first glance a DAO seems more like a detriment here (police can outspend citizens) and NFTs are NFTs, no explanation needed I suspect. Suggestion 3 might have merit, but storing the data in the blockchain (instead of just some hashes for timestamping) makes it look like some overambitious vanity project again.


Edit: Seems like this criticism doesn't apply to this project. I think that's good. What I wrote below is just an explanation of why someone might view crypto as a red flag in a charity project.

---

I think the basic idea is that this web3 crypto stuff is pretty scammy. It would be like finding out the not-for-profit you were thinking about working with also sells timeshares - maybe it is, against all odds, legit, but still not a good look and kind of a red flag.

Another take on it is - if I want to volunteer at a place I don't expect to get paid. If I'm working for payment, I don't want my payment to be in NFTs. So, the web3 intrusion into this idea is unnecessary and doesn't fit for either volunteers or employees.


Nothing about PDAP is web3, or crypto based, though some community members have suggested it (especially last year among all the Web3 hype.) Turning into a web3 company is not happening.


That's one of the many reasons why we abandoned the crypto idea after some initial optimism and research.


> pathetic crypto grift elsewhere please.

On your keybase.io page you list a BTC donation address.


Oh wow, that profile I created 10 years ago and have never once revisited has a BTC wallet address. You sure got me.


Do you really lack the self-awareness to comprehend how massively hypocritical your series of comments is? And you're doubling down on it? It's truly remarkable (in the most negative way possible) what political tribalism does to people...


10 years ago, I kicked the tires of a (beta, at the time) identity proving service built by a colleague and added all the fields including a crypto wallet. The assumption that it is for personal “donations” is conjecture. I don’t need your money or acceptance. Thanks


Also, maybe you should read this https://news.ycombinator.com/newsguidelines.html before trying to dox “hypocritical” witches.


Maybe you should read this, I was not talking about your keybase account :) https://news.ycombinator.com/item?id=32903946


While I appreciate the web3 brainrot backpedaling, I don’t see how this is related to the explicit violation of this site's rule against personal attacks.


>Take your pathetic crypto grift elsewhere please.

Triple down on the hypocrisy, now that's impressive!


Floating the idea of having a DAO governance system =/= "crypto grift." It's a legitimate way of trying to ensure that control of a distributed system remains distributed. It doesn't even sound like they've pursued the idea. You can, in theory, have a non-profit DAO.

Recently there has been an unreasonable amount of hostility towards anything that even mentions crypto or any related technology. I wonder what the source of it is.


a vast number of people converting to marxism. (not me)


[flagged]


Could you please avoid flamewar comments on HN and, also, please make sure you aren't using this site primarily for political or ideological battle? That's one line at which we ban accounts (https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...). Regardless of what they're battling for or against, it's not what this site is for, and it destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


Sure. This comment was pretty snarky, that's fair.

Though I come from an uncommon political background and that colors how I interface with and view technology, I do generally feel I've approached the site from a place of genuine, calm, and engaged discourse on the whole.


I do see some evidence of that (good!), but your comments so far also pattern-match to a class of accounts we frequently end up having to ban, because they use HN more for political battle than for curiosity. The pattern-match may be wrong in your case, but given the GP comment, I thought I should try to nudge you in the intended direction of the site.

---

Your phrase 'uncommon background' made me want to write down some more general thoughts for a bit, but just ignore it if it's not of interest.

We don't have any problem with uncommon backgrounds—we welcome them. Conversation gets better when it happens across differences—so long as people can remain curious. The trouble is that curiosity comes under strain as backgrounds diverge, differences increase, and people have less in common. The risk of the connection 'snapping' and the thread degenerating gets higher. This risk is greater online than it is in person, where there are more channels of information to draw on and also more constraints on how we treat each other.

When things 'snap' and then degenerate, we have no choice but to intervene as moderators, not to take a side on the topic, but literally to moderate the kinetic energy that breaks out. The alternative would be to let it destroy the forum, and that wouldn't do any good for anyone.

The person with an uncommon background—the one who holds a deviant or contrarian view, relative to the majority—inevitably comes under additional pressure when expressing themselves. Their risk of being misunderstood is higher, the likelihood of someone showing up to support them is lower, and there's a good chance that they'll attract a flurry of shallow majoritarian responses. This doesn't happen because people have bad intentions—it happens because of statistical mechanics. But it feels like the others have bad intentions; we're not designed to feel statistical mechanics.

When that happens, it's hard not to snap. The person with the minority view, being under additional pressure, often lashes out at the rest in a way that is against the rules of the site and that we have no choice but to moderate. They get labeled as the 'bad' one, but that's not really fair—the snappage is as much a consequence of the pressure differential as of any personal lapse. Most people would 'lapse' in such a situation. It's really a shared problem, but the majority gets to feel angelic while the other holds the bag.

I see this a ton on HN across every sort of 'minority' you can imagine—the obvious demographic minorities, of course, but also a long tail of less obvious subgroups. It's like a massively parallel greatest hits album of social psychology experiments.

Dismayingly often, we end up having to ban the account that lashes out for repeatedly breaking the site guidelines, even while sympathizing with their situation because of the dynamics I've just described. They often then lash out at the mods for siding against them, accuse us of bias, and so on. In reality we may well personally agree with them, and even if not, we sympathize with their position, but it's not possible to communicate that.

Some of this, of course, is what minorities have always known—they're held to a higher standard in an unfair way. But it's interesting that one can derive this from the mechanical conditions of an internet forum.

The open question is whether there's a way out of the unfortunate tradeoff here, which is that moderating to keep kinetic energy at tolerable levels—that is, moderating flamewars so the forum doesn't burn to a crisp—means favoring the mediocre majority with its predictable views. HN is a good place to look for a way out, because both poles of the tradeoff—flamewar and lameness—are bad for curiosity, and curiosity is the one thing we're trying to optimize for (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...).


[flagged]


I think if a project like ours ever did use blockchain, it would be behind the scenes as a part of the product; decentralization and transparency are key parts of our ethos! However, it's too unstable and as you can see by the comments, mentioning "web3" costs a lot of credibility in many, or maybe most, communities online.

https://knish.io/ is a good example of people using blockchain to make real products. I don't know that it's ready though.


[flagged]


You can find the roadmap, goals, a lot of detail on the organizational structure, and the scrapers that have been written so far here https://docs.pdap.io/activities/our-process and here https://github.com/Police-Data-Accessibility-Project


With all due respect, you really need to be able to explain the goals in a sentence or two, and then follow up with a link to the pdf. An elevator pitch, a vision statement, a clearly written user statement, all would go a long way towards explaining why this is a worthwhile use of one's time and energy.

I don't know whether the goal is to find bad cops who have been fired and get hired one town over and try to prevent those hirings? Or if the idea is to find people who have been arrested and ruin their lives? Or if the goal is simply to make the data available and let people use it for whatever they wish, from deepfakes to erotic fan fiction.

Data wants to be free. The data is the end goal. Public data should be public, etc all sound great. But, I've seen the mug shot database ruin people's lives, and eventually its founders had mugshots of their own.

Scraping and publishing is one thing, but knowing the goals is even more important because it would let me know why we're scraping, what we need to scrape, how to make that available, etc.

What problem are you trying to solve? And Why?

Police data is not easily searchable? Okay, so what? What good is it to make this available? What uses could it have? Even if the goal is just to build it and see how people use it, it would be helpful to know that.

I strongly suggest you consider working backwards. It's been 3 years. What does 3 years from now look like? What is the press release? How do you know you were successful, what does that mean?

Availability is nice. It could lead to transparency. That may lead to accountability. Is that the goal? Right now this data is hard to access. If you succeed it will be easier to access. And so what? What changes in the world could come because of this? If you fail, what lost opportunity do we mourn? You gotta have some sort of easily conveyed reason for doing this. If the data accessibility alone is the end goal, that's fine, not enough for me, but at least make that clear and convince me that's worth the effort.


Thanks for all this. Articulating why it’s important in a few words is incredibly difficult and something we’re always working on. The bottom line is that we believe public information should be easily accessible; hundreds of thousands of people are using this data currently, all over the country. We’re trying to make their lives easier. I’ll take your feedback and use it to make our stuff better.

That said, if you head to https://pdap.io you’ll find the most concrete explanation we have. It still needs to be clearer, but it’s more specific than the docs.


Sounds good. Thank you for taking this the right way - I found it hard to articulate the criticism in a constructive way with few words, so it was verbose and I hoped for the best.

Amazon, for all its faults, has this process well defined. These links may be helpful to you. Good Luck

https://coda.io/@colin-bryar/working-backwards-how-write-an-...

https://medium.com/intrico-io/strategy-tool-amazons-pr-faq-7...

A nice video: https://www.youtube.com/watch?v=aFdpBqmDpzM


> trying to gain followers on LinkedIn instead of doing impactful work.

wtf are you smoking? Do you think that activism doesn't involve marketing, and do you think that people don't need to market themselves in order to find volunteers? Do you call out charities for advertising, or call out employment ads for sounding like they're desperate for employees?

> You say it's turned into something real but I'm struggling to find it in the sea of marketing.

Did you just ignore the URLs?


On their github [0] I found https://app.pdap.io but it is not loading for me.

[0] - https://github.com/Police-Data-Accessibility-Project/PDAP-Da...


This was an early experiment, we don't have a published app right now.


Thanks for taking the time to be obnoxious to one of the tiny percentage of people who are willing to make a personal sacrifice towards important real world problems for a reason other than making money.

Do everyone a favor and go back to wherever you came from and stay there quietly.


The clickbait-y headline made me not want to click on the article, so pointing out the wording isn't a bad idea. I almost didn't read the article due to the title.


Are there any data sources we could scrape to stop crimes in our neighborhoods so the police don't have any reason to come around and cause problems?


https://raheem.ai/ is an interesting project. One idea someone in one of their community calls had was scraping dispatch data to figure out where social services might be useful, and they're creating a dispatch app which lets people ask for non-police help.


You could scrape improved social safety nets, better access to birth control, and taxing the rich to pay for it.


How could we use that data to help prevent crime?

To best mimic the success of the original post, it would be good to be able to identify people doing the most damage to a neighborhood so we could help them first with tax relief, food, birth control, whatever would help the most.


> it would be good to be able to identify people doing the most damage to a neighborhood

That would be the police.


> That would be the police.

Yes, but the fact that we have to monitor them proves they aren't up to the job. The people need to do this, and we should use scraping data to prevent _ALL_ crimes.


Not the parent poster, but I think they were saying that the police are the ones doing the most damage in the community.


Is there data to show most crime in America is committed by Police? I find that hard to believe.


Nobody said crime. They said damage to communities.


"Damage caused by police that are not crimes" is so vague as to need clarification, as I can't imagine what that could even be...

My point about stopping ALL crime with data is a very clear statement that has validity and merit.


Not understanding or having experience with something is not the same as it lacking validity and merit.

Without understanding the problems and ways police harm communities, throwing data at a problem is unlikely to result in good outcomes. Through the lens of “crime”, harm is hard to separate from illegality. Driving with expired plates because you can’t afford it until next month is illegal, but it isn't causing the same amount of harm as burying that person under court fees and fines until they’re homeless. Unless you collect the right data and understand its meaning, the data is useless.

Some starting places on harm police do to communities, if you’re interested:

- podcast: season 3 of Serial https://serialpodcast.org/season-three

- article: https://www.huffpost.com/entry/over-policing-of-america_b_44...

- book: the end of policing https://en.m.wikipedia.org/wiki/The_End_of_Policing


I am not sure why you are arguing against stopping crime. Maybe you misunderstood my posts.


I’m not against stopping crime, but most proposed solutions to crime usually boil down to “more cops”. Even data-driven ideas like you’re suggesting seem to end up as convoluted ways of saying “send more cops here instead of there”. Crimes of poverty can’t be solved with policing, and any attempt to reduce crime without first acknowledging that is very likely to do more harm than good.


It would be interesting to do this too. Although this is going to be a yak shave (a good one) - changing the upstream things that cause people to turn to crime.


Also relevant:

So far this year, 177 law enforcement officers have died in the line of duty. Our gratitude should go to all.

https://www.odmp.org/search/year/2022


Policing is not even in the "Top 10" most dangerous jobs in the country. Most of the people on your list died of COVID or car accidents.


I wonder how many delivery drivers have died in the line of duty this year?

Uber reports 59 crash-related driver deaths in 2018, and that's just one gig economy company.

Also note that US Postal workers suffered 5,800 dog bites in 2020. Note this does not include other delivery services.

https://www.jdsupra.com/legalnews/dangers-delivery-drivers-f...


The FBI and other police organizations are certainly interested in using police data to make the job as safe as possible, for the police as well as the people they’re sworn to protect and serve.



